homelab-codex-ws/services/planner-agent/service.yaml
Oskar Kapala ca37fca5ce feat(planner-agent): main loop with LLM routing and HITL action proposals
services/planner-agent/src/planner.py:
- PlannerAgent: async Redis pub/sub on health_events + world_updates
- Pipeline: receive event → cooldown gate → LLMRouter → write pending action
  → emit remediation_started filesystem event
- CooldownTracker: 5-min suppression per svc_key (configurable via env)
- parse_event(): accepts node-agent shape A and world_updates shape B
- PROPOSAL_SCHEMA: jsonschema enforced by LLMRouter before accepting response
- SYSTEM_PROMPT: homelab topology + action rules (chelsty always requires_human,
  disk_pressure always notify, confidence<0.7 → requires_human)
- write_pending_action(): atomic tmp→rename write, executor-compatible format
- emit_event(): async wrapper around filesystem event write (no control-plane import)
- _emit_event_sync() reads NODE_NAME at call time (not import) for testability
- Benign events (service_healthy, node_online, ...) silently skipped
- LLM chain failure: no cooldown recorded so next event can retry
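
The cooldown gate and the "no cooldown on LLM failure" rule above can be sketched as follows. This is a minimal illustration, not the code in planner.py: the method names `ready`, `record` and `reset`, the injectable `clock` parameter, and the `gate` helper are assumptions made for the example.

```python
import time
from typing import Callable, Dict, Optional


class CooldownTracker:
    """Suppress repeated handling of one svc_key for a fixed window."""

    def __init__(self, cooldown_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._cooldown = cooldown_seconds
        self._clock = clock  # hypothetical hook so tests never sleep
        self._last: Dict[str, float] = {}

    def ready(self, svc_key: str) -> bool:
        last = self._last.get(svc_key)
        return last is None or (self._clock() - last) >= self._cooldown

    def record(self, svc_key: str) -> None:
        self._last[svc_key] = self._clock()

    def reset(self, svc_key: str) -> None:
        self._last.pop(svc_key, None)


def gate(tracker: CooldownTracker, svc_key: str,
         propose: Callable[[], str]) -> Optional[str]:
    """Run propose() only when svc_key is out of cooldown.

    record() is called after propose() returns, so an exception from
    the LLM chain leaves no cooldown and the next event can retry.
    """
    if not tracker.ready(svc_key):
        return None
    proposal = propose()  # may raise; the cooldown is then not recorded
    tracker.record(svc_key)
    return proposal
```

Keys are independent, so a cooldown on one service does not block events for another.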

services/planner-agent/tests/test_planner.py (49 tests, 0 network):
- TestCooldownTracker: 7 tests (ready/not-ready/elapsed/reset/independence)
- TestHealthEvent, TestActionProposal, TestMapActionToExecutorType
- TestParseEvent: both event shapes, missing fields, timestamp formats
- TestBuildMessages: system prompt rules, payload inclusion
- TestPlannerHandleEvent: benign skip, cooldown block, ignore/restart/redeploy/
  notify proposals, remediation event emission, LLM failure isolation,
  requires_human propagation, cooldown recording, model name in proposal
- TestPlannerDispatch: valid JSON, invalid JSON, non-string data, missing node
- TestWritePendingAction, TestEmitEvent: filesystem integration with tmp_path
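
The atomic tmp→rename write that TestWritePendingAction exercises could look roughly like this. It is a sketch under assumptions: the function name `write_json_atomic`, the file naming, and the payload keys are hypothetical, and the real executor-compatible format is not shown in this commit message.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(directory: Path, name: str, payload: dict) -> Path:
    """Write payload to directory/name so readers never see a partial file."""
    directory.mkdir(parents=True, exist_ok=True)
    final_path = directory / name
    # The temp file lives in the same directory so the rename stays on
    # one filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), prefix=".tmp-",
                                    suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, final_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return final_path
```

In a pytest test, passing `tmp_path / "pending"` as `directory` keeps the check on the local filesystem with no network access.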

services/planner-agent/service.yaml:
  owner_node: solaria, dependencies: [redis, ollama]
services/planner-agent/docker-compose.yml: env + healthcheck
services/planner-agent/Dockerfile: python:3.11-slim
services/planner-agent/healthcheck.sh: heartbeat file age check (300s)
services/planner-agent/requirements.txt: litellm, redis, jsonschema, structlog
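
The heartbeat file age check performed by healthcheck.sh can be illustrated in Python. The shipped script is a shell script whose contents are not shown here, so the function names `touch_heartbeat` and `heartbeat_is_fresh` and the `now` parameter are assumptions for this sketch.

```python
import os
import time
from typing import Optional


def touch_heartbeat(path: str) -> None:
    """Create the heartbeat file if needed and update its mtime."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "a", encoding="utf-8"):
        pass
    os.utime(path, None)  # set atime and mtime to the current time


def heartbeat_is_fresh(path: str, max_age_seconds: float = 300.0,
                       now: Optional[float] = None) -> bool:
    """Return True when the file exists and is at most max_age_seconds old."""
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        return False  # a missing heartbeat counts as unhealthy
    current = time.time() if now is None else now
    return (current - mtime) <= max_age_seconds
```

A health probe would map the boolean to an exit code, for example `sys.exit(0 if heartbeat_is_fresh(path) else 1)`.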

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 19:11:39 +02:00


service:
  name: planner-agent
  owner_node: solaria
  exposure: private
  dependencies:
    - redis
    - ollama
  ports: []  # no external port; communicates via Redis pub/sub
  healthcheck:
    type: file
    path: /opt/homelab/state/planner-agent.heartbeat
    max_age_seconds: 300  # 5 minutes, same as the COOLDOWN_SECONDS default
    interval: 60s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/actions
      - /opt/homelab/events
      - /opt/homelab/state
  runtime:
    directories:
      - /opt/homelab/actions/pending
      - /opt/homelab/actions/approved
      - /opt/homelab/actions/running
      - /opt/homelab/actions/completed
      - /opt/homelab/actions/failed
      - /opt/homelab/actions/rejected
      - /opt/homelab/actions/cancelled
      - /opt/homelab/events
      - /opt/homelab/state
  env_vars:
    - REDIS_URL          # redis://100.108.208.3:6379
    - OLLAMA_HOST        # http://100.108.208.3:11434
    - OLLAMA_MODEL       # qwen2.5:7b
    - ANTHROPIC_API_KEY  # for claude-haiku/sonnet fallback
    - NODE_NAME          # solaria
    - COOLDOWN_SECONDS   # default 300
    - RUNTIME_PATH       # default /opt/homelab
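
One way a service could consume the env_vars and runtime directories declared above is sketched below. The `Settings` dataclass, `load_settings` and `ensure_runtime_dirs` are hypothetical names. Only the two defaults stated in the comments (COOLDOWN_SECONDS and RUNTIME_PATH) are applied; treating the remaining variables as required, and ANTHROPIC_API_KEY as optional, is an assumption.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import List, Mapping, Optional

# Action lifecycle directories listed under runtime.directories.
ACTION_STATES = ("pending", "approved", "running", "completed",
                 "failed", "rejected", "cancelled")


@dataclass(frozen=True)
class Settings:
    redis_url: str
    ollama_host: str
    ollama_model: str
    node_name: str
    anthropic_api_key: Optional[str]
    cooldown_seconds: int
    runtime_path: Path


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from env; a missing required variable raises KeyError."""
    return Settings(
        redis_url=env["REDIS_URL"],
        ollama_host=env["OLLAMA_HOST"],
        ollama_model=env["OLLAMA_MODEL"],
        node_name=env["NODE_NAME"],
        anthropic_api_key=env.get("ANTHROPIC_API_KEY"),  # assumed optional
        cooldown_seconds=int(env.get("COOLDOWN_SECONDS", "300")),
        runtime_path=Path(env.get("RUNTIME_PATH", "/opt/homelab")),
    )


def ensure_runtime_dirs(runtime_path: Path) -> List[Path]:
    """Create the actions/<state>, events and state directories."""
    dirs = [runtime_path / "actions" / s for s in ACTION_STATES]
    dirs += [runtime_path / "events", runtime_path / "state"]
    for d in dirs:
        d.mkdir(parents=True, exist_ok=True)
    return dirs
```

Passing a plain dict as `env` keeps the loader testable without touching the process environment.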