homelab-codex-ws/services
Oskar Kapala ca37fca5ce feat(planner-agent): main loop with LLM routing and HITL action proposals
services/planner-agent/src/planner.py:
- PlannerAgent: async Redis pub/sub on health_events + world_updates
- Pipeline: receive event → cooldown gate → LLMRouter → write pending action
  → emit remediation_started filesystem event
- CooldownTracker: 5-min suppression per svc_key (configurable via env)
- parse_event(): accepts node-agent shape A and world_updates shape B
- PROPOSAL_SCHEMA: jsonschema enforced by LLMRouter before accepting response
- SYSTEM_PROMPT: homelab topology + action rules (chelsty always requires_human,
  disk_pressure always notify, confidence<0.7 → requires_human)
- write_pending_action(): atomic tmp→rename write, executor-compatible format
- emit_event(): async wrapper around filesystem event write (no control-plane import)
- _emit_event_sync() reads NODE_NAME at call time (not import) for testability
- Benign events (service_healthy, node_online, ...) silently skipped
- LLM chain failure: no cooldown recorded so next event can retry

services/planner-agent/tests/test_planner.py (49 tests, 0 network):
- TestCooldownTracker: 7 tests (ready/not-ready/elapsed/reset/independence)
- TestHealthEvent, TestActionProposal, TestMapActionToExecutorType
- TestParseEvent: both event shapes, missing fields, timestamp formats
- TestBuildMessages: system prompt rules, payload inclusion
- TestPlannerHandleEvent: benign skip, cooldown block, ignore/restart/redeploy/
  notify proposals, remediation event emission, LLM failure isolation,
  requires_human propagation, cooldown recording, model name in proposal
- TestPlannerDispatch: valid JSON, invalid JSON, non-string data, missing node
- TestWritePendingAction, TestEmitEvent: filesystem integration with tmp_path

services/planner-agent/service.yaml:
  owner_node: solaria, dependencies: [redis, ollama]
services/planner-agent/docker-compose.yml: env + healthcheck
services/planner-agent/Dockerfile: python:3.11-slim
services/planner-agent/healthcheck.sh: heartbeat file age check (300s)
services/planner-agent/requirements.txt: litellm, redis, jsonschema, structlog

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 19:11:39 +02:00
..
agent-system Fix Copy for AI: materializer fetches from control-plane API instead of Redis 2026-05-27 16:07:51 +02:00
control-plane supervisor: also cancel pending actions for services removed from desired state 2026-05-27 15:19:13 +02:00
forgejo Add node capability model 2026-05-11 20:46:50 +02:00
mosquitto Implement filesystem-first runtime event system 2026-05-12 13:38:25 +02:00
node-agent Fix ghost service keys from hash-prefixed Docker container names 2026-05-27 15:41:13 +02:00
node_exporter Fix pending actions: node_exporter, zigbee2mqtt, chelsty-ha monitoring 2026-05-27 15:10:48 +02:00
npm Add node capability model 2026-05-11 20:46:50 +02:00
ollama Add node capability model 2026-05-11 20:46:50 +02:00
planner-agent feat(planner-agent): main loop with LLM routing and HITL action proposals 2026-05-27 19:11:39 +02:00
stability-agent Fix stability agent fleet deploy scripts 2026-05-17 21:09:06 +02:00
zigbee2mqtt Adapt zigbee2mqtt for SLZB coordinator 2026-05-14 16:37:18 +02:00
.gitkeep Add infrastructure standards and deployment conventions 2026-05-07 21:16:03 +02:00