Oskar Kapala ca37fca5ce feat(planner-agent): main loop with LLM routing and HITL action proposals
services/planner-agent/src/planner.py:
- PlannerAgent: async Redis pub/sub on health_events + world_updates
- Pipeline: receive event → cooldown gate → LLMRouter → write pending action
  → emit remediation_started filesystem event
- CooldownTracker: 5-min suppression per svc_key (configurable via env)
- parse_event(): accepts node-agent shape A and world_updates shape B
- PROPOSAL_SCHEMA: jsonschema enforced by LLMRouter before accepting response
- SYSTEM_PROMPT: homelab topology + action rules (chelsty always requires_human,
  disk_pressure always notify, confidence<0.7 → requires_human)
- write_pending_action(): atomic tmp→rename write, executor-compatible format
- emit_event(): async wrapper around filesystem event write (no control-plane import)
- _emit_event_sync() reads NODE_NAME at call time (not import) for testability
- Benign events (service_healthy, node_online, ...) silently skipped
- LLM chain failure: no cooldown recorded so next event can retry
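
The cooldown gate and the "no cooldown on LLM failure" rule above can be sketched as follows. This is a minimal illustration, not the code in planner.py: the env var name PLANNER_COOLDOWN_SECONDS, the method names, the injectable clock, and the handle() helper are all assumptions made for the example.

```python
import os
import time

# Hypothetical env var name; the commit only says the window is configurable via env.
COOLDOWN_SECONDS = float(os.environ.get("PLANNER_COOLDOWN_SECONDS", "300"))


class CooldownTracker:
    """Suppress repeated handling of the same service key for a fixed window."""

    def __init__(self, cooldown_seconds=COOLDOWN_SECONDS, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock  # injectable so tests do not need to sleep
        self._last = {}      # svc_key -> clock reading of the last recorded action

    def ready(self, svc_key):
        """True if svc_key was never recorded or its window has elapsed."""
        last = self._last.get(svc_key)
        return last is None or self._clock() - last >= self.cooldown_seconds

    def record(self, svc_key):
        """Start (or restart) the suppression window for svc_key."""
        self._last[svc_key] = self._clock()

    def reset(self, svc_key):
        """Drop any recorded window for svc_key."""
        self._last.pop(svc_key, None)


def handle(tracker, svc_key, propose):
    """Return a proposal, or None if the event is suppressed or the LLM call fails.

    `propose` stands in for the LLM router call (hypothetical signature).
    """
    if not tracker.ready(svc_key):
        return None
    try:
        proposal = propose(svc_key)
    except Exception:
        # No cooldown is recorded on failure, so the next event can retry.
        return None
    tracker.record(svc_key)
    return proposal
```

Recording the cooldown only after a successful proposal keeps a transient LLM outage from silencing a service for the full window.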

services/planner-agent/tests/test_planner.py (49 tests, 0 network):
- TestCooldownTracker: 7 tests (ready/not-ready/elapsed/reset/independence)
- TestHealthEvent, TestActionProposal, TestMapActionToExecutorType
- TestParseEvent: both event shapes, missing fields, timestamp formats
- TestBuildMessages: system prompt rules, payload inclusion
- TestPlannerHandleEvent: benign skip, cooldown block, ignore/restart/redeploy/
  notify proposals, remediation event emission, LLM failure isolation,
  requires_human propagation, cooldown recording, model name in proposal
- TestPlannerDispatch: valid JSON, invalid JSON, non-string data, missing node
- TestWritePendingAction, TestEmitEvent: filesystem integration with tmp_path
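
The atomic tmp→rename write that TestWritePendingAction exercises might look roughly like the sketch below. The function name write_json_atomic and the JSON payload are assumptions for illustration; the actual executor-compatible format is not shown in this commit message.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path, payload):
    """Write payload as JSON to path so readers never observe a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the target directory because a rename is only
    # atomic when source and destination are on the same filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh)
            fh.flush()
            os.fsync(fh.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)  # atomically swaps the finished file into place
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
    return path
```

In a pytest test this would typically be called with a path under the tmp_path fixture, then asserted on by reading the file back.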

services/planner-agent/service.yaml:
  owner_node: solaria, dependencies: [redis, ollama]
services/planner-agent/docker-compose.yml: env + healthcheck
services/planner-agent/Dockerfile: python:3.11-slim
services/planner-agent/healthcheck.sh: heartbeat file age check (300s)
services/planner-agent/requirements.txt: litellm, redis, jsonschema, structlog
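
The heartbeat age check in healthcheck.sh is a shell script; the same idea is sketched here in Python for illustration only. The function name and the optional `now` parameter are assumptions, and the real heartbeat file path is not given in this commit message.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 300  # threshold named in the commit message


def heartbeat_is_fresh(path, max_age=MAX_AGE_SECONDS, now=None):
    """Return True if the heartbeat file exists and was modified within max_age seconds."""
    try:
        mtime = Path(path).stat().st_mtime
    except FileNotFoundError:
        return False  # a missing heartbeat counts as unhealthy
    current = time.time() if now is None else now
    return current - mtime <= max_age
```

A container healthcheck would exit 0 when this returns True and non-zero otherwise, provided the agent's main loop touches the file on each iteration.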

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 19:11:39 +02:00
backups/zigbee Add Zigbee coordinator backup 2026-05-14 18:24:26 +02:00
docs docs: session summary 2026-05-27 + update observer/control-plane/chelsty docs 2026-05-27 16:18:31 +02:00
dotfiles add shared zshrc 2026-05-10 20:52:44 +02:00
hosts Fix Copy for AI: materializer fetches from control-plane API instead of Redis 2026-05-27 16:07:51 +02:00
inventory ops: align vps desired state with control-plane architecture, remove legacy agent-system references 2026-05-21 11:40:55 +02:00
scripts Fix ghost service keys from hash-prefixed Docker container names 2026-05-27 15:41:13 +02:00
services feat(planner-agent): main loop with LLM routing and HITL action proposals 2026-05-27 19:11:39 +02:00
.codex Document current homelab state 2026-04-15 17:37:25 +02:00
.gitignore Add infrastructure standards and deployment conventions 2026-05-07 21:16:03 +02:00
CLAUDE.md docs(CLAUDE.md): update node model and override path convention 2026-05-20 15:27:46 +02:00
codex_context Add session context state 2026-04-20 22:10:39 +02:00
codex_context.yaml add shared context lock 2026-05-05 17:25:50 +02:00
deploy_agent.py Add deploy escalation output 2026-04-22 22:08:26 +02:00
ollama_client.py Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
README.md docs: complete the documentation with AI agents in mind 2026-05-20 12:06:23 +02:00
start-aider.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
start-codex.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
sync-context.sh add shared context lock 2026-05-05 17:25:50 +02:00
tech-debt.md docs: add tech-debt.md, forgejo_runner temp disabled 2026-05-21 10:37:42 +02:00
update-context.md Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00

Homelab Codex

GitOps-lite orchestration for a distributed homelab environment.

Architecture

The homelab consists of several nodes connected via a Tailscale internal mesh.

Host     Role           Description
SATURN   Primary Node   Development, orchestration, and git source of truth (commit node).
SOLARIA  Compute Node   GPU, inference, and heavy compute workloads.
PIHA     Infra Node     Core infrastructure services, automation, and monitoring.
VPS      Edge Node      Public ingress, reverse proxy, and edge services.

Repository Structure

Getting Started

  1. Standardization: Follow the Infrastructure Standards.
  2. Deployment: See Deployment Conventions for how to roll out changes.
  3. SATURN: Remember that SATURN is the only node where commits should be made.

Documentation Index


Note: This repository documents the state of the homelab. Runtime state lives outside the repository in /opt/homelab.