- shared aiohttp ClientSession in HAClient (Phase 1 Flag #2 fixed): make_session() factory, session injected at startup, closed on shutdown - Check.run() → list[CheckResult]: clean multi-event interface - first real diagnostic check: entity unavailable > 24h (INSERT OR IGNORE baseline preserves first-seen timestamp) - root cause grouping: emit ha_integration_failed instead of N entity events when ≥50% of integration's entities are unavailable (≥3 min) - alert deduplication via SQLite cooldown window (default 6h) - recovery clears baseline + dedup for immediate re-alert - configurable thresholds: duration, integration %, cooldown - 38 unit tests + 7 integration tests (42 pass, 3 skip w/o live HA) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> |
||
|---|---|---|
| backups/zigbee | ||
| docs | ||
| dotfiles | ||
| hosts | ||
| inventory | ||
| scripts | ||
| services | ||
| .codex | ||
| .gitignore | ||
| CLAUDE.md | ||
| codex_context | ||
| codex_context.yaml | ||
| deploy_agent.py | ||
| ollama_client.py | ||
| README.md | ||
| start-aider.sh | ||
| start-codex.sh | ||
| sync-context.sh | ||
| tech-debt.md | ||
| update-context.md | ||
Homelab Codex
GitOps-lite orchestration for a distributed homelab environment.
Architecture
The homelab consists of several nodes connected via a Tailscale internal mesh.
| Host | Role | Description |
|---|---|---|
| SATURN | Primary Node | Development, orchestration, and git source of truth (commit node). |
| SOLARIA | Compute Node | GPU, inference, and heavy compute workloads. |
| PIHA | Infra Node | Core infrastructure services, automation, and monitoring. |
| VPS | Edge Node | Public ingress, reverse proxy, and edge services. |
Agent System
The homelab uses a multi-agent orchestration model with human-in-the-loop for destructive actions:
| Agent | Node | Role |
|---|---|---|
| stability-agent | all nodes | Per-node watchdog — monitors Docker, disk, Tailscale, MQTT; emits events |
| node-agent | all nodes | Publishes container health events to Redis pub/sub |
| observer | VPS | Synthesizes world state from events into /opt/homelab/world/*.json |
| supervisor | VPS | Detects drift between desired and actual state; writes pending actions |
| planner-agent | SOLARIA | LLM-powered diagnosis — listens to Redis, proposes remediation actions |
| executor | VPS | Executes actions only after operator approval |
| operator-ui + telegram-bot | VPS / PIHA | Operator reviews and approves/rejects pending actions |
Action approval flow: pending/ → operator approves → approved/ → executor runs.
Repository Structure
docs/: Infrastructure Standards and Deployment Conventions.hosts/: Host-specific configurations and service assignments.services/: Reusable Docker Compose service definitions.scripts/: Deployment and management scripts.
Getting Started
- Standardization: Follow the Infrastructure Standards.
- Deployment: See Deployment Conventions for how to roll out changes.
- SATURN: Remember that SATURN is the only node where commits should be made.
Documentation Index
- Infrastructure Standards
- Agent Operating Procedures (For AI/Non-Human Agents)
- Deployment Conventions
- Hardware
- Networking
- Services
- Node Capabilities
- Action Model
Note: This repository documents the state of the homelab. Runtime state lives outside the repository in /opt/homelab.