# Observer Runtime

The Observer Runtime is a lightweight agent responsible for synthesizing the operational world state of the homelab from raw events, logs, and state files.

## Architecture

The observer follows a filesystem-first approach, consuming append-only events and generating a normalized world model. It is designed to be idempotent, resumable, and resilient to intermittent node connectivity.

### Inputs

- `/opt/homelab/events/`: Normalized JSON events.
- `/opt/homelab/state/`: Deployment stage markers and internal observer checkpoint.
- `/opt/homelab/logs/`: Detailed execution logs and diagnostics.
- Repository Inventory: `inventory/topology.yaml` and `hosts/*/services.yaml`.

### World Model Output

Generated under `/opt/homelab/world/`:

- `nodes.json`: Current node availability, roles, and last seen timestamps.
- `services.json`: Service health status and links to active incidents.
- `deployments.json`: Tracking of active and historical deployment runs by `correlation_id`.
- `incidents.json`: Correlated operational issues, including repeat failures and resolution status.
- `runtime-summary.json`: High-level overview for dashboards and planner agents.

## Incident Lifecycle

The observer implements lightweight incident correlation:

1. **Detection**: When a `service_unhealthy` or `healthcheck_failed` event is consumed, a new incident is created or an existing active incident for that service is updated.
2. **Correlation**: Multiple failure events for the same service on the same node are collapsed into a single incident, tracking the `occurrence_count`.
3. **Diagnostics**: Deployment failures (`deployment_failed`) automatically attach references to diagnostic files if present in the event payload.
4. **Resolution**: A `service_recovered` event for a service will transition any active incidents for that service to a `resolved` state.
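
The detection, correlation, and resolution steps above can be sketched roughly as follows. This is a minimal illustration, not the observer's actual code: the event field names (`type`, `node`, `service`, `timestamp`, `correlation_id`), the `"active"` status value, the `apply_event` helper, and the ID scheme are all assumptions made for the example. Diagnostics attachment is left out.

```python
from datetime import datetime
from typing import Any

# Event types named in the lifecycle description.
FAILURE_EVENTS = {"service_unhealthy", "healthcheck_failed"}

Incident = dict[str, Any]


def _epoch(ts: str) -> int:
    """Convert an ISO 8601 timestamp with a trailing 'Z' to epoch seconds."""
    return int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp())


def apply_event(incidents: dict[str, Incident], event: dict[str, Any]) -> None:
    """Fold one event into the incident map (hypothetical helper, mutates in place)."""
    node, service, ts = event["node"], event["service"], event["timestamp"]
    # Assumption: incidents are matched per (node, service) pair.
    active = [
        inc for inc in incidents.values()
        if inc["node"] == node
        and inc["service"] == service
        and inc["status"] == "active"
    ]
    if event["type"] in FAILURE_EVENTS:
        if active:
            # Correlation: collapse repeat failures into the open incident.
            inc = active[0]
            inc["last_occurrence"] = ts
            inc["occurrence_count"] += 1
            inc["events"].append(ts)
        else:
            # Detection: open a new incident. The ID shape is an assumption.
            inc_id = f"inc-{_epoch(ts)}-{node}-{service}"
            incidents[inc_id] = {
                "id": inc_id,
                "node": node,
                "service": service,
                "status": "active",
                "severity": "error",
                "started_at": ts,
                "last_occurrence": ts,
                "occurrence_count": 1,
                "events": [ts],
                "correlation_id": event.get("correlation_id"),
            }
    elif event["type"] == "service_recovered":
        # Resolution: close every matching active incident.
        for inc in active:
            inc["status"] = "resolved"
            inc["resolved_at"] = ts


# Usage: two failures followed by a recovery collapse into one resolved incident.
incidents: dict[str, Incident] = {}
for ev in [
    {"type": "healthcheck_failed", "node": "saturn", "service": "mosquitto",
     "timestamp": "2026-05-12T12:05:00Z", "correlation_id": "hc-1"},
    {"type": "healthcheck_failed", "node": "saturn", "service": "mosquitto",
     "timestamp": "2026-05-12T12:06:00Z", "correlation_id": "hc-1"},
    {"type": "service_recovered", "node": "saturn", "service": "mosquitto",
     "timestamp": "2026-05-12T12:10:00Z"},
]:
    apply_event(incidents, ev)
```

Keeping the function a pure fold over one event at a time means that replaying the same event sequence from an empty map produces the same incident map, which fits the idempotency goal stated in the architecture section.
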
### Example Incident JSON

```json
{
  "inc-1715518800-saturn-mosquitto": {
    "id": "inc-1715518800-saturn-mosquitto",
    "node": "saturn",
    "service": "mosquitto",
    "status": "resolved",
    "severity": "error",
    "started_at": "2026-05-12T12:05:00Z",
    "last_occurrence": "2026-05-12T12:06:00Z",
    "occurrence_count": 2,
    "events": [
      "2026-05-12T12:05:00Z",
      "2026-05-12T12:06:00Z"
    ],
    "correlation_id": "hc-1",
    "resolved_at": "2026-05-12T12:10:00Z"
  }
}
```

## Runtime Behavior

### Idempotency

The observer processes events in order, so replaying the same events produces the same world state. If the world state is lost, delete the checkpoint file (`/opt/homelab/state/observer_checkpoint.json`); the observer then re-processes all events and rebuilds the world state.

### Resumability

The observer records the last processed event file in its checkpoint. On restart, it continues from the next unprocessed event.

### Deployment Tracking

Deployments are tracked via `correlation_id`. The observer synthesizes the start, end, and status of each deployment run, providing a clear history of changes to the environment.
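
One way to picture the deployment synthesis described above is as a fold over events grouped by `correlation_id`. The sketch below is illustrative only: `deployment_failed` is the only deployment event type named in this document, so `deployment_started` and `deployment_succeeded`, the event field names, the status strings, and the `track_deployments` helper are hypothetical.

```python
from typing import Any

# Only `deployment_failed` appears in this document; the other two are assumed.
START = "deployment_started"
SUCCESS = "deployment_succeeded"
FAILURE = "deployment_failed"


def track_deployments(events: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Build one record per correlation_id from an ordered event list."""
    runs: dict[str, dict[str, Any]] = {}
    for event in events:
        cid = event.get("correlation_id")
        if cid is None or event["type"] not in (START, SUCCESS, FAILURE):
            continue  # Not a deployment event, or not attributable to a run.
        run = runs.setdefault(cid, {
            "correlation_id": cid,
            "started_at": None,
            "ended_at": None,
            "status": "unknown",
        })
        if event["type"] == START:
            run["started_at"] = event["timestamp"]
            run["status"] = "running"
        else:
            run["ended_at"] = event["timestamp"]
            run["status"] = "succeeded" if event["type"] == SUCCESS else "failed"
    return runs


# Usage: one failed run and one run still in progress.
runs = track_deployments([
    {"type": START, "correlation_id": "dep-1", "timestamp": "2026-05-12T11:00:00Z"},
    {"type": FAILURE, "correlation_id": "dep-1", "timestamp": "2026-05-12T11:03:00Z"},
    {"type": START, "correlation_id": "dep-2", "timestamp": "2026-05-12T11:30:00Z"},
])
```

Because the result is derived entirely from the ordered event list, rebuilding after a checkpoint reset yields the same deployment history.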