Observer Runtime
The Observer Runtime is a lightweight agent responsible for synthesizing the operational world state of the homelab from raw events, logs, and state files.
Architecture
The observer follows a filesystem-first approach, consuming append-only events and generating a normalized world model. It is designed to be idempotent, resumable, and resilient to intermittent node connectivity.
Inputs
- /opt/homelab/events/: Normalized JSON events.
- /opt/homelab/state/: Deployment stage markers and internal observer checkpoint.
- /opt/homelab/logs/: Detailed execution logs and diagnostics.
- Repository Inventory: inventory/topology.yaml and hosts/*/services.yaml.
World Model Output
Generated under /opt/homelab/world/:
- nodes.json: Current node availability, roles, and last seen timestamps.
- services.json: Service health status and links to active incidents.
- deployments.json: Tracking of active and historical deployment runs by correlation_id.
- incidents.json: Correlated operational issues, including repeat failures and resolution status.
- runtime-summary.json: High-level overview for dashboards and planner agents.
Incident Lifecycle
The observer implements lightweight incident correlation:
- Detection: When a service_unhealthy or healthcheck_failed event is consumed, a new incident is created or an existing active incident for that service is updated.
- Correlation: Multiple failure events for the same service on the same node are collapsed into a single incident, tracking the occurrence_count.
- Diagnostics: Deployment failures (deployment_failed) automatically attach references to diagnostic files if present in the event payload.
- Resolution: A service_recovered event for a service will transition any active incidents for that service to a resolved state.
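The detection, correlation, and resolution steps above can be sketched as a single fold over events. This is a minimal illustration, not the observer's actual code: the event field names (type, node, service, timestamp), the "active" status value, the fixed "error" severity, the ID scheme in the helper, and matching recovery events on both node and service are all assumptions. Diagnostics attachment is left out.

```python
from datetime import datetime
from typing import Any, Dict

FAILURE_TYPES = {"service_unhealthy", "healthcheck_failed"}


def _incident_id(ts: str, node: str, service: str) -> str:
    # Hypothetical ID scheme: epoch seconds of the first failure, then node and service.
    epoch = int(datetime.fromisoformat(ts.replace("Z", "+00:00")).timestamp())
    return f"inc-{epoch}-{node}-{service}"


def apply_event(incidents: Dict[str, Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Fold one event into the incidents map (mutates `incidents` in place)."""
    node, service, ts = event["node"], event["service"], event["timestamp"]
    # Active incidents for this node/service pair (assumed matching rule).
    active = [
        inc for inc in incidents.values()
        if inc["node"] == node and inc["service"] == service and inc["status"] == "active"
    ]
    if event["type"] in FAILURE_TYPES:
        if active:
            # Correlation: collapse the repeat failure into the existing incident.
            inc = active[0]
            inc["last_occurrence"] = ts
            inc["occurrence_count"] += 1
            inc["events"].append(ts)
        else:
            # Detection: open a new incident for the first failure.
            inc_id = _incident_id(ts, node, service)
            incidents[inc_id] = {
                "id": inc_id, "node": node, "service": service,
                "status": "active", "severity": "error",
                "started_at": ts, "last_occurrence": ts,
                "occurrence_count": 1, "events": [ts],
                "correlation_id": event.get("correlation_id"),
            }
    elif event["type"] == "service_recovered":
        # Resolution: close every active incident for this service.
        for inc in active:
            inc["status"] = "resolved"
            inc["resolved_at"] = ts
```

Feeding two failure events and one service_recovered event for the same node and service through apply_event yields a single incident with occurrence_count 2 and status resolved, the same shape as the example that follows.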
Example Incident JSON
{
  "inc-1715518800-saturn-mosquitto": {
    "id": "inc-1715518800-saturn-mosquitto",
    "node": "saturn",
    "service": "mosquitto",
    "status": "resolved",
    "severity": "error",
    "started_at": "2026-05-12T12:05:00Z",
    "last_occurrence": "2026-05-12T12:06:00Z",
    "occurrence_count": 2,
    "events": [
      "2026-05-12T12:05:00Z",
      "2026-05-12T12:06:00Z"
    ],
    "correlation_id": "hc-1",
    "resolved_at": "2026-05-12T12:10:00Z"
  }
}
Runtime Behavior
Idempotency
The observer processes events in order, so replaying the same events produces the same world state. If the world state is lost, delete the checkpoint file (/opt/homelab/state/observer_checkpoint.json); on its next run the observer re-processes all events and rebuilds the world state.
Resumability
The observer tracks the last processed event file in its checkpoint. Upon restart, it continues from the next available event.
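A checkpoint-driven loop of this kind might look like the sketch below. It is illustrative only: the checkpoint key last_event_file, the one-event-per-.json-file layout, and the assumption that event file names sort lexicographically in processing order are not taken from the observer itself.

```python
import json
from pathlib import Path
from typing import Callable, Optional


def load_checkpoint(path: Path) -> Optional[str]:
    """Return the name of the last processed event file, or None if there is no checkpoint."""
    if not path.exists():
        return None
    return json.loads(path.read_text()).get("last_event_file")


def run_once(events_dir: Path, checkpoint: Path, handle: Callable[[dict], None]) -> int:
    """Process every event file after the checkpoint; return how many were handled."""
    last = load_checkpoint(checkpoint)
    processed = 0
    # Assumes file names sort lexicographically in event order.
    for event_file in sorted(events_dir.glob("*.json")):
        if last is not None and event_file.name <= last:
            continue  # already processed in an earlier run
        handle(json.loads(event_file.read_text()))
        # Write to a temp file, then rename, so a crash never leaves a partial checkpoint.
        tmp = checkpoint.with_name(checkpoint.name + ".tmp")
        tmp.write_text(json.dumps({"last_event_file": event_file.name}))
        tmp.replace(checkpoint)
        processed += 1
    return processed
```

With no checkpoint present, load_checkpoint returns None and every event file is replayed, which is the rebuild path described under Idempotency. Advancing the checkpoint after each file means a crash repeats at most one event on restart.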
Deployment Tracking
Deployments are tracked via correlation_id. The observer synthesizes the start, end, and status of each deployment run, providing a clear history of changes to the environment.
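As a rough sketch of how runs could be synthesized, the function below groups deployment events by correlation_id. Only deployment_failed is named in this document; the deployment_started and deployment_succeeded event types, the type and timestamp fields, and the "running" status are hypothetical names used for illustration.

```python
from typing import Any, Dict, List

# Hypothetical mapping from terminal event types to a run status.
TERMINAL_STATUS = {"deployment_succeeded": "succeeded", "deployment_failed": "failed"}


def summarize_deployments(events: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Group deployment events by correlation_id and derive start, end, and status."""
    runs: Dict[str, Dict[str, Any]] = {}
    for event in events:
        cid = event.get("correlation_id")
        if cid is None or not event["type"].startswith("deployment_"):
            continue  # not part of a deployment run
        run = runs.setdefault(cid, {
            "correlation_id": cid, "status": "running",
            "started_at": None, "ended_at": None,
        })
        if event["type"] == "deployment_started":
            run["started_at"] = event["timestamp"]
        elif event["type"] in TERMINAL_STATUS:
            run["ended_at"] = event["timestamp"]
            run["status"] = TERMINAL_STATUS[event["type"]]
    return runs
```

A run that has a start event but no terminal event stays in the running state, which is one way to tell active runs from historical ones.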