homelab-codex-ws/docs/chelsty-stability-agent.md
2026-05-15 18:51:45 +02:00

CHELSTY Stability Agent

The stability-agent on CHELSTY provides local observability and health monitoring for the node's services and infrastructure.

Purpose

The agent acts as a filesystem-first watchdog: it detects anomalies in the local runtime environment and records them on disk, but takes no autonomous destructive actions such as restarting services. It serves as the primary data source for node-level stability metrics.

Monitoring Scope

  • Docker Containers: Monitors all local containers. If a container is not in the running state, a containers_not_running event is generated.
  • Disk Usage: Monitors the root filesystem. Generates a disk_usage_high event if usage exceeds the configured threshold (DISK_THRESHOLD_PCT, 90 by default).
  • Connectivity:
    • Checks whether the Tailscale socket or interface is available.
    • Checks whether the local Mosquitto MQTT broker is reachable at MQTT_HOST and MQTT_PORT.
  • Zigbee2MQTT: Tracks whether the Zigbee2MQTT service is present and reports its status.
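
Two of the checks listed above, disk usage and broker reachability, can be sketched in a few lines of Python. This is an illustration only, not the agent's actual implementation: the function names, the shape of the event dictionary, and the use of a plain TCP connect as the reachability test are assumptions. Only the event name disk_usage_high and the default values come from this document.

```python
import shutil
import socket


def check_disk_usage(path="/", threshold_pct=90.0):
    """Return a disk_usage_high event dict if usage exceeds the threshold, else None.

    The event fields (type, path, used_pct) are hypothetical.
    """
    usage = shutil.disk_usage(path)
    used_pct = usage.used / usage.total * 100
    if used_pct > threshold_pct:
        return {"type": "disk_usage_high", "path": path, "used_pct": round(used_pct, 1)}
    return None


def mqtt_reachable(host="mosquitto", port=1883, timeout=3.0):
    """Return True if a TCP connection to the broker can be opened.

    This only proves the port accepts connections; it does not speak MQTT.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A TCP connect is the cheapest possible probe. A stricter check could perform a real MQTT handshake with a client library, at the cost of an extra dependency.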

Storage and Integration

  • Heartbeat: /opt/homelab/state/stability-agent.heartbeat, updated on every check cycle.
  • State Summary: /opt/homelab/state/stability-agent.json, a JSON summary of the latest result of every check.
  • Events: /opt/homelab/events/YYYY-MM-DD/chelsty/events.jsonl, an append-only JSON Lines file partitioned by date.
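
The event log layout above can be sketched as follows. This is a hedged illustration rather than the agent's code: the helper names, the record fields (ts, node, type), and the use of UTC for the date directory are assumptions. Only the path layout comes from this document.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def events_path(base="/opt/homelab/events", node="chelsty", now=None):
    """Build <base>/YYYY-MM-DD/<node>/events.jsonl for the given time (UTC assumed)."""
    now = now or datetime.now(timezone.utc)
    return Path(base) / now.strftime("%Y-%m-%d") / node / "events.jsonl"


def append_event(event_type, details=None, base="/opt/homelab/events",
                 node="chelsty", now=None):
    """Append one event as a single JSON line and return the file path."""
    now = now or datetime.now(timezone.utc)
    path = events_path(base, node, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Hypothetical record shape; the real schema is not specified here.
    record = {"ts": now.isoformat(), "node": node, "type": event_type}
    record.update(details or {})
    # Mode "a" never rewrites earlier lines, which keeps the log append-only.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
    return path


def read_events(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each event occupies exactly one line, consumers can tail the file or process it line by line without loading the whole day into memory.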

Deployment

The service is deployed via Docker Compose on CHELSTY.

    cd services/stability-agent
    docker compose up -d
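
After the service is up, one way to confirm the agent is actually cycling is to check the age of the heartbeat file. The sketch below assumes that each cycle refreshes the file's modification time, which this document does not state explicitly. The function names and the grace factor are hypothetical.

```python
import os
import time

HEARTBEAT = "/opt/homelab/state/stability-agent.heartbeat"


def heartbeat_age_seconds(path=HEARTBEAT):
    """Seconds since the heartbeat file was last modified, or None if it is missing."""
    try:
        return time.time() - os.path.getmtime(path)
    except FileNotFoundError:
        return None


def heartbeat_is_fresh(path=HEARTBEAT, interval=60, grace=2.0):
    """True if the heartbeat is younger than interval * grace seconds.

    interval mirrors STABILITY_CHECK_INTERVAL; grace is an assumed tolerance
    so that a single slow cycle does not count as a failure.
    """
    age = heartbeat_age_seconds(path)
    return age is not None and age <= interval * grace
```

With the default 60-second interval, a heartbeat older than two minutes would suggest the agent has stalled or the container has stopped.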

Configuration

Configuration is managed via environment variables in docker-compose.override.yml on the host.

| Variable | Description | Default |
| --- | --- | --- |
| STABILITY_CHECK_INTERVAL | Seconds between checks | 60 |
| DISK_THRESHOLD_PCT | Disk usage alert threshold (percent) | 90 |
| MQTT_HOST | MQTT broker hostname | mosquitto |
| MQTT_PORT | MQTT broker port | 1883 |
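
The variables above could be read inside the agent roughly as follows. This is a minimal sketch under assumptions: the AgentConfig class and load_config function are hypothetical names, and only the variable names and defaults are taken from this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical container for the documented settings and their defaults."""
    check_interval: int = 60          # STABILITY_CHECK_INTERVAL, seconds
    disk_threshold_pct: float = 90.0  # DISK_THRESHOLD_PCT, percent
    mqtt_host: str = "mosquitto"      # MQTT_HOST
    mqtt_port: int = 1883             # MQTT_PORT


def load_config(env=None):
    """Build an AgentConfig from a mapping of environment variables.

    Falls back to the documented default for any variable that is unset.
    """
    env = os.environ if env is None else env
    return AgentConfig(
        check_interval=int(env.get("STABILITY_CHECK_INTERVAL", 60)),
        disk_threshold_pct=float(env.get("DISK_THRESHOLD_PCT", 90)),
        mqtt_host=env.get("MQTT_HOST", "mosquitto"),
        mqtt_port=int(env.get("MQTT_PORT", 1883)),
    )
```

Passing the environment as a mapping keeps the function easy to exercise without touching the real process environment, for example load_config({"DISK_THRESHOLD_PCT": "85"}).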