homelab-codex-ws/docs/chelsty-stability-agent.md
oskar dc483ae31a docs(chelsty): update docs and topology for site/node split
- chelsty-runtime.md: references chelsty-infra and chelsty-ha nodes
- chelsty-stability-agent.md: scoped to chelsty-infra
- topology.yaml: chelsty monolith replaced with chelsty-infra + chelsty-ha
2026-05-20 14:23:57 +02:00

CHELSTY Stability Agent

The stability-agent on CHELSTY provides local observability and health monitoring for the services and infrastructure of the chelsty-infra node.

Purpose

The agent is a filesystem-first watchdog: it detects anomalies in the local runtime environment and records them on disk, but takes no autonomous destructive actions such as restarts. It serves as the primary data source for node-level stability metrics.

Monitoring Scope

  • Docker Containers: Monitors all local containers. If a container is not in the running state, a containers_not_running event is generated.
  • Disk Usage: Monitors the root filesystem. A disk_usage_high event is generated if usage exceeds the configured threshold (DISK_THRESHOLD_PCT, 90 percent by default).
  • Connectivity:
    • Checks if the Tailscale socket or interface is available.
    • Checks reachability of the local Mosquitto MQTT broker.
  • Zigbee2MQTT: Specifically tracks the presence and status of the Zigbee2MQTT service.
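
The checks above can be sketched in a few small functions. This is a minimal illustration, not the agent's actual code: the function names, the event fields other than the event type, and the idea of passing container states in as a dictionary are all assumptions. Only the event names containers_not_running and disk_usage_high come from this document.

```python
import shutil
import socket


def disk_usage_event(path="/", threshold_pct=90.0):
    """Return a disk_usage_high event dict, or None if usage is within bounds."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return None  # pseudo-filesystems can report zero size
    used_pct = usage.used / usage.total * 100
    if used_pct > threshold_pct:
        # Field names besides "type" are hypothetical.
        return {"type": "disk_usage_high", "path": path, "used_pct": round(used_pct, 1)}
    return None


def containers_event(states):
    """Given {container_name: state}, report containers that are not running."""
    not_running = sorted(name for name, state in states.items() if state != "running")
    if not_running:
        return {"type": "containers_not_running", "containers": not_running}
    return None


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the broker check, a call such as tcp_reachable("mosquitto", 1883) would only confirm that the port accepts connections; it does not perform an MQTT handshake. The percentage computed from shutil.disk_usage can also differ slightly from what df reports, because df accounts for reserved blocks.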

Storage and Integration

  • Heartbeat: Updated every check cycle (every STABILITY_CHECK_INTERVAL seconds, 60 by default) at /opt/homelab/state/stability-agent.heartbeat.
  • State Summary: A JSON summary of all latest checks at /opt/homelab/state/stability-agent.json.
  • Events: Append-only JSON lines at /opt/homelab/events/YYYY-MM-DD/chelsty-infra/events.jsonl.
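
As a rough sketch of how one check cycle might persist these three artifacts: the function name record_cycle, the JSON field names, the use of UTC for the date directory, and writing a timestamp into the heartbeat file are all assumptions made for illustration. Only the file layout itself is taken from the list above. The base directory is a parameter so the sketch can be tried outside /opt/homelab.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

NODE = "chelsty-infra"  # node directory name used in the events path


def record_cycle(base, checks, events, now=None):
    """Write the heartbeat, the state summary, and any events for one cycle."""
    base = Path(base)
    now = now or datetime.now(timezone.utc)
    stamp = now.isoformat()

    state_dir = base / "state"
    state_dir.mkdir(parents=True, exist_ok=True)

    # Heartbeat: rewritten every cycle (content format is an assumption).
    (state_dir / "stability-agent.heartbeat").write_text(stamp + "\n")

    # State summary: write to a temp file, then rename, so readers
    # never observe a half-written JSON document.
    summary = {"node": NODE, "updated_at": stamp, "checks": checks}
    tmp = state_dir / "stability-agent.json.tmp"
    tmp.write_text(json.dumps(summary, indent=2))
    tmp.replace(state_dir / "stability-agent.json")

    if not events:
        return None

    # Events: one JSON object per line, appended to the per-day file.
    events_file = base / "events" / now.strftime("%Y-%m-%d") / NODE / "events.jsonl"
    events_file.parent.mkdir(parents=True, exist_ok=True)
    with events_file.open("a", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps({"ts": stamp, "node": NODE, **event}) + "\n")
    return events_file
```

Calling record_cycle("/opt/homelab", checks, events) would target the paths listed above. Because events are only ever appended, a consumer can tail the file or re-read it from a remembered offset.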

Deployment

The service is deployed via Docker Compose on CHELSTY.

    cd services/stability-agent
    docker compose up -d

Configuration

Configuration is managed via environment variables in docker-compose.override.yml on the host.

| Variable | Description | Default |
| --- | --- | --- |
| STABILITY_CHECK_INTERVAL | Seconds between checks | 60 |
| DISK_THRESHOLD_PCT | Disk usage alert threshold (percent) | 90 |
| MQTT_HOST | MQTT broker hostname | mosquitto |
| MQTT_PORT | MQTT broker port | 1883 |
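
One way the agent could read these variables with their documented defaults is sketched below. The load_config function and the keys of the returned dictionary are hypothetical; only the variable names and default values come from the table above.

```python
import os

# Documented defaults, kept as strings because environment values are strings.
DEFAULTS = {
    "STABILITY_CHECK_INTERVAL": "60",
    "DISK_THRESHOLD_PCT": "90",
    "MQTT_HOST": "mosquitto",
    "MQTT_PORT": "1883",
}


def load_config(env=None):
    """Parse settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "check_interval_s": int(raw["STABILITY_CHECK_INTERVAL"]),
        "disk_threshold_pct": float(raw["DISK_THRESHOLD_PCT"]),
        "mqtt_host": raw["MQTT_HOST"],
        "mqtt_port": int(raw["MQTT_PORT"]),
    }
```

Accepting the environment as an argument keeps the parsing testable: load_config({"DISK_THRESHOLD_PCT": "85"}) overrides one value and leaves the other three at their defaults. A non-numeric value raises ValueError at startup instead of failing later inside a check.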