Stability Agent
A lightweight filesystem-first watchdog and observer agent for CHELSTY.
Features
- Continuous Monitoring: Runs as a background service.
- Docker Inspection: Checks container status via read-only Docker socket.
- Disk Usage: Monitors local disk utilization.
- Tailscale Check: Verifies Tailscale availability.
- MQTT Reachability: Checks connectivity to the local MQTT broker.
- Zigbee2MQTT Monitoring: Specifically monitors the Zigbee2MQTT container.
- Redis Publishing: (Optional) Publishes runtime state and events to a central Redis server.
- Event Logging: Writes append-only JSON events to /opt/homelab/events/YYYY-MM-DD/chelsty/.
- State Reporting: Writes heartbeat and status summary to /opt/homelab/state/.
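As an illustration of what the disk and MQTT reachability checks might look like, here is a minimal Python sketch. The agent's actual implementation is not shown in this README; the function names `disk_usage_pct` and `tcp_reachable` are hypothetical, and the sketch assumes that "reachability" means a plain TCP connection succeeds.

```python
import shutil
import socket


def disk_usage_pct(path: str = "/") -> float:
    """Return used space as a percentage of total for the filesystem at path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: a successful TCP handshake is treated as "reachable";
    no MQTT protocol exchange is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check loop could compare `disk_usage_pct("/")` against DISK_THRESHOLD_PCT and call `tcp_reachable(MQTT_HOST, MQTT_PORT)` once per interval, emitting an event when either check fails.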
Configuration
Environment variables:
- STABILITY_CHECK_INTERVAL: Interval between checks in seconds (default: 60).
- DISK_THRESHOLD_PCT: Disk usage percentage to trigger warning (default: 90).
- MQTT_HOST: Hostname or IP of the MQTT broker to check.
- MQTT_PORT: Port of the MQTT broker (default: 1883).
- REDIS_HOST: Hostname or IP of the Redis server (e.g., PIHA at 100.108.208.3).
- REDIS_PORT: Port of the Redis server (default: 6379).
- REDIS_ENABLED: Whether to enable Redis publishing (default: true if REDIS_HOST is set).
- NODE_NAME: Name of the current node (default: chelsty).
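The defaults above can be summarized in code. The following is a sketch of how these variables might be read, not the agent's actual loader: the `Config` class and `load_config` function are hypothetical, and the set of strings accepted as "true" for REDIS_ENABLED is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    check_interval: int
    disk_threshold_pct: int
    mqtt_host: Optional[str]  # no default is documented, so None when unset
    mqtt_port: int
    redis_host: Optional[str]
    redis_port: int
    redis_enabled: bool
    node_name: str


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from environment variables using the documented defaults."""
    redis_host = env.get("REDIS_HOST") or None
    raw_enabled = env.get("REDIS_ENABLED")
    if raw_enabled is None:
        # Documented behaviour: enabled by default only if REDIS_HOST is set.
        redis_enabled = redis_host is not None
    else:
        # Assumption: these spellings count as true; anything else is false.
        redis_enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    return Config(
        check_interval=int(env.get("STABILITY_CHECK_INTERVAL", "60")),
        disk_threshold_pct=int(env.get("DISK_THRESHOLD_PCT", "90")),
        mqtt_host=env.get("MQTT_HOST") or None,
        mqtt_port=int(env.get("MQTT_PORT", "1883")),
        redis_host=redis_host,
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_enabled=redis_enabled,
        node_name=env.get("NODE_NAME", "chelsty"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the defaults easy to check in isolation, for example `load_config({"REDIS_HOST": "100.108.208.3"})`.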
Verification
To verify Redis publishing, query the Redis server with redis-cli:
# Check node state
redis-cli -h 100.108.208.3 HGETALL homelab:nodes:chelsty
# Check service discovery
redis-cli -h 100.108.208.3 HGETALL homelab:services:chelsty:stability-agent
# Check event stream
redis-cli -h 100.108.208.3 XRANGE homelab:events - +
Safety
- No automatic restarts are performed.
- Read-only access to Docker socket.
- No configuration mutation.
- No secrets stored in the repository.
Event Schema
Events are written as JSON lines with the following fields:
- id: Unique event UUID.
- timestamp: ISO 8601 timestamp (UTC).
- node: chelsty.
- source: stability-agent.
- type: Type of event (e.g., disk_usage_high, containers_not_running).
- severity: info, warning, or error.
- message: Human-readable description.
- details: Object containing specific check results.
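To make the schema concrete, the sketch below builds one event and appends it as a JSON line under the dated event directory. It is an illustration only: `build_event` and `append_event` are hypothetical names, and the file name `events.jsonl` is an assumption, since this README specifies the directory layout but not the file name.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

SEVERITIES = {"info", "warning", "error"}


def build_event(event_type: str, severity: str, message: str, details: dict,
                node: str = "chelsty", source: str = "stability-agent") -> dict:
    """Return an event dict containing the fields listed in the schema."""
    if severity not in SEVERITIES:
        raise ValueError(f"unknown severity: {severity!r}")
    return {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "node": node,
        "source": source,
        "type": event_type,
        "severity": severity,
        "message": message,
        "details": details,
    }


def append_event(event: dict, root: Path = Path("/opt/homelab/events")) -> Path:
    """Append the event as one JSON line under root/YYYY-MM-DD/<node>/."""
    day = event["timestamp"][:10]  # date part of the ISO 8601 timestamp
    directory = root / day / event["node"]
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "events.jsonl"  # hypothetical file name
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path
```

Opening the file in append mode keeps the log append-only from the writer's side; each call adds exactly one line, so the file can be read back with one `json.loads` per line.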