Stability Agent
A lightweight filesystem-first watchdog and observer agent for homelab nodes.
Features
- Continuous Monitoring: Runs as a background service.
- Docker Inspection: Checks container status via read-only Docker socket (optional).
- Disk Usage: Monitors local disk utilization.
- Tailscale Check: Verifies Tailscale availability (optional).
- MQTT Reachability: Checks connectivity to a configured MQTT broker (optional).
- Redis Publishing: Publishes runtime state and events to a central Redis server (PIHA).
- Event Logging: Writes append-only JSON events to /opt/homelab/events/YYYY-MM-DD/<NODE_NAME>/.
- State Reporting: Writes heartbeat and status summary to /opt/homelab/state/.
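As an illustration of the disk usage feature, the following is a minimal Python sketch, not the agent's actual implementation. The function names (disk_usage_pct, evaluate_disk), the "at or above" comparison, and the keys inside details are assumptions; only the 90% default, the disk_usage_high event type, and the warning severity come from this README.

```python
import shutil
from typing import Optional


def disk_usage_pct(path: str = "/") -> float:
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report a total of zero; treat them as empty.
        return 0.0
    return usage.used / usage.total * 100.0


def evaluate_disk(used_pct: float, threshold_pct: float = 90.0) -> Optional[dict]:
    """Return a warning finding when usage reaches the threshold, else None.

    Assumption: usage equal to the threshold already counts as high.
    """
    if used_pct < threshold_pct:
        return None
    return {
        "type": "disk_usage_high",
        "severity": "warning",
        "message": f"Disk usage {used_pct:.1f}% is at or above {threshold_pct:.0f}%",
        # The keys below are hypothetical; the README does not define them.
        "details": {"used_pct": round(used_pct, 1), "threshold_pct": threshold_pct},
    }


if __name__ == "__main__":
    print(evaluate_disk(disk_usage_pct("/")))
```

Keeping the threshold decision separate from the measurement makes the check easy to test without touching a real filesystem. Note that the percentage computed here can differ slightly from what df reports, because df excludes blocks reserved for root.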
Deployment
Use the deployment helper script:
./scripts/deploy/deploy-stability-agent.sh <NODE_NAME>
Configuration
Environment variables:
- STABILITY_CHECK_INTERVAL: Interval between checks in seconds (default: 60).
- DISK_THRESHOLD_PCT: Disk usage percentage to trigger warning (default: 90).
- MQTT_HOST: Hostname or IP of the MQTT broker to check.
- MQTT_PORT: Port of the MQTT broker (default: 1883).
- REDIS_HOST: Hostname or IP of the Redis server (e.g., PIHA at 100.108.208.3).
- REDIS_PORT: Port of the Redis server (default: 6379).
- REDIS_ENABLED: Whether to enable Redis publishing (default: true if REDIS_HOST is set).
- NODE_NAME: Name of the current node (default: chelsty).
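The defaults listed above could be resolved roughly as in the following Python sketch. It is an illustration under stated assumptions rather than the agent's real code: the AgentConfig class, the load_config function, and the set of strings accepted as "true" for REDIS_ENABLED are all hypothetical. The variable names and default values are taken from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: which spellings count as "enabled" is not specified in the README.
_TRUTHY = ("1", "true", "yes", "on")


@dataclass(frozen=True)
class AgentConfig:
    check_interval: int
    disk_threshold_pct: int
    mqtt_host: Optional[str]
    mqtt_port: int
    redis_host: Optional[str]
    redis_port: int
    redis_enabled: bool
    node_name: str


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Build the configuration from environment variables, applying defaults."""
    redis_host = env.get("REDIS_HOST") or None
    raw_enabled = env.get("REDIS_ENABLED")
    if raw_enabled is None:
        # Documented default: enabled only when REDIS_HOST is set.
        redis_enabled = redis_host is not None
    else:
        redis_enabled = raw_enabled.strip().lower() in _TRUTHY
    return AgentConfig(
        check_interval=int(env.get("STABILITY_CHECK_INTERVAL", "60")),
        disk_threshold_pct=int(env.get("DISK_THRESHOLD_PCT", "90")),
        mqtt_host=env.get("MQTT_HOST") or None,
        mqtt_port=int(env.get("MQTT_PORT", "1883")),
        redis_host=redis_host,
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_enabled=redis_enabled,
        node_name=env.get("NODE_NAME", "chelsty"),
    )


if __name__ == "__main__":
    print(load_config())
```

Passing the environment as a mapping, instead of reading os.environ directly inside the function, lets the defaults be checked with a plain dictionary.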
Verification
You can verify the Redis publishing using redis-cli:
# Check node state
redis-cli -h 100.108.208.3 HGETALL homelab:nodes:<NODE_NAME>
# Check service discovery
redis-cli -h 100.108.208.3 HGETALL homelab:services:<NODE_NAME>:stability-agent
# Check event stream
redis-cli -h 100.108.208.3 XRANGE homelab:events - +
Safety
- No automatic restarts are performed.
- Read-only access to Docker socket.
- No configuration mutation.
- No secrets stored in the repository.
Event Schema
Events are written as JSON lines with the following fields:
- id: Unique event UUID.
- timestamp: ISO 8601 timestamp (UTC).
- node: <NODE_NAME>.
- source: stability-agent.
- type: Type of event (e.g., disk_usage_high, containers_not_running).
- severity: info, warning, or error.
- message: Human-readable description.
- details: Object containing specific check results.
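To make the schema concrete, here is a hedged Python sketch that builds one event with these fields and appends it as a JSON line under the documented events directory layout. The function names (build_event, append_event), the events.jsonl file name, and the contents of details in the usage example are assumptions; the field names, severity values, and directory structure follow this README.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

ALLOWED_SEVERITIES = ("info", "warning", "error")


def build_event(node: str, event_type: str, severity: str,
                message: str, details: dict) -> dict:
    """Return an event dictionary carrying the documented fields."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    return {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "node": node,
        "source": "stability-agent",
        "type": event_type,
        "severity": severity,
        "message": message,
        "details": details,
    }


def append_event(event: dict, root: Path = Path("/opt/homelab/events")) -> Path:
    """Append the event as one JSON line under <root>/YYYY-MM-DD/<node>/."""
    day = event["timestamp"][:10]  # date part of the ISO 8601 timestamp
    directory = Path(root) / day / event["node"]
    directory.mkdir(parents=True, exist_ok=True)
    # Assumption: the README does not name the file inside the directory.
    path = directory / "events.jsonl"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event) + "\n")
    return path


if __name__ == "__main__":
    import tempfile

    event = build_event("chelsty", "disk_usage_high", "warning",
                        "Disk usage is above the threshold", {"used_pct": 93.2})
    with tempfile.TemporaryDirectory() as tmp:
        print(append_event(event, Path(tmp)).read_text(encoding="utf-8"))
```

Opening the file in append mode and writing one complete line per event keeps the log append-only, and each line can be parsed independently of the others.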