homelab-codex-ws/services/stability-agent/README.md

### Stability Agent
A lightweight filesystem-first watchdog and observer agent for CHELSTY.
#### Features
* **Continuous Monitoring**: Runs as a background service.
* **Docker Inspection**: Checks container status via a read-only Docker socket.
* **Disk Usage**: Monitors local disk utilization.
* **Tailscale Check**: Verifies Tailscale availability.
* **MQTT Reachability**: Checks connectivity to the local MQTT broker.
* **Zigbee2MQTT Monitoring**: Specifically monitors the Zigbee2MQTT container.
* **Redis Publishing**: Optionally publishes runtime state and events to a central Redis server.
* **Event Logging**: Writes append-only JSON events to `/opt/homelab/events/YYYY-MM-DD/chelsty/`.
* **State Reporting**: Writes a heartbeat and a status summary to `/opt/homelab/state/`.
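
The disk and MQTT checks listed above can be approximated with the Python standard library. This is a minimal sketch, not the agent's actual code: the function names `disk_usage_pct` and `tcp_reachable` are hypothetical, and it assumes that "reachability" means a plain TCP connect to the broker port rather than a full MQTT handshake.

```python
import shutil
import socket


def disk_usage_pct(path: str = "/") -> float:
    """Return used disk space for `path` as a percentage of total capacity."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    Assumption: a successful connect is treated as "reachable". This does not
    prove that the peer actually speaks MQTT.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False
```

A caller would compare `disk_usage_pct("/")` against `DISK_THRESHOLD_PCT` and pass `MQTT_HOST` and `MQTT_PORT` to `tcp_reachable`.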
#### Configuration
Environment variables:
* `STABILITY_CHECK_INTERVAL`: Interval between checks in seconds (default: 60).
* `DISK_THRESHOLD_PCT`: Disk usage percentage to trigger warning (default: 90).
* `MQTT_HOST`: Hostname or IP of the MQTT broker to check.
* `MQTT_PORT`: Port of the MQTT broker (default: 1883).
* `REDIS_HOST`: Hostname or IP of the Redis server (e.g., PIHA at 100.108.208.3).
* `REDIS_PORT`: Port of the Redis server (default: 6379).
* `REDIS_ENABLED`: Whether to enable Redis publishing (default: true if REDIS_HOST is set).
* `NODE_NAME`: Name of the current node (default: chelsty).
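
As an illustration of how these variables and their defaults fit together, the following sketch reads them into a single object. The `AgentConfig` class and `load_config` function are hypothetical names, and the set of strings accepted as "true" for `REDIS_ENABLED` is an assumption; the README only states the defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AgentConfig:
    check_interval: int
    disk_threshold_pct: int
    mqtt_host: Optional[str]
    mqtt_port: int
    redis_host: Optional[str]
    redis_port: int
    redis_enabled: bool
    node_name: str


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Build the configuration from environment variables, applying defaults."""
    redis_host = env.get("REDIS_HOST") or None
    raw_enabled = env.get("REDIS_ENABLED")
    if raw_enabled is None:
        # Documented default: enabled only when REDIS_HOST is set.
        redis_enabled = redis_host is not None
    else:
        # Assumption: these spellings count as "true"; anything else is false.
        redis_enabled = raw_enabled.strip().lower() in {"1", "true", "yes", "on"}
    return AgentConfig(
        check_interval=int(env.get("STABILITY_CHECK_INTERVAL", "60")),
        disk_threshold_pct=int(env.get("DISK_THRESHOLD_PCT", "90")),
        mqtt_host=env.get("MQTT_HOST") or None,
        mqtt_port=int(env.get("MQTT_PORT", "1883")),
        redis_host=redis_host,
        redis_port=int(env.get("REDIS_PORT", "6379")),
        redis_enabled=redis_enabled,
        node_name=env.get("NODE_NAME", "chelsty"),
    )
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.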
#### Verification
To verify Redis publishing, query the server with `redis-cli`:
```bash
# Check node state
redis-cli -h 100.108.208.3 HGETALL homelab:nodes:chelsty
# Check service discovery
redis-cli -h 100.108.208.3 HGETALL homelab:services:chelsty:stability-agent
# Check event stream
redis-cli -h 100.108.208.3 XRANGE homelab:events - +
```
#### Safety
* No automatic restarts are performed.
* Read-only access to the Docker socket.
* No configuration mutation.
* No secrets stored in the repository.
#### Event Schema
Events are written as JSON lines with the following fields:
* `id`: Unique event UUID.
* `timestamp`: ISO 8601 timestamp (UTC).
* `node`: Name of the emitting node (`chelsty` by default).
* `source`: `stability-agent`.
* `type`: Type of event (e.g., `disk_usage_high`, `containers_not_running`).
* `severity`: `info`, `warning`, or `error`.
* `message`: Human-readable description.
* `details`: Object containing specific check results.
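
The schema above can be sketched as a small builder plus an append-only writer. This is an illustrative sketch under stated assumptions, not the agent's implementation: `build_event` and `append_event` are hypothetical names, the file name `events.jsonl` is invented because the README only documents the directory, and using the event's `node` value as the last path component is an assumption.

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

ALLOWED_SEVERITIES = {"info", "warning", "error"}


def build_event(event_type: str, severity: str, message: str,
                details: dict, node: str = "chelsty") -> dict:
    """Build one event dict with the fields listed in the schema."""
    if severity not in ALLOWED_SEVERITIES:
        raise ValueError(f"unsupported severity: {severity!r}")
    return {
        "id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),  # ISO 8601, UTC
        "node": node,
        "source": "stability-agent",
        "type": event_type,
        "severity": severity,
        "message": message,
        "details": details,
    }


def append_event(event: dict, root: Path = Path("/opt/homelab/events")) -> Path:
    """Append the event as one JSON line under <root>/YYYY-MM-DD/<node>/."""
    day = event["timestamp"][:10]  # date part of the ISO 8601 timestamp
    directory = root / day / event["node"]
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / "events.jsonl"  # hypothetical file name
    # Mode "a" only ever appends, so existing lines are never rewritten.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")
    return path
```

For example, `append_event(build_event("disk_usage_high", "warning", "Disk usage above threshold", {"used_pct": 93.5}))` would add one line to the current day's file; the `used_pct` key is made up for illustration.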