### CHELSTY Stability Agent

The stability-agent on CHELSTY provides local observability and health monitoring for the node's services and infrastructure.
#### Purpose

The agent is a filesystem-first watchdog: it detects anomalies in the local runtime environment and records them on disk, but it never takes destructive action on its own (such as restarting a container). It serves as the primary data source for node-level stability metrics.
#### Monitoring Scope

* **Docker Containers**: Monitors all local containers. If any container is not in the `running` state, a `containers_not_running` event is generated.
* **Disk Usage**: Monitors the root filesystem. Generates a `disk_usage_high` event when usage exceeds the configured threshold (`DISK_THRESHOLD_PCT`).
* **Connectivity**:
  * Checks whether the Tailscale socket or interface is available.
  * Checks whether the local Mosquitto MQTT broker is reachable.
* **Zigbee2MQTT**: Tracks whether the Zigbee2MQTT service is present and what its status is.
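The checks above could be sketched roughly as follows. This is a minimal illustration in Python, not the agent's actual code: the function names, the event field names (`ts`, `node`, `event`, `detail`), and the shape of the container-state input are all assumptions. Only the event names `containers_not_running` and `disk_usage_high` come from this document.

```python
import shutil
import socket
from datetime import datetime, timezone


def make_event(kind: str, detail: dict) -> dict:
    """Build an event record. The field names are illustrative assumptions."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "node": "chelsty",
        "event": kind,
        "detail": detail,
    }


def check_containers(states: dict) -> list:
    """`states` maps container name -> state string (hypothetical input shape)."""
    stopped = sorted(name for name, state in states.items() if state != "running")
    if not stopped:
        return []
    return [make_event("containers_not_running", {"containers": stopped})]


def check_disk(path: str = "/", threshold_pct: float = 90.0) -> list:
    """Emit disk_usage_high when used space exceeds the threshold."""
    usage = shutil.disk_usage(path)
    # used/total; this may differ slightly from the percentage that `df` prints.
    used_pct = 100.0 * usage.used / usage.total
    if used_pct <= threshold_pct:
        return []
    return [make_event("disk_usage_high", {"path": path, "used_pct": round(used_pct, 1)})]


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds (e.g. the MQTT broker)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Each check only returns data; none of them restarts or modifies anything, which matches the non-destructive design described under Purpose.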
#### Storage and Integration

* **Heartbeat**: Updated every check cycle at `/opt/homelab/state/stability-agent.heartbeat`.
* **State Summary**: A JSON summary of the latest result of every check, at `/opt/homelab/state/stability-agent.json`.
* **Events**: Append-only JSON Lines, one file per day, at `/opt/homelab/events/YYYY-MM-DD/chelsty/events.jsonl`.
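To make the on-disk layout concrete, here is a hedged sketch of what one cycle's writes might look like. The function `write_cycle`, the heartbeat file's contents, the use of UTC for the date directory, and the write-then-rename step for the summary are assumptions made for illustration. Only the file names and directory layout come from the list above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_cycle(base: Path, node: str, summary: dict, events: list) -> Path:
    """Write heartbeat, state summary, and events under `base`; return the events file."""
    now = datetime.now(timezone.utc)
    state_dir = base / "state"
    state_dir.mkdir(parents=True, exist_ok=True)

    # Heartbeat: rewritten every cycle, so its modification time tracks liveness.
    (state_dir / "stability-agent.heartbeat").write_text(now.isoformat() + "\n")

    # State summary: write a temp file, then rename it over the target,
    # so a reader never sees a half-written JSON document.
    tmp = state_dir / "stability-agent.json.tmp"
    tmp.write_text(json.dumps(summary, indent=2) + "\n")
    tmp.replace(state_dir / "stability-agent.json")

    # Events: one JSON object per line, appended to the current day's file.
    events_file = base / "events" / now.strftime("%Y-%m-%d") / node / "events.jsonl"
    events_file.parent.mkdir(parents=True, exist_ok=True)
    with events_file.open("a", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")
    return events_file
```

Calling `write_cycle(Path("/opt/homelab"), "chelsty", summary, events)` would produce the three paths listed above. Because events are only ever appended, a consumer can tail the file or re-read it from a saved offset.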
#### Deployment

The service is deployed via Docker Compose on CHELSTY.

```bash
cd services/stability-agent
# Start the agent in the background (detached mode)
docker compose up -d
```
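After bringing the service up, one way to confirm that it is cycling is to check that the heartbeat file at `/opt/homelab/state/stability-agent.heartbeat` is fresh. The sketch below assumes that the agent changes the file's modification time on every cycle and that "fresh" means younger than twice the check interval. The helper names and the grace factor are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

HEARTBEAT = Path("/opt/homelab/state/stability-agent.heartbeat")


def heartbeat_age_seconds(path: Path = HEARTBEAT) -> Optional[float]:
    """Seconds since the heartbeat was last modified, or None if the file is missing."""
    try:
        return max(0.0, time.time() - path.stat().st_mtime)
    except FileNotFoundError:
        return None


def is_healthy(path: Path = HEARTBEAT, interval_s: int = 60, grace: float = 2.0) -> bool:
    """Healthy if the heartbeat exists and is younger than interval_s * grace."""
    age = heartbeat_age_seconds(path)
    return age is not None and age <= interval_s * grace


if __name__ == "__main__":
    print("healthy" if is_healthy() else "stale or missing heartbeat")
```

If `STABILITY_CHECK_INTERVAL` is changed from its default, `interval_s` should be set to the same value.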
#### Configuration

Configuration is supplied through environment variables set in `docker-compose.override.yml` on the host.

| Variable | Description | Default |
|----------|-------------|---------|
| `STABILITY_CHECK_INTERVAL` | Seconds between checks | `60` |
| `DISK_THRESHOLD_PCT` | Disk usage alert threshold (percent) | `90` |
| `MQTT_HOST` | MQTT broker hostname | `mosquitto` |
| `MQTT_PORT` | MQTT broker port | `1883` |
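As an illustration of how these variables and their defaults fit together, a process could read them as in the sketch below. The `AgentConfig` class and `load_config` function are hypothetical names, and the agent's real parsing logic may differ. Only the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AgentConfig:
    check_interval_s: int = 60
    disk_threshold_pct: int = 90
    mqtt_host: str = "mosquitto"
    mqtt_port: int = 1883


def load_config(env: Mapping[str, str] = os.environ) -> AgentConfig:
    """Read settings from the environment, falling back to the documented defaults."""
    return AgentConfig(
        check_interval_s=int(env.get("STABILITY_CHECK_INTERVAL", "60")),
        disk_threshold_pct=int(env.get("DISK_THRESHOLD_PCT", "90")),
        mqtt_host=env.get("MQTT_HOST", "mosquitto"),
        mqtt_port=int(env.get("MQTT_PORT", "1883")),
    )
```

Passing the environment in as a mapping keeps the loader easy to test: `load_config({"DISK_THRESHOLD_PCT": "80"})` overrides only the disk threshold and leaves the other defaults in place.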