# Stability Agent Multi-Node Rollout

## Architecture Summary

The `stability-agent` is a lightweight Python service that monitors node health (disk, Docker containers, Tailscale, MQTT) and publishes state to a central Redis instance running on **PIHA**.

- **Source**: `services/stability-agent`
- **State Path**: `/opt/homelab/state`
- **Events Path**: `/opt/homelab/events`
- **Redis Target**: `100.108.208.3:6379` (PIHA)

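The publish step can be sketched roughly as follows. This is a minimal illustration, not the agent's actual code: it assumes one Redis hash per node under `homelab:nodes:<NODE_NAME>`, and the helper names (`node_key`, `build_state`, `publish_state`) and the field names (`node`, `updated_at`) are hypothetical. The client only needs an `hset(name, mapping=...)` method, which matches the shape of redis-py's `Redis.hset`.

```python
import time
from typing import Dict, Mapping, Protocol


class HashClient(Protocol):
    """Anything exposing a redis-py style hset(name, mapping=...) method."""

    def hset(self, name: str, mapping: Mapping[str, str]) -> int: ...


def node_key(node_name: str) -> str:
    # Key pattern taken from this document: homelab:nodes:<NODE_NAME>
    return f"homelab:nodes:{node_name}"


def build_state(node_name: str, checks: Mapping[str, str]) -> Dict[str, str]:
    # Hypothetical schema: the real agent may store different fields.
    state = {"node": node_name, "updated_at": str(int(time.time()))}
    state.update(checks)
    return state


def publish_state(client: HashClient, node_name: str,
                  checks: Mapping[str, str]) -> str:
    # Write the whole state as one hash and return the key that was used.
    key = node_key(node_name)
    client.hset(key, mapping=build_state(node_name, checks))
    return key
```

With redis-py installed, a client created as `redis.Redis(host="100.108.208.3", port=6379, decode_responses=True)` could be passed as `client`; any object with a compatible `hset` works for local testing.
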
## Why the UI only showed CHELSTY

Previously, the `stability-agent` defaulted `NODE_NAME` to `chelsty` and was deployed only on that node. The Agent System UI materializer on PIHA filters nodes based on the Redis keys `homelab:nodes:<NODE_NAME>`. Because no other agent was publishing under its own `NODE_NAME`, the UI stayed limited to the single active node.

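The effect of the old default can be illustrated with a small sketch. It assumes the agent reads its name from the `NODE_NAME` environment variable and falls back to `chelsty`; `resolve_node_name` and `visible_nodes` are hypothetical helpers written for this example only.

```python
from typing import Iterable, List, Mapping

DEFAULT_NODE_NAME = "chelsty"  # the previous default described above


def resolve_node_name(env: Mapping[str, str]) -> str:
    # An unset or blank NODE_NAME falls back to the default.
    name = env.get("NODE_NAME", "").strip()
    return name or DEFAULT_NODE_NAME


def visible_nodes(agent_envs: Iterable[Mapping[str, str]]) -> List[str]:
    # Distinct Redis keys that a set of agents would publish to.
    return sorted({f"homelab:nodes:{resolve_node_name(env)}" for env in agent_envs})
```

Under these assumptions, any agent started without `NODE_NAME` writes to `homelab:nodes:chelsty`, so several such agents would still appear as one node. Setting a distinct `NODE_NAME` per node yields one key per node.
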
## Deployment

Use the helper script to print the deployment commands or to run them over SSH. The script uses explicit Tailscale IPs for the remote targets (piha, chelsty, vps) and runs locally for solaria.

```bash
# Print the deployment commands
./scripts/deploy/deploy-stability-agent.sh <node-name>

# Deploy via SSH (executes ssh oskar@<ip>)
./scripts/deploy/deploy-stability-agent.sh <node-name> --ssh
```

### Manual Steps per Node

The manual steps are encapsulated in `services/stability-agent/deploy-local.sh`. On the target node:

```bash
cd /home/oskar/homelab-codex-ws
git fetch origin
git checkout master
git pull origin master
cd services/stability-agent
./deploy-local.sh <node-name>
```

## Verification

### Fleet Overview

Run the verification script from any node with `redis-cli` access:

```bash
./scripts/deploy/verify-agent-fleet.sh
```

### Redis Inspection (on PIHA)

```bash
docker exec agent-system-redis redis-cli KEYS 'homelab:nodes:*'
docker exec agent-system-redis redis-cli HGETALL homelab:nodes:<node-name>
```

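The key listing can be compared against the expected fleet. The sketch below assumes the fleet is the four nodes named in the Deployment section (piha, chelsty, vps, solaria) and that the `KEYS` output is supplied as plain text with one key per line, which is how `redis-cli` prints when it is not attached to a terminal. `published_nodes` and `missing_nodes` are hypothetical helpers, not part of the repo's verification script.

```python
from typing import Iterable, List, Set

EXPECTED_NODES = ("piha", "chelsty", "vps", "solaria")  # assumed fleet
KEY_PREFIX = "homelab:nodes:"


def published_nodes(keys_output: str) -> Set[str]:
    # Extract node names from lines such as "homelab:nodes:piha".
    nodes = set()
    for line in keys_output.splitlines():
        key = line.strip()
        if key.startswith(KEY_PREFIX) and len(key) > len(KEY_PREFIX):
            nodes.add(key[len(KEY_PREFIX):])
    return nodes


def missing_nodes(keys_output: str,
                  expected: Iterable[str] = EXPECTED_NODES) -> List[str]:
    # Expected nodes that have no key in Redis, in the order given.
    found = published_nodes(keys_output)
    return [name for name in expected if name not in found]
```

An empty result means every expected node has published at least once; a non-empty list names the nodes where the agent still needs to be deployed or checked.
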
Verify the Web UI backend:

```bash
curl -s http://127.0.0.1:18180/nodes
curl -k https://agents.okit.pl/nodes
```

## Troubleshooting

- **Redis empty after compose down**: The `agent-system-redis` container on PIHA uses transient storage unless it is configured with a volume. After a restart, agents must republish their state (they do this automatically every `CHECK_INTERVAL`).
- **Secrets**: `.env` files and local secrets are not committed to the repo. Ensure `MQTT_HOST` and other node-specific secrets are set via overrides if needed.
- **Telegram**: Telegram bot notifications are optional and can stay disabled when `TELEGRAM_BOT_TOKEN` is absent.
- **Docker Socket**: If the agent reports `unavailable` for Docker, ensure `/var/run/docker.sock` is mounted and the user has permission to access it.

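Two of the checks above can be reproduced by hand with a short preflight sketch. It is only an illustration: the status strings `missing`, `no-permission`, and `ok` and both helper names are hypothetical and do not reflect how the agent itself reports Docker or Telegram state.

```python
import os
from typing import Mapping

DOCKER_SOCKET = "/var/run/docker.sock"


def docker_socket_status(path: str = DOCKER_SOCKET) -> str:
    # "missing": the socket is not present (for example, not mounted).
    if not os.path.exists(path):
        return "missing"
    # "no-permission": present, but the current user cannot read and write it.
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission"
    return "ok"


def telegram_enabled(env: Mapping[str, str] = os.environ) -> bool:
    # Notifications are treated as disabled when the token is unset or blank.
    return bool(env.get("TELEGRAM_BOT_TOKEN", "").strip())
```

Running `docker_socket_status()` as the same user (or inside the same container) as the agent helps distinguish a missing mount from a permissions problem.
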