homelab-codex-ws/docs/stability-agent-rollout.md

76 lines
2.8 KiB
Markdown

# Stability Agent Multi-Node Rollout
## Architecture Summary
The `stability-agent` is a lightweight Python service that monitors node health (disk, Docker containers, Tailscale, MQTT) and publishes state to a central Redis instance running on **PIHA**.
- **Source**: `services/stability-agent`
- **State Path**: `/opt/homelab/state`
- **Events Path**: `/opt/homelab/events`
- **Redis Target**: `100.108.208.3:6379` (PIHA)
## Why UI only showed CHELSTY
Previously, the `stability-agent` had `NODE_NAME` defaulted to `chelsty` and was only deployed there. The Agent System UI materializer on PIHA filters nodes based on the Redis keys `homelab:nodes:<NODE_NAME>`. Without other agents publishing their specific `NODE_NAME`, the UI remained limited to the single active node.
## Deployment Commands
Use the helper script to generate commands:
```bash
./scripts/deploy/deploy-stability-agent.sh <node-name>
```
### PIHA
```bash
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=piha REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
```
### CHELSTY
```bash
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=chelsty REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
```
### SOLARIA
```bash
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=solaria REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
```
### VPS
```bash
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=vps REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
```
### SATURN (Optional)
Saturn is the orchestrator and can optionally run the stability-agent. If deployed, follow the same pattern with `NODE_NAME=saturn`.
## Verification (on PIHA)
Verify Redis keys:
```bash
docker exec agent-system-redis redis-cli KEYS 'homelab:nodes:*'
docker exec agent-system-redis redis-cli HGETALL homelab:nodes:<node-name>
```
Verify Web UI backend:
```bash
curl -s http://127.0.0.1:18180/nodes
curl -k https://agents.okit.pl/nodes
```
## Troubleshooting
- **Redis empty after compose down**: The `agent-system-redis` on PIHA uses transient storage if not configured with a volume. If it restarts, agents must republish their state (they do this automatically every `CHECK_INTERVAL`).
- **Secrets**: `.env` files and local secrets are not committed to the repo. Ensure `MQTT_HOST` and other specific secrets are set via overrides if needed.
- **Telegram**: Telegram bot notifications can remain disabled if `TELEGRAM_BOT_TOKEN` is absent.
- **Docker Socket**: If the agent reports `unavailable` for Docker, ensure `/var/run/docker.sock` is mounted and the user has permissions.