Stability Agent Multi-Node Rollout
Architecture Summary
The stability-agent is a lightweight Python service that monitors node health (disk, Docker containers, Tailscale, MQTT) and publishes state to a central Redis instance running on PIHA.
- Source: services/stability-agent
- State Path: /opt/homelab/state
- Events Path: /opt/homelab/events
- Redis Target: 100.108.208.3:6379 (PIHA)
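Here is a minimal sketch of the publishing side. It assumes the agent writes each snapshot into a Redis hash named homelab:nodes:<NODE_NAME>, which is the key pattern used in the Redis inspection commands. The helper names, the last_seen field, and the hostname fallback are hypothetical and are not taken from services/stability-agent. The client can be any object with a redis-py style hset(name, mapping=...) method, so the sketch runs without a Redis server.

```python
import os
import socket
import time
from typing import Callable, Dict, Optional

KEY_PREFIX = "homelab:nodes:"  # same key pattern as the redis-cli inspection commands


def resolve_node_name() -> str:
    """Return NODE_NAME from the environment.

    Hypothetical fallback: use the machine's hostname rather than a fixed
    default such as "chelsty", so an agent never reports under another node's name.
    """
    return os.environ.get("NODE_NAME") or socket.gethostname()


def publish_state(client, node_name: str, state: Dict[str, str]) -> str:
    """Write one state snapshot into the node's hash and return the key used."""
    key = KEY_PREFIX + node_name
    payload = dict(state)
    payload["last_seen"] = str(int(time.time()))  # hypothetical field name
    client.hset(key, mapping=payload)  # redis-py: Redis.hset(name, mapping=...)
    return key


def run(client, collect: Callable[[], Dict[str, str]], interval: float,
        iterations: Optional[int] = None) -> None:
    """Publish a fresh snapshot every `interval` seconds (the CHECK_INTERVAL role).

    iterations=None loops forever; a finite number is useful for tests.
    """
    node_name = resolve_node_name()
    done = 0
    while iterations is None or done < iterations:
        publish_state(client, node_name, collect())
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval)
```

For a real connection, redis-py's `redis.Redis(host="100.108.208.3", port=6379, decode_responses=True)` provides a compatible `hset`. Because every snapshot rewrites the whole hash, the state reappears on the next cycle after Redis loses it.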
Why the UI Only Showed CHELSTY
Previously, the stability-agent's NODE_NAME defaulted to chelsty, and the agent was deployed only on that node. The Agent System UI materializer on PIHA determines which nodes to show from the Redis keys homelab:nodes:<NODE_NAME>. Because no other node ran an agent publishing under its own NODE_NAME, the UI showed only that single node.
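To illustrate the discovery logic, here is a sketch of how a materializer could derive the node list from key names alone. It assumes keys arrive as the plain strings that `KEYS 'homelab:nodes:*'` returns. The function name is hypothetical, and this is not the PIHA materializer's actual code. With only one agent publishing, the result contains a single name, which matches what the UI showed.

```python
from typing import Iterable, List

KEY_PREFIX = "homelab:nodes:"


def node_names_from_keys(keys: Iterable[str]) -> List[str]:
    """Extract <NODE_NAME> from every homelab:nodes:<NODE_NAME> key.

    Keys without the prefix are ignored; duplicates collapse to one entry.
    """
    names = {key[len(KEY_PREFIX):] for key in keys if key.startswith(KEY_PREFIX)}
    names.discard("")  # a bare "homelab:nodes:" key carries no node name
    return sorted(names)


# Before the rollout only one agent published, so only one node was visible.
print(node_names_from_keys(["homelab:nodes:chelsty"]))  # ['chelsty']
```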
Deployment
Use the helper script to deploy or generate commands:
# Print commands
./scripts/deploy/deploy-stability-agent.sh <node-name>
# Deploy via SSH (requires SSH access to the node)
./scripts/deploy/deploy-stability-agent.sh <node-name> --ssh
Manual Steps per Node
The manual steps are encapsulated in services/stability-agent/deploy-local.sh. On the target node:
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
./deploy-local.sh
Verification
Fleet Overview
Run the verification script from any node with redis-cli access:
./scripts/deploy/verify-agent-fleet.sh
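The contents of verify-agent-fleet.sh are not reproduced here. As a hedged illustration of the kind of check a fleet overview can perform, the sketch below compares an expected node list with the node keys present in Redis and reports which nodes are missing and which are unexpected. The expected list and the function name are hypothetical; you would maintain the list for your own fleet.

```python
from typing import Dict, Iterable, List

KEY_PREFIX = "homelab:nodes:"


def fleet_report(expected: Iterable[str], keys: Iterable[str]) -> Dict[str, List[str]]:
    """Compare the nodes we expect with the nodes that have published state."""
    present = {k[len(KEY_PREFIX):] for k in keys if k.startswith(KEY_PREFIX)}
    wanted = set(expected)
    return {
        "ok": sorted(wanted & present),
        "missing": sorted(wanted - present),     # agent not deployed or not publishing
        "unexpected": sorted(present - wanted),  # e.g. a stale key or a mistyped NODE_NAME
    }


# Hypothetical fleet in which only chelsty has published so far.
print(fleet_report(["chelsty", "piha"], ["homelab:nodes:chelsty"]))
# {'ok': ['chelsty'], 'missing': ['piha'], 'unexpected': []}
```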
Redis Inspection (on PIHA)
docker exec agent-system-redis redis-cli KEYS 'homelab:nodes:*'
docker exec agent-system-redis redis-cli HGETALL homelab:nodes:<node-name>
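When redis-cli writes to a pipe rather than a terminal, it uses raw output, so HGETALL prints field and value on alternating lines. The helper below is a sketch that assumes that raw format and turns the output into a dictionary so individual fields can be checked in scripts. The function names are hypothetical. The subprocess wrapper runs the same docker exec command as above and therefore only works on PIHA.

```python
import subprocess
from typing import Dict, List


def parse_hgetall(raw: str) -> Dict[str, str]:
    """Pair up alternating field/value lines from raw redis-cli HGETALL output.

    Assumes no field or value contains a newline. Blank output (empty or
    missing hash) yields {}.
    """
    if not raw.strip():
        return {}
    lines: List[str] = raw.splitlines()
    if len(lines) % 2:
        raise ValueError("expected an even number of lines (field/value pairs)")
    return dict(zip(lines[0::2], lines[1::2]))


def fetch_node_hash(node_name: str) -> Dict[str, str]:
    """Run HGETALL for one node through the agent-system-redis container."""
    out = subprocess.run(
        ["docker", "exec", "agent-system-redis", "redis-cli",
         "HGETALL", f"homelab:nodes:{node_name}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_hgetall(out)
```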
Verify the Web UI backend:
curl -s http://127.0.0.1:18180/nodes
curl -k https://agents.okit.pl/nodes
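The two curl checks can also be scripted in Python. This sketch assumes that /nodes returns JSON. The response shape is not documented here, so the function only checks for HTTP 200 and a body that parses. The function name is hypothetical. Setting insecure=True mirrors curl -k by disabling certificate and hostname verification, so it should be reserved for that case.

```python
import json
import ssl
import urllib.request
from typing import Any, Optional


def get_nodes(url: str, insecure: bool = False, timeout: float = 5.0) -> Any:
    """GET a /nodes endpoint and return the decoded JSON body."""
    context: Optional[ssl.SSLContext] = None
    if insecure:
        # Equivalent of curl -k: skip hostname and certificate checks.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, timeout=timeout, context=context) as resp:
        if resp.status != 200:
            raise RuntimeError(f"{url} returned HTTP {resp.status}")
        return json.loads(resp.read().decode("utf-8"))
```

Usage on PIHA: `get_nodes("http://127.0.0.1:18180/nodes")` for the local backend, and `get_nodes("https://agents.okit.pl/nodes", insecure=True)` for the public endpoint.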
Troubleshooting
- Redis empty after compose down: The agent-system-redis container on PIHA uses transient storage if it is not configured with a volume. If it restarts, the stored state is lost and agents must republish it (they do this automatically every CHECK_INTERVAL).
- Secrets: .env files and local secrets are not committed to the repo. Ensure MQTT_HOST and other node-specific secrets are set via overrides if needed.
- Telegram: Telegram bot notifications are optional and remain disabled when TELEGRAM_BOT_TOKEN is absent.
- Docker Socket: If the agent reports Docker as unavailable, ensure /var/run/docker.sock is mounted into the agent's container and that the agent's user has permission to access it.
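For the Docker socket case, a quick diagnostic run in the agent's environment, as the same user, can distinguish a missing mount from a permission problem. This is a hypothetical helper and not part of the agent. It only inspects the path with the standard library and does not talk to the Docker API.

```python
import os
import stat

DOCKER_SOCK = "/var/run/docker.sock"


def docker_socket_status(path: str = DOCKER_SOCK) -> str:
    """Classify why Docker might be reported as unavailable."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing: nothing mounted at " + path
    except PermissionError:
        return "no-permission: cannot stat " + path
    if not stat.S_ISSOCK(mode):
        # Commonly a directory, created when the host path did not exist at mount time.
        return "not-a-socket: " + path + " exists but is not a Unix socket"
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission: current user cannot read/write " + path
    return "ok"


print(docker_socket_status())
```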