Stability Agent Multi-Node Rollout
Architecture Summary
The stability-agent is a lightweight Python service that monitors node health (disk, Docker containers, Tailscale, MQTT) and publishes state to a central Redis instance running on PIHA.
- Source: services/stability-agent
- State Path: /opt/homelab/state
- Events Path: /opt/homelab/events
- Redis Target: 100.108.208.3:6379 (PIHA)
Why the UI only showed CHELSTY
Previously, the stability-agent defaulted NODE_NAME to chelsty and was deployed only on that node. The Agent System UI materializer on PIHA filters nodes based on the Redis keys homelab:nodes:<NODE_NAME>. Because no other agent was publishing under its own NODE_NAME, the UI stayed limited to the single active node.
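The effect of the default can be illustrated with a minimal sketch. It assumes the agent reads NODE_NAME from its environment and falls back to the literal string chelsty; the helper names node_key and visible_nodes are hypothetical and are not taken from the agent's source.

```python
import os

KEY_PREFIX = "homelab:nodes:"  # key pattern quoted in the text above


def node_key(env=None):
    """Return the Redis key an agent would publish under (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: the old default was the literal string "chelsty".
    node_name = env.get("NODE_NAME", "chelsty")
    return KEY_PREFIX + node_name


def visible_nodes(keys):
    """Derive the node names that a key-based filter would see."""
    return sorted({k[len(KEY_PREFIX):] for k in keys if k.startswith(KEY_PREFIX)})


# Agents that rely on the default all map to the same key.
print(visible_nodes([node_key({}), node_key({})]))  # ['chelsty']

# Distinct NODE_NAME values yield one key per node.
names = ("piha", "chelsty", "vps", "solaria")
print(visible_nodes([node_key({"NODE_NAME": n}) for n in names]))
# ['chelsty', 'piha', 'solaria', 'vps']
```

Setting NODE_NAME explicitly on every node is what makes each one show up as its own key.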
Deployment
Use the helper script to deploy or generate commands. The script uses explicit Tailscale IPs for remote targets (piha, chelsty, vps) and runs locally for solaria.
# Print commands
./scripts/deploy/deploy-stability-agent.sh <node-name>
# Deploy via SSH (executes ssh oskar@<ip>)
./scripts/deploy/deploy-stability-agent.sh <node-name> --ssh
Manual Steps per Node
The manual steps are encapsulated in services/stability-agent/deploy-local.sh. On the target node:
cd /home/oskar/homelab-codex-ws
git fetch origin
git checkout master
git pull origin master
cd services/stability-agent
./deploy-local.sh <node-name>
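As a rough illustration of how the helper script might combine the local/remote split with the manual steps, here is a sketch in Python. It is not the contents of deploy-stability-agent.sh: the function name deploy_command and the tailscale_ips mapping are hypothetical, and only the PIHA address is taken from this document.

```python
import shlex

REPO_DIR = "/home/oskar/homelab-codex-ws"  # repo path from the manual steps
SSH_USER = "oskar"


def deploy_command(node, tailscale_ips, local_node="solaria"):
    """Build a shell command that runs the manual steps for `node`."""
    steps = [
        f"cd {REPO_DIR}",
        "git fetch origin",
        "git checkout master",
        "git pull origin master",
        "cd services/stability-agent",
        f"./deploy-local.sh {shlex.quote(node)}",
    ]
    local = " && ".join(steps)
    if node == local_node:
        return local  # the local node runs the steps in place
    ip = tailscale_ips[node]  # a KeyError for unknown nodes is intentional
    # Quote the whole step chain so it is passed to ssh as one argument.
    return f"ssh {SSH_USER}@{ip} {shlex.quote(local)}"


# Only the PIHA address appears in this document; other IPs must be supplied.
print(deploy_command("piha", {"piha": "100.108.208.3"}))
print(deploy_command("solaria", {}))
```

Printing the command rather than executing it mirrors the script's default mode, where --ssh is required before anything runs remotely.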
Verification
Fleet Overview
Run the verification script from any node with redis-cli access:
./scripts/deploy/verify-agent-fleet.sh
Redis Inspection (on PIHA)
docker exec agent-system-redis redis-cli KEYS 'homelab:nodes:*'
docker exec agent-system-redis redis-cli HGETALL homelab:nodes:<node-name>
Verify Web UI backend:
curl -s http://127.0.0.1:18180/nodes
curl -k https://agents.okit.pl/nodes
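The KEYS check can be automated with a small sketch that reports which nodes have not published yet. It assumes redis-cli prints one bare key per line, as it does when its output is not a terminal, and that the fleet consists of the four nodes named in the Deployment section. The function missing_nodes is hypothetical and is not part of verify-agent-fleet.sh.

```python
KEY_PREFIX = "homelab:nodes:"
# Assumption: the fleet is exactly the nodes named in the Deployment section.
EXPECTED_NODES = frozenset({"piha", "chelsty", "vps", "solaria"})


def missing_nodes(keys_output, expected=EXPECTED_NODES):
    """Return the expected nodes that are absent from `redis-cli KEYS` output."""
    seen = set()
    for line in keys_output.splitlines():
        line = line.strip()
        if line.startswith(KEY_PREFIX):
            seen.add(line[len(KEY_PREFIX):])
    return sorted(set(expected) - seen)


sample = "homelab:nodes:chelsty\nhomelab:nodes:piha\n"
print(missing_nodes(sample))  # ['solaria', 'vps']
```

An empty list means every expected node has a key in Redis; it does not by itself confirm that the stored state is fresh.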
Troubleshooting
- Redis empty after compose down: The agent-system-redis on PIHA uses transient storage if not configured with a volume. If it restarts, agents must republish their state (they do this automatically every CHECK_INTERVAL).
- Secrets: .env files and local secrets are not committed to the repo. Ensure MQTT_HOST and other specific secrets are set via overrides if needed.
- Telegram: Telegram bot notifications can remain disabled if TELEGRAM_BOT_TOKEN is absent.
- Docker Socket: If the agent reports unavailable for Docker, ensure /var/run/docker.sock is mounted and the user has permissions.
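For the Docker socket case, a quick diagnostic along these lines can tell a missing mount apart from a permissions problem. This is a sketch only: the function docker_socket_status and its status labels are hypothetical and do not reflect how the agent itself reports Docker availability.

```python
import os
import stat


def docker_socket_status(path="/var/run/docker.sock"):
    """Classify why the Docker socket might be unusable (hypothetical labels)."""
    if not os.path.exists(path):
        return "missing"  # socket not mounted into the container or host path absent
    if not stat.S_ISSOCK(os.stat(path).st_mode):
        return "not-a-socket"  # something else is mounted at the path
    if not os.access(path, os.R_OK | os.W_OK):
        return "permission-denied"  # current user cannot read and write the socket
    return "ok"


print(docker_socket_status())
```

A result of permission-denied usually points at group membership or the user the service runs as, while missing points at the mount configuration.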