Stability Agent Multi-Node Rollout
Architecture Summary
The stability-agent is a lightweight Python service that monitors node health (disk, Docker containers, Tailscale, MQTT) and publishes state to a central Redis instance running on PIHA.
- Source: services/stability-agent
- State Path: /opt/homelab/state
- Events Path: /opt/homelab/events
- Redis Target: 100.108.208.3:6379 (PIHA)
Why UI only showed CHELSTY
Previously, NODE_NAME defaulted to chelsty and the stability-agent was deployed only on that node. The Agent System UI materializer on PIHA filters nodes based on the Redis keys homelab:nodes:<NODE_NAME>. Because no other agent was publishing under its own NODE_NAME, the UI listed only the single active node.
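The effect can be illustrated with a minimal sketch, assuming the UI discovers nodes purely by the homelab:nodes: key prefix. The function names (resolve_node_name, node_key, visible_nodes) are hypothetical and are not taken from the stability-agent or materializer source; only the key pattern and the old chelsty default come from this document.

```python
KEY_PREFIX = "homelab:nodes:"   # key pattern named in this document
OLD_DEFAULT_NODE = "chelsty"    # the previous NODE_NAME default


def resolve_node_name(env):
    """Return NODE_NAME from an environment mapping, falling back to the old default."""
    return env.get("NODE_NAME") or OLD_DEFAULT_NODE


def node_key(node_name):
    """Build the Redis key a node would publish its state under."""
    return KEY_PREFIX + node_name


def visible_nodes(redis_keys):
    """Node names a prefix-based UI filter would list (assumed behaviour)."""
    return sorted(
        key[len(KEY_PREFIX):] for key in redis_keys if key.startswith(KEY_PREFIX)
    )


if __name__ == "__main__":
    # One agent, no NODE_NAME set: only the default node ever appears.
    before = [node_key(resolve_node_name({}))]
    print(visible_nodes(before))

    # Each agent started with its own NODE_NAME: every node gets a key.
    envs = [{"NODE_NAME": n} for n in ("piha", "chelsty", "solaria", "vps")]
    after = [node_key(resolve_node_name(e)) for e in envs]
    print(visible_nodes(after))
```

With a single agent and no NODE_NAME override, the only key that exists is the one for chelsty, so a prefix-based listing has nothing else to show.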
Deployment Commands
Use the helper script to generate commands:
./scripts/deploy/deploy-stability-agent.sh <node-name>
PIHA
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=piha REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
CHELSTY
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=chelsty REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
SOLARIA
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=solaria REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
VPS
cd ~/homelab-codex-ws
git pull
cd services/stability-agent
NODE_NAME=vps REDIS_HOST=100.108.208.3 REDIS_PORT=6379 REDIS_ENABLED=true docker compose up -d --build --force-recreate
SATURN (Optional)
Saturn is the orchestrator and can optionally run the stability-agent. If deployed, follow the same pattern with NODE_NAME=saturn.
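The per-node blocks above differ only in NODE_NAME. As a hedged illustration of that pattern (not the contents of deploy-stability-agent.sh, which are not shown here), the following sketch builds the same four commands for any node name. The function name deploy_commands is hypothetical.

```python
import shlex
from typing import List

REDIS_HOST = "100.108.208.3"  # Redis target on PIHA, from this document
REDIS_PORT = 6379


def deploy_commands(node_name: str) -> List[str]:
    """Return the shell commands used above, parameterised by node name."""
    env = {
        "NODE_NAME": node_name,
        "REDIS_HOST": REDIS_HOST,
        "REDIS_PORT": str(REDIS_PORT),
        "REDIS_ENABLED": "true",
    }
    # shlex.quote leaves simple values untouched and quotes anything unsafe.
    env_prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return [
        "cd ~/homelab-codex-ws",
        "git pull",
        "cd services/stability-agent",
        f"{env_prefix} docker compose up -d --build --force-recreate",
    ]


if __name__ == "__main__":
    for line in deploy_commands("saturn"):
        print(line)
```

Running it with saturn prints the optional SATURN deployment in the same form as the other nodes.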
Verification (on PIHA)
Verify Redis keys:
docker exec agent-system-redis redis-cli KEYS 'homelab:nodes:*'
docker exec agent-system-redis redis-cli HGETALL homelab:nodes:<node-name>
Verify Web UI backend:
curl -s http://127.0.0.1:18180/nodes
curl -k https://agents.okit.pl/nodes
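To turn the KEYS output into a quick pass/fail check, a small sketch along these lines may help. It assumes each node publishes exactly one key of the form homelab:nodes:<node-name> with no further sub-keys, and that the expected node set is the four required nodes listed under Deployment Commands. The name missing_nodes is hypothetical.

```python
import re

# Required nodes from the deployment section; saturn is optional and omitted.
EXPECTED_NODES = ("piha", "chelsty", "solaria", "vps")

# Matches both plain output (homelab:nodes:piha) and the numbered,
# quoted form redis-cli prints on a TTY (1) "homelab:nodes:piha").
_KEY_RE = re.compile(r'homelab:nodes:([^\s"]+)')


def missing_nodes(redis_cli_output, expected=EXPECTED_NODES):
    """Return expected node names that have no key in the KEYS output."""
    seen = {match.group(1) for match in _KEY_RE.finditer(redis_cli_output)}
    return sorted(set(expected) - seen)


if __name__ == "__main__":
    import sys

    # Example: docker exec agent-system-redis redis-cli KEYS 'homelab:nodes:*' | python3 check.py
    absent = missing_nodes(sys.stdin.read())
    if absent:
        print("missing nodes: " + ", ".join(absent))
        sys.exit(1)
    print("all expected nodes present")
```

The file name check.py in the usage comment is a placeholder; a non-zero exit status makes the check usable from other scripts.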
Troubleshooting
- Redis empty after compose down: The agent-system-redis on PIHA uses transient storage if not configured with a volume. If it restarts, agents must republish their state (they do this automatically every CHECK_INTERVAL).
- Secrets: .env files and local secrets are not committed to the repo. Ensure MQTT_HOST and other specific secrets are set via overrides if needed.
- Telegram: Telegram bot notifications can remain disabled if TELEGRAM_BOT_TOKEN is absent.
- Docker Socket: If the agent reports unavailable for Docker, ensure /var/run/docker.sock is mounted and the user has permissions.
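For the Docker socket case, a rough diagnostic can separate "not mounted" from "no permission". This is a sketch only, run from inside the agent container or on the host; the function name and the status strings are hypothetical and do not reflect how the stability-agent itself probes Docker.

```python
import os
import stat

DOCKER_SOCKET = "/var/run/docker.sock"  # path named in this document


def docker_socket_status(path=DOCKER_SOCKET):
    """Classify why a Docker socket might be unusable for the current user."""
    if not os.path.exists(path):
        return "missing"            # socket not mounted into the container
    if not stat.S_ISSOCK(os.stat(path).st_mode):
        return "not-a-socket"       # something else is at that path
    if not os.access(path, os.R_OK | os.W_OK):
        return "permission-denied"  # user lacks read/write access
    return "ok"


if __name__ == "__main__":
    print(docker_socket_status())
```

A result of missing points at the volume mount in the compose file, while permission-denied points at the user or group the agent runs as.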