Polls /summary on VPS over Tailscale every 60s; computes freshness
locally from last_update epoch (never trusts self-reported status).
Alerts via Telegram Bot API directly after 3 consecutive failures;
sends recovery message on heal. State (fail_count, alerted) persisted
to volume so debounce survives restarts.
- services/brain-watchdog/: Python service, no external deps (stdlib only)
- hosts/piha/runtime/brain-watchdog/: override with mem_limit 64m
- hosts/piha/services.yaml + inventory/topology.yaml: manifest entries
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
services/agent-system/runtime-materializer/materializer.py:
- Add materialize_from_api() that fetches all world-state endpoints
from the control-plane HTTP API (CONTROL_PLANE_URL env var)
- When CONTROL_PLANE_URL is set, use API as source of truth instead of Redis
- Redis path preserved as fallback for backward compat
hosts/piha/runtime/agent-system/docker-compose.override.yml (new):
- Inject CONTROL_PLANE_URL=http://100.95.58.48:18180 for runtime-materializer
- piha webui /snapshot now mirrors VPS observer output (clean, ghost-free)
Root cause: materializer read from Redis which held 80 stale service entries
with hash-prefixed ghost keys (e.g. 0ccb8a88e079_control-plane-supervisor).
Redis is never updated by the current observer pipeline; the control-plane API
is the single authoritative world-state source.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
100.108.208.3 is piha's Tailscale IP (piha hosts Forgejo+Redis).
VPS's actual Tailscale IP is 100.95.58.48. All three node-agent
overrides were pointing at piha itself, causing containers to SSH
to their own host and fail auth.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Nodes ship events to VPS via rsync+SSH. The container runs as root
and uses the default SSH identity, which must be at /root/.ssh/.
Mount /home/oskar/.ssh from the host read-only so the existing
authorized key is available inside the container.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- piha: NODE_TYPE=sd_card (rate-limited docker prune, once per day)
- solaria: NODE_TYPE=ai_node (dangling+containers+build cache; never -a to preserve Ollama images)
- chelsty-infra: NODE_TYPE=lte_node (NO cleanup, events-only)
- All three: VPS_EVENTS_HOST set for event shipping via rsync+SSH
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>