homelab-codex-ws/inventory/topology.yaml
Oskar Kapala 039f9f7247 feat(piha): brain-watchdog — external watchdog for control-plane
Polls /summary on VPS over Tailscale every 60s; computes freshness
locally from last_update epoch (never trusts self-reported status).
Alerts via Telegram Bot API directly after 3 consecutive failures;
sends recovery message on heal. State (fail_count, alerted) persisted
to volume so debounce survives restarts.

- services/brain-watchdog/: Python service, no external deps (stdlib only)
- hosts/piha/runtime/brain-watchdog/: override with mem_limit 64m
- hosts/piha/services.yaml + inventory/topology.yaml: manifest entries
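
The debounce behaviour described in this commit message can be sketched roughly as below. This is an illustrative, stdlib-only approximation, not the actual code in services/brain-watchdog/. The names `WatchdogState`, `is_fresh`, `step`, the JSON state-file layout and the `MAX_AGE_SECONDS` cutoff are assumptions made for the example. Only `last_update`, `fail_count`, `alerted` and the threshold of 3 come from the message.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Optional

FAIL_THRESHOLD = 3      # from the commit message: alert after 3 consecutive failures
MAX_AGE_SECONDS = 180   # assumption: the real staleness cutoff is not stated


@dataclass
class WatchdogState:
    fail_count: int = 0
    alerted: bool = False


def load_state(path: Path) -> WatchdogState:
    """Read persisted state; start fresh if the file is missing or unreadable."""
    try:
        raw = json.loads(path.read_text())
        return WatchdogState(int(raw["fail_count"]), bool(raw["alerted"]))
    except (OSError, ValueError, KeyError, TypeError):
        return WatchdogState()


def save_state(path: Path, state: WatchdogState) -> None:
    """Persist state so the debounce counters survive a restart."""
    path.write_text(json.dumps(asdict(state)))


def is_fresh(summary: dict, now: float, max_age: float = MAX_AGE_SECONDS) -> bool:
    """Judge freshness from last_update alone, ignoring any self-reported status."""
    last_update = summary.get("last_update")
    if isinstance(last_update, bool) or not isinstance(last_update, (int, float)):
        return False
    return (now - last_update) <= max_age


def step(state: WatchdogState, healthy: bool) -> Optional[str]:
    """Advance the state by one poll; return "alert", "recovery" or None."""
    if healthy:
        event = "recovery" if state.alerted else None
        state.fail_count = 0
        state.alerted = False
        return event
    state.fail_count += 1
    if state.fail_count >= FAIL_THRESHOLD and not state.alerted:
        state.alerted = True
        return "alert"
    return None
```

In a polling loop, one would call `step(state, is_fresh(summary, time.time()))` every 60s, send the Telegram message when it returns "alert" or "recovery", and call `save_state` after each poll. Keeping the transition logic in a pure function makes the debounce testable without network access.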

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-01 17:54:36 +02:00

topology:
  mesh: tailscale
  git_provider: forgejo
  deployment:
    mode: pull
    orchestrator: saturn
  nodes:
    saturn:
      roles:
        - control
        - development
    piha:
      roles:
        - infra
        - monitoring
      services:
        - node-agent
        - ha-diag-agent
        - brain-watchdog
    solaria:
      roles:
        - compute
        - ai
    vps:
      roles:
        - edge
        - ingress
        - control-plane
      services:
        # Repo-managed GitOps services (hosts/vps/services.yaml is authoritative)
        - node-agent
        - control-plane    # executor, observer, supervisor, operator-ui
        - node_exporter
        - stability-agent
        - npm              # Nginx Proxy Manager — public ingress, TLS termination
        - outline          # Team wiki (outline + postgres + redis)
        - joplin           # Note sync server (joplin-server + postgres)
        - ai-cluster       # AI workers: codex-worker, openclaw, planner-worker,
                           # service-ops-worker, redis, mosquitto
    chelsty-infra:
      site: chelsty
      roles:
        - remote
        - hypervisor
        - infra
        - staging
      connectivity:
        uplink: lte
        intermittent: true
      home_automation:
        offline_operation_required: true
      services:
        - zigbee2mqtt
        - mosquitto
      coordinator:
        model: SLZB-06U
        connection: network
        usb: false
    chelsty-ha:
      site: chelsty
      roles:
        - remote
        - homeassistant
      connectivity:
        uplink: lte
        intermittent: true
      home_automation:
        offline_operation_required: true
      services:
        - homeassistant
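
As a rough illustration of how tooling might consume this manifest, the sketch below looks up which nodes list a given service and which nodes are flagged as intermittently connected. It works on a hand-copied excerpt held in a plain dict, so no YAML parser is needed. The nesting (`topology` -> `nodes` -> node name) and the helper names `nodes_running` and `intermittent_nodes` are assumptions for the example, not part of the repository.

```python
# Hand-copied excerpt of the manifest; the nesting shown here is an assumption.
TOPOLOGY = {
    "topology": {
        "nodes": {
            "piha": {
                "roles": ["infra", "monitoring"],
                "services": ["node-agent", "ha-diag-agent", "brain-watchdog"],
            },
            "solaria": {"roles": ["compute", "ai"]},
            "chelsty-ha": {
                "roles": ["remote", "homeassistant"],
                "connectivity": {"uplink": "lte", "intermittent": True},
                "services": ["homeassistant"],
            },
        }
    }
}


def nodes_running(topology: dict, service: str) -> list:
    """Return the sorted names of nodes whose services list contains `service`."""
    nodes = topology["topology"]["nodes"]
    return sorted(
        name for name, spec in nodes.items() if service in spec.get("services", [])
    )


def intermittent_nodes(topology: dict) -> list:
    """Return the sorted names of nodes with connectivity.intermittent set to true."""
    nodes = topology["topology"]["nodes"]
    return sorted(
        name
        for name, spec in nodes.items()
        if spec.get("connectivity", {}).get("intermittent", False)
    )
```

Nodes without a `services` or `connectivity` key (such as solaria in the excerpt) are simply skipped rather than raising an error, which matches how several nodes in the manifest omit those sections.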