Polls /summary on VPS over Tailscale every 60s; computes freshness locally from last_update epoch (never trusts self-reported status). Alerts via Telegram Bot API directly after 3 consecutive failures; sends recovery message on heal. State (fail_count, alerted) persisted to volume so debounce survives restarts.

- services/brain-watchdog/: Python service, no external deps (stdlib only)
- hosts/piha/runtime/brain-watchdog/: override with mem_limit 64m
- hosts/piha/services.yaml + inventory/topology.yaml: manifest entries

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
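The debounce behaviour described in the commit message can be sketched as a small state machine. This is only an illustration, not the service's actual code: `State`, `step`, `is_fresh`, and the 300-second `MAX_AGE_S` staleness limit are hypothetical names and values chosen for the example; only the 3-failure threshold and the `fail_count` / `alerted` fields come from the message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

FAIL_THRESHOLD = 3   # from the commit message: alert after 3 consecutive failures
MAX_AGE_S = 300.0    # ASSUMPTION: staleness limit is not stated in the source


@dataclass
class State:
    fail_count: int = 0
    alerted: bool = False


def is_fresh(last_update: float, now: float, max_age_s: float = MAX_AGE_S) -> bool:
    """Judge freshness from the reported epoch, ignoring any self-reported status."""
    return (now - last_update) <= max_age_s


def step(state: State, healthy: bool) -> Tuple[State, Optional[str]]:
    """Advance the debounce state by one poll.

    Returns the new state and the message to send, or None if nothing
    should be sent on this poll.
    """
    if healthy:
        # Only announce recovery if an alert was actually sent earlier.
        message = "recovered" if state.alerted else None
        return State(fail_count=0, alerted=False), message

    fails = state.fail_count + 1
    if fails >= FAIL_THRESHOLD and not state.alerted:
        # Threshold crossed for the first time: alert once.
        return State(fail_count=fails, alerted=True), "alert"

    # Either still below the threshold or already alerted: stay quiet.
    return State(fail_count=fails, alerted=state.alerted), None
```

Keeping `step` free of I/O means the alerting rules can be checked without a network or a Telegram token; the polling loop would call it once per cycle and persist the returned state.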
31 lines
629 B
YAML
services:
  brain-watchdog:
    build: .
    container_name: brain-watchdog
    restart: unless-stopped

    env_file:
      - /opt/homelab/config/brain-watchdog/.env

    volumes:
      - brain_watchdog_data:/data

    healthcheck:
      test:
        - "CMD"
        - "python"
        - "-c"
        - |
          import os, time, json, sys
          p = '/data/state.json'
          if not os.path.exists(p): sys.exit(1)
          age = time.time() - os.path.getmtime(p)
          sys.exit(0 if age < 300 else 1)
      interval: 1m
      timeout: 10s
      retries: 3
      start_period: 30s

volumes:
  brain_watchdog_data:
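The healthcheck passes only while /data/state.json has been modified within the last 300 seconds, so the service presumably rewrites that file on every poll, even when nothing changed. A minimal sketch of how that could look with the standard library follows; `save_state` and `load_state` are hypothetical helper names, and the default state shape is assumed from the `fail_count` and `alerted` fields named in the commit message.

```python
import json
import os
import tempfile


def save_state(path: str, state: dict) -> None:
    """Write state atomically; each rewrite also refreshes the file's mtime."""
    directory = os.path.dirname(path) or "."
    # Write to a temp file in the same directory so os.replace stays atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, path)
    except BaseException:
        # Clean up the temp file if the write or rename failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_state(path: str) -> dict:
    """Read persisted state, falling back to defaults on first start or corruption."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        # ASSUMPTION: default state shape, based on the commit message.
        return {"fail_count": 0, "alerted": False}
```

Writing through a temporary file means a crash mid-write cannot leave a half-written state.json behind, which matters because the same file doubles as the liveness signal.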