Oskar Kapala
6953815f41
feat(ha-diag-agent): add piha deploy config
- hosts/piha/runtime/ha-diag-agent/docker-compose.override.yml: set mem_limit
to 128m; hardcode the events volume (/opt/homelab/events/piha:/events) to avoid
the ${NODE_NAME} shell-expansion issue in deploy-node.sh
- services/ha-diag-agent/env.example: per-host HA_URL comments (piha vs
chelsty-infra tailscale), HA_TOKEN source note
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-03 19:19:27 +02:00
Oskar Kapala
20f6761a67
feat(ha-diag-agent): UnavailableEntitiesCheck with root cause dedup
- shared aiohttp ClientSession in HAClient (Phase 1 Flag #2 fixed):
make_session() factory, session injected at startup, closed on shutdown
- Check.run() → list[CheckResult]: clean multi-event interface
- first real diagnostic check: entity unavailable > 24h
(INSERT OR IGNORE baseline preserves first-seen timestamp)
- root cause grouping: emit ha_integration_failed instead of N entity
events when ≥50% of integration's entities are unavailable (≥3 min)
- alert deduplication via SQLite cooldown window (default 6h)
- recovery clears baseline + dedup for immediate re-alert
- configurable thresholds: duration, integration %, cooldown
- 38 unit tests + 7 integration tests (42 pass, 3 skip w/o live HA)
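
The baseline, cooldown, and grouping rules listed above can be sketched roughly as follows. This is an illustration only, not the code from this commit: the table layout, the function names, and the `ha_entity_unavailable` event name are hypothetical, it uses the synchronous stdlib `sqlite3` module rather than aiosqlite, and it reads "(≥3 min)" as "at least 3 affected entities", which is an assumption.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: one row per unavailable entity, one row per sent alert.
SCHEMA = """
CREATE TABLE IF NOT EXISTS baseline (entity_id TEXT PRIMARY KEY, first_seen REAL NOT NULL);
CREATE TABLE IF NOT EXISTS alerts (key TEXT PRIMARY KEY, last_sent REAL NOT NULL);
"""


def open_db(path=":memory:"):
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def record_unavailable(db, entity_id, now):
    """Return how long (seconds) the entity has been unavailable."""
    # INSERT OR IGNORE leaves an existing row alone, so first_seen keeps the
    # timestamp of the first observation instead of being overwritten.
    db.execute(
        "INSERT OR IGNORE INTO baseline (entity_id, first_seen) VALUES (?, ?)",
        (entity_id, now),
    )
    row = db.execute(
        "SELECT first_seen FROM baseline WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    return now - row[0]


def should_alert(db, key, now, cooldown_s=6 * 3600):
    """Allow at most one alert per key inside the cooldown window."""
    row = db.execute("SELECT last_sent FROM alerts WHERE key = ?", (key,)).fetchone()
    if row is not None and now - row[0] < cooldown_s:
        return False
    db.execute("INSERT OR REPLACE INTO alerts (key, last_sent) VALUES (?, ?)", (key, now))
    return True


def mark_recovered(db, entity_id):
    """Clear baseline and dedup state so a new outage alerts immediately."""
    db.execute("DELETE FROM baseline WHERE entity_id = ?", (entity_id,))
    db.execute("DELETE FROM alerts WHERE key = ?", (entity_id,))


def group_events(stale, totals, min_ratio=0.5, min_count=3):
    """stale: {entity_id: integration}; totals: {integration: entity count}."""
    by_integration = defaultdict(list)
    for entity_id, integration in stale.items():
        by_integration[integration].append(entity_id)
    events = []
    for integration, entities in by_integration.items():
        total = totals.get(integration, 0)
        if total and len(entities) >= min_count and len(entities) / total >= min_ratio:
            # One root-cause event replaces N per-entity events.
            events.append(("ha_integration_failed", integration))
        else:
            events.extend(("ha_entity_unavailable", e) for e in sorted(entities))
    return events
```

With these assumptions, three of four unavailable entities in one integration collapse into a single `ha_integration_failed` event, while a lone unavailable entity in a ten-entity integration is still reported individually.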
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:41:55 +02:00
Oskar Kapala
ab8895d28b
feat(ha-diag-agent): scaffold service with HA REST client and event emitter
- new per-host service, follows node-agent pattern
- 7 new HA event types defined (routing in supervisor — Phase 5)
- HeartbeatCheck as pipeline validator (pings /api/, emits ha_websocket_dead)
- service.yaml + host configs for piha (ken) and chelsty-infra (chelsty)
- test scaffolding with aiohttp/aiosqlite mocks (15/15 passing)
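
A heartbeat check of the kind described above might look like the sketch below. It is not the scaffolded code: the `CheckResult` fields and the injected `ping` callable are hypothetical, standing in for a request to `/api/` made through the service's HA REST client, and the `list[CheckResult]` return shape is assumed for illustration. Only the `ha_websocket_dead` event name comes from the commit message.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class CheckResult:
    # Hypothetical shape: an event type plus a human-readable detail string.
    event_type: str
    detail: str


class HeartbeatCheck:
    """Emit ha_websocket_dead when the HA API does not answer with HTTP 200."""

    def __init__(self, ping: Callable[[], Awaitable[int]]):
        # ping is assumed to request /api/ and return the HTTP status code.
        self._ping = ping

    async def run(self) -> list[CheckResult]:
        try:
            status = await self._ping()
        except Exception as exc:  # connection refused, timeout, DNS failure, ...
            return [CheckResult("ha_websocket_dead", f"ping failed: {exc!r}")]
        if status != 200:
            return [CheckResult("ha_websocket_dead", f"unexpected status {status}")]
        return []  # healthy: nothing to emit
```

Injecting the ping callable keeps the check testable without a live Home Assistant instance; a test can pass a coroutine that returns a fixed status or raises.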
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 12:26:34 +02:00