Commit graph

3 commits

Author SHA1 Message Date
Oskar Kapala 98437d46b2 test(control-plane): atomic write and resilient loader coverage
11 new test cases in test_state_reliability.py covering:
- atomic_write_json: produces valid JSON, no .tmp left behind, overwrites,
  works with nested structures
- _load_actual_state: returns False on empty / truncated file, returns True
  on valid files, preserves last-known-good state across a parse failure
- reconcile: empty/truncated services.json or incidents.json generates zero
  actions (skip-cycle semantics proven end-to-end)
- healthy service with valid world state generates no spurious action

All 32 tests (11 new + 21 existing) pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-03 12:27:05 +02:00
Oskar Kapala 52607a7cdd feat(control-plane): shadow_mode for HA event auto-actions + deploy docs
- HA_DIAG_SHADOW_MODE env flag in supervisor (default true)
- shadow_mode downgrades container_restart actions to alert_only with
  [SHADOW MODE] note; same action_id and 30-min cooldown apply
- alert_only events unaffected (always routed normally)
- 3 new tests: shadow on/off for ha_websocket_dead, alert-only unaffected
- DEPLOY.md with token gen, per-host config, verification, 48h observation,
  production-mode enablement, rollback
- README.md updated with shadow mode flag summary and DEPLOY.md link

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 17:12:33 +02:00
Oskar Kapala bf1415e4c1 feat(control-plane): route ha-diag-agent events through supervisor
- 8 HA event types mapped to existing action types
- ha_websocket_dead → container_restart (homeassistant), 30-min cooldown
- 6 events → alert_only (entity_unavailable, integration_failed,
  automation_failing, update_available, recorder_lag,
  system_health_degraded), 1-hour cooldown
- ha_websocket_recovered → cancels matching pending container_restart
- state-aware suppression: skip HA events when homeassistant has an
  active containers_not_running incident < 5 min ago (avoids alert
  storms during HA restarts/updates)
- location_tag preserved through action pipeline for per-house
  telegram alerts
- executor: alert_only acknowledged as no-op success
- 18 tests covering all 8 event types, suppression, cooldown,
  dedup, location_tag, recovery cancellation
- CLAUDE.md: supervisor event routing table added

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 15:59:23 +02:00