Oskar Kapala
|
3499b2f280
|
feat(ha-diag-agent): three REST diagnostic checks + Phase 3 flag fixes
New checks:
- SystemHealthCheck (15min interval): detects newly-failing HA
integrations via /api/system_health snapshot diff; transition-based
dedup (ok→error fires, sustained error silent, error→ok clears alert)
- UpdatesAvailableCheck (daily cron 09:00): per-update ha_update_available
events with 7-day dedup; release notes truncated at 2000 chars
- UpdatesDigestCheck (Sunday cron 09:00): single digest event with all
pending updates; weekly ISO-week dedup, independent of daily dedup key
- AutomationFailuresCheck (30min interval): detects automations with
N consecutive failures (default 3) via /api/trace/automation/<id>;
6h cooldown per automation
Phase 3 flag fixes:
- Flag #1 (since field): UnavailableEntitiesCheck now uses
min(state.last_changed, baseline.first_seen) as effective "since",
giving accurate duration when agent was offline at entity's first fail
- Flag #3 (registry cache): HAClient.get_entity_registry() caches
response in-process with configurable TTL (default 300s); avoids
repeated API calls across concurrent check cycles; invalidate_registry_cache()
for manual invalidation
Storage: system_health_snapshot table (component, last_status, last_seen_at,
payload) created automatically on next Storage.open() call
Config additions (all with defaults): entity_registry_cache_ttl=300,
system_health_check_interval=900, automation_check_interval=1800,
automation_failure_threshold=3, updates_check_hour=9,
updates_check_minute=0, updates_cooldown_days=7
Tests: 95 unit tests pass (49 new), 13 integration tests pass (9 new);
3 skipped (live-HA token not set in CI)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
2026-05-29 14:43:10 +02:00 |
|