homelab-codex-ws/services/ha-diag-agent/env.example
Oskar Kapala 20f6761a67 feat(ha-diag-agent): UnavailableEntitiesCheck with root cause dedup
- shared aiohttp ClientSession in HAClient (Phase 1 Flag #2 fixed):
  make_session() factory, session injected at startup, closed on shutdown
- Check.run() → list[CheckResult]: clean multi-event interface
- first real diagnostic check: entity unavailable > 24h
  (INSERT OR IGNORE baseline preserves first-seen timestamp)
- root cause grouping: emit ha_integration_failed instead of N entity
  events when ≥50% of integration's entities are unavailable (≥3 min)
- alert deduplication via SQLite cooldown window (default 6h)
- recovery clears baseline + dedup for immediate re-alert
- configurable thresholds: duration, integration %, cooldown
- 38 unit tests + 7 integration tests (42 pass, 3 skip w/o live HA)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 13:41:55 +02:00
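
The baseline and cooldown behaviour described in the commit message could look roughly like the sketch below. It is not the agent's actual code: the table names, column names, and the functions `open_db`, `record_unavailable`, `should_alert`, and `mark_recovered` are hypothetical, and only the use of `INSERT OR IGNORE`, a SQLite-backed cooldown, and clearing state on recovery come from the commit text.

```python
import sqlite3

# Mirrors the 6h default named in the commit message.
COOLDOWN_SECONDS = 6 * 3600


def open_db(path=":memory:"):
    """Create the two illustrative tables (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS baseline ("
        "entity_id TEXT PRIMARY KEY, first_seen REAL NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alerts ("
        "key TEXT PRIMARY KEY, last_sent REAL NOT NULL)"
    )
    return conn


def record_unavailable(conn, entity_id, now):
    """Store the first-seen timestamp and return it.

    INSERT OR IGNORE leaves an existing row untouched, so repeated
    sightings never overwrite the original first_seen value.
    """
    conn.execute(
        "INSERT OR IGNORE INTO baseline (entity_id, first_seen) VALUES (?, ?)",
        (entity_id, now),
    )
    row = conn.execute(
        "SELECT first_seen FROM baseline WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    return row[0]


def should_alert(conn, key, now, cooldown=COOLDOWN_SECONDS):
    """Return True (and record the send time) unless inside the cooldown."""
    row = conn.execute(
        "SELECT last_sent FROM alerts WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and now - row[0] < cooldown:
        return False
    conn.execute(
        "INSERT OR REPLACE INTO alerts (key, last_sent) VALUES (?, ?)",
        (key, now),
    )
    return True


def mark_recovered(conn, entity_id):
    """Clear baseline and dedup state so a new outage alerts immediately."""
    conn.execute("DELETE FROM baseline WHERE entity_id = ?", (entity_id,))
    conn.execute("DELETE FROM alerts WHERE key = ?", (entity_id,))
```

Passing `now` in explicitly, rather than calling `time.time()` inside each function, keeps the sketch easy to unit test without a live Home Assistant instance.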


# ha-diag-agent environment variables
# Copy to /opt/homelab/config/ha-diag-agent/.env on the target node
# Home Assistant connection (required)
HA_URL=http://homeassistant.local:8123
HA_TOKEN=your-long-lived-token-here
HA_TIMEOUT=10.0
# Node identity
NODE_NAME=piha
LOCATION_TAG=ken
# Check intervals (seconds)
# Heartbeat check
CHECK_INTERVAL=60
# Entity availability check (3600 s = 1 h)
CHECK_INTERVAL_UNAVAILABLE=3600
# Unavailable entities thresholds
# (comments sit on their own lines because some env-file loaders keep a trailing "# ..." as part of the value)
# Alert after an entity has been unavailable for N hours
UNAVAILABLE_THRESHOLD_HOURS=24
# Fraction (0.0-1.0, not a percentage) of an integration's entities; 0.5 = 50%
INTEGRATION_FAILURE_THRESHOLD_PCT=0.5
# Minimum entity count for an integration-level event
INTEGRATION_FAILURE_MIN_ENTITIES=3
# Suppress re-alerts within N hours
ALERT_COOLDOWN_HOURS=6
# API server
PORT=8087
# Logging: debug, info, warning, error
LOG_LEVEL=info
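
A minimal sketch of how the threshold variables above might be read and applied, assuming they arrive as plain strings in a mapping such as `os.environ`. The `Thresholds` dataclass and the functions `load_thresholds` and `is_integration_failure` are hypothetical names, not taken from the agent's source. One loud assumption: the minimum count is applied to the number of unavailable entities, which the file itself does not specify. The parser also tolerates a trailing ` # comment` in a value, since not every env-file loader strips those.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Thresholds:
    unavailable_hours: float
    failure_fraction: float  # 0.0-1.0, despite the _PCT suffix
    min_entities: int
    cooldown_hours: float


def _clean(raw: str) -> str:
    """Strip whitespace and any trailing ' # comment' from a raw value."""
    return raw.split(" #", 1)[0].strip()


def load_thresholds(env: Mapping[str, str]) -> Thresholds:
    """Build Thresholds from an env-like mapping, using the example defaults."""

    def get(name: str, default: str) -> str:
        return _clean(env.get(name, default))

    return Thresholds(
        unavailable_hours=float(get("UNAVAILABLE_THRESHOLD_HOURS", "24")),
        failure_fraction=float(get("INTEGRATION_FAILURE_THRESHOLD_PCT", "0.5")),
        min_entities=int(get("INTEGRATION_FAILURE_MIN_ENTITIES", "3")),
        cooldown_hours=float(get("ALERT_COOLDOWN_HOURS", "6")),
    )


def is_integration_failure(unavailable: int, total: int, t: Thresholds) -> bool:
    """Decide whether to emit one integration-level event.

    Assumption: the minimum applies to the count of unavailable entities.
    """
    if total <= 0:
        return False
    return (
        unavailable >= t.min_entities
        and unavailable / total >= t.failure_fraction
    )
```

In a running service this would typically be called once at startup as `load_thresholds(os.environ)`.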