- shared aiohttp ClientSession in HAClient (Phase 1 Flag #2 fixed): make_session() factory, session injected at startup, closed on shutdown
- Check.run() → list[CheckResult]: clean multi-event interface
- first real diagnostic check: entity unavailable > 24h (INSERT OR IGNORE baseline preserves first-seen timestamp)
- root cause grouping: emit ha_integration_failed instead of N entity events when ≥50% of an integration's entities are unavailable (minimum 3 entities)
- alert deduplication via SQLite cooldown window (default 6h)
- recovery clears baseline + dedup for immediate re-alert
- configurable thresholds: duration, integration %, cooldown
- 38 unit tests + 7 integration tests (42 pass, 3 skip without a live HA instance)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
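The baseline, cooldown, and recovery bullets describe three small SQLite patterns. The sketch below is a minimal illustration under stated assumptions, not the agent's actual code: the table names (`unavailable_baseline`, `alert_log`) and helper functions are hypothetical. `INSERT OR IGNORE` keeps the first-seen timestamp across repeated hourly checks, a last-sent timestamp per alert key enforces the cooldown, and recovery deletes both rows.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

# Hypothetical schema for illustration; the agent's real tables are not shown.
SCHEMA = """
CREATE TABLE IF NOT EXISTS unavailable_baseline (
    entity_id  TEXT PRIMARY KEY,
    first_seen TEXT NOT NULL          -- ISO-8601 timestamp with UTC offset
);
CREATE TABLE IF NOT EXISTS alert_log (
    alert_key TEXT PRIMARY KEY,
    last_sent TEXT NOT NULL
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_unavailable(conn: sqlite3.Connection, entity_id: str, now: datetime) -> datetime:
    """Return when entity_id was first seen unavailable, recording `now` if new."""
    with conn:  # commits on success
        # INSERT OR IGNORE is a no-op when the row already exists, so
        # repeated checks never overwrite the original first_seen.
        conn.execute(
            "INSERT OR IGNORE INTO unavailable_baseline (entity_id, first_seen) VALUES (?, ?)",
            (entity_id, now.isoformat()),
        )
    row = conn.execute(
        "SELECT first_seen FROM unavailable_baseline WHERE entity_id = ?", (entity_id,)
    ).fetchone()
    return datetime.fromisoformat(row[0])


def should_alert(conn: sqlite3.Connection, alert_key: str, now: datetime,
                 cooldown: timedelta) -> bool:
    """Return True and record the send, unless this key alerted within `cooldown`."""
    row = conn.execute(
        "SELECT last_sent FROM alert_log WHERE alert_key = ?", (alert_key,)
    ).fetchone()
    if row is not None and now - datetime.fromisoformat(row[0]) < cooldown:
        return False  # still inside the dedup window
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO alert_log (alert_key, last_sent) VALUES (?, ?)",
            (alert_key, now.isoformat()),
        )
    return True


def clear_on_recovery(conn: sqlite3.Connection, entity_id: str, alert_key: str) -> None:
    """Drop baseline and dedup rows so a later outage is tracked and alerted afresh."""
    with conn:
        conn.execute("DELETE FROM unavailable_baseline WHERE entity_id = ?", (entity_id,))
        conn.execute("DELETE FROM alert_log WHERE alert_key = ?", (alert_key,))
```

A check built on these helpers would alert when `now - record_unavailable(...)` reaches the configured hour threshold and `should_alert(...)` returns True. Timestamps are stored as timezone-aware ISO strings and compared in Python, which keeps the SQL trivial.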
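The root cause grouping rule can be sketched as a pure function over the current set of unavailable entities. This sketch assumes a mapping from every known entity to its integration, a hypothetical `CheckResult` shape, and a hypothetical `entity_unavailable` event name for the per-entity case; only `ha_integration_failed` appears in the commit. It reads the 3-entity minimum as a minimum number of *unavailable* entities, which is one plausible interpretation of INTEGRATION_FAILURE_MIN_ENTITIES.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class CheckResult:
    """Hypothetical result shape; the agent's real CheckResult is not shown."""
    event: str
    subject: str
    details: dict = field(default_factory=dict)


def group_unavailable(
    unavailable: list[str],
    integration_of: dict[str, str],
    threshold_pct: float = 0.5,
    min_entities: int = 3,
) -> list[CheckResult]:
    """Collapse per-entity events into one ha_integration_failed per broken integration.

    integration_of maps *every* known entity_id to its integration, so the
    denominator is the integration's total entity count.
    """
    totals = Counter(integration_of.values())
    down_by_integration: dict[str, list[str]] = defaultdict(list)
    results: list[CheckResult] = []

    for entity_id in sorted(unavailable):
        integration = integration_of.get(entity_id)
        if integration is None:
            # Unknown integration: nothing to group under, report individually.
            results.append(CheckResult("entity_unavailable", entity_id))
        else:
            down_by_integration[integration].append(entity_id)

    for integration, down in sorted(down_by_integration.items()):
        fraction = len(down) / totals[integration]
        if len(down) >= min_entities and fraction >= threshold_pct:
            # One root-cause event replaces N per-entity events.
            results.append(CheckResult("ha_integration_failed", integration,
                                       {"entities": down, "fraction": fraction}))
        else:
            results.extend(CheckResult("entity_unavailable", e) for e in down)
    return results
```

For example, if three of four entities in one integration are down along with one of two entities in another, the first collapses into a single `ha_integration_failed` result while the second still yields one per-entity result, because one unavailable entity is below the 3-entity minimum.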
# ha-diag-agent environment variables
# Copy to /opt/homelab/config/ha-diag-agent/.env on the target node

# Home Assistant connection (required)
HA_URL=http://homeassistant.local:8123
HA_TOKEN=your-long-lived-token-here
HA_TIMEOUT=10.0

# Node identity
NODE_NAME=piha
LOCATION_TAG=ken

# Check intervals (seconds)
CHECK_INTERVAL=60 # heartbeat check
CHECK_INTERVAL_UNAVAILABLE=3600 # entity availability check (1h)

# Unavailable entities thresholds
UNAVAILABLE_THRESHOLD_HOURS=24 # alert after N hours unavailable
INTEGRATION_FAILURE_THRESHOLD_PCT=0.5 # fraction of integration entities
INTEGRATION_FAILURE_MIN_ENTITIES=3 # minimum count for integration event
ALERT_COOLDOWN_HOURS=6 # suppress re-alert within N hours

# API server
PORT=8087

# Logging: debug, info, warning, error
LOG_LEVEL=info
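Several lines in this example carry trailing comments (for instance `CHECK_INTERVAL=60 # heartbeat check`). Not every env-file loader strips these, so a value can arrive as `60 # heartbeat check` and fail integer parsing. The sketch below assumes the agent reads the file itself. `parse_env_text` and `Settings` are hypothetical names, not the agent's actual config module. The fallback defaults mirror the example values; the commit itself only confirms the 6h cooldown default. It also checks that INTEGRATION_FAILURE_THRESHOLD_PCT is a fraction (0.5 = 50%) despite its `_PCT` suffix.

```python
import socket
from dataclasses import dataclass
from typing import Callable, Mapping, TypeVar

T = TypeVar("T")
LOG_LEVELS = {"debug", "info", "warning", "error"}


def parse_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments.

    Assumption: an unquoted value ends at the first ' #', so
    'CHECK_INTERVAL=60 # heartbeat check' yields '60'. Quoting is not handled.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


@dataclass(frozen=True)
class Settings:
    ha_url: str
    ha_token: str
    ha_timeout: float
    node_name: str
    location_tag: str
    check_interval: int
    check_interval_unavailable: int
    unavailable_threshold_hours: float
    integration_failure_threshold_pct: float
    integration_failure_min_entities: int
    alert_cooldown_hours: float
    port: int
    log_level: str

    @classmethod
    def from_mapping(cls, env: Mapping[str, str]) -> "Settings":
        def get(key: str, cast: Callable[[str], T], default: T) -> T:
            raw = env.get(key, "")
            return cast(raw) if raw != "" else default

        s = cls(
            ha_url=env["HA_URL"],      # required: KeyError if missing
            ha_token=env["HA_TOKEN"],  # required
            ha_timeout=get("HA_TIMEOUT", float, 10.0),
            node_name=get("NODE_NAME", str, socket.gethostname()),
            location_tag=get("LOCATION_TAG", str, ""),
            check_interval=get("CHECK_INTERVAL", int, 60),
            check_interval_unavailable=get("CHECK_INTERVAL_UNAVAILABLE", int, 3600),
            unavailable_threshold_hours=get("UNAVAILABLE_THRESHOLD_HOURS", float, 24.0),
            integration_failure_threshold_pct=get("INTEGRATION_FAILURE_THRESHOLD_PCT", float, 0.5),
            integration_failure_min_entities=get("INTEGRATION_FAILURE_MIN_ENTITIES", int, 3),
            alert_cooldown_hours=get("ALERT_COOLDOWN_HOURS", float, 6.0),
            port=get("PORT", int, 8087),
            log_level=get("LOG_LEVEL", str, "info").lower(),
        )
        # Catch "50" written where the fraction 0.5 was meant.
        if not 0 < s.integration_failure_threshold_pct <= 1:
            raise ValueError("INTEGRATION_FAILURE_THRESHOLD_PCT must be a fraction in (0, 1]")
        if s.log_level not in LOG_LEVELS:
            raise ValueError(f"LOG_LEVEL must be one of {sorted(LOG_LEVELS)}")
        return s
```

Usage would look like `Settings.from_mapping(parse_env_text(open(path).read()))` with `path` pointing at the copied `.env`. Moving the trailing comments onto their own lines would avoid the parsing question for any loader.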