homelab-codex-ws/services/ha-diag-agent/README.md
Oskar Kapala ab8895d28b feat(ha-diag-agent): scaffold service with HA REST client and event emitter
- new per-host service, follows node-agent pattern
- 7 new HA event types defined (routing in supervisor — Phase 5)
- HeartbeatCheck as pipeline validator (pings /api/, emits ha_websocket_dead)
- service.yaml + host configs for piha (ken) and chelsty-infra (chelsty)
- test scaffolding with aiohttp/aiosqlite mocks (15/15 passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-29 12:26:34 +02:00

92 lines
2.5 KiB
Markdown

# ha-diag-agent
Per-host Home Assistant diagnostic agent. Polls HA REST API on a schedule,
emits structured events to `/opt/homelab/events/<node>/`, and exposes an
HTTP API for health checks and manual check triggers.
Follows the same event-pipeline pattern as `node-agent`: filesystem-first,
no direct supervisor integration, events processed by the VPS observer.
## Architecture
```
APScheduler (every CHECK_INTERVAL s)
└─ HeartbeatCheck → pings /api/, emits ha_websocket_dead on failure
[Phase 3: EntityUnavailableCheck, SystemHealthCheck, UpdateCheck, ...]
FastAPI (port 8087)
GET /health → liveness probe
POST /trigger/<check> → run a named check on demand
SQLite (/data/ha_diag.db)
entity_baseline → last-known entity states
check_history → per-check run log
alerts_sent → dedup gate for alert events
```
## Event Types
| Type | Severity | Trigger |
|------|----------|---------|
| `ha_websocket_dead` | error | HA /api/ unreachable |
| `ha_integration_failed` | error | Integration in error state |
| `ha_entity_unavailable_long` | warning | Entity unavailable > threshold |
| `ha_automation_failing` | warning | Automation last run errored |
| `ha_update_available` | info | HA or integration update pending |
| `ha_recorder_lag` | warning | Recorder write lag > threshold |
| `ha_system_health_degraded` | warning | System health check failed |
Event routing in supervisor (Phase 5) maps these to `notify` actions.
## Deployment
```bash
# 1. Create config on target node
ssh oskar@<node-ip>
mkdir -p /opt/homelab/config/ha-diag-agent /var/lib/ha-diag-agent
cat > /opt/homelab/config/ha-diag-agent/.env << 'EOF'
HA_URL=http://homeassistant.local:8123
HA_TOKEN=<long-lived-token>
NODE_NAME=piha
LOCATION_TAG=ken
CHECK_INTERVAL=60
EOF
# 2. Deploy
scripts/deploy/deploy.sh --service ha-diag-agent
# 3. Verify
docker ps --filter name=ha-diag-agent
curl http://localhost:8087/health
```
### chelsty-infra note
`chelsty-infra` runs docker-compose v1 (1.29.2). Use `docker-compose` (hyphenated):
```bash
docker-compose -f docker-compose.yml up -d --build
```
### HA long-lived token
In HA UI: Profile → Long-Lived Access Tokens → Create token.
## Running Tests
```bash
cd services/ha-diag-agent
pip install -e ".[dev]"
pytest tests/ -v
```
## Optional YAML config
Place `/opt/homelab/config/ha-diag-agent/ha-diag-agent.yaml` on the node.
Values there are defaults; env vars take priority.
```yaml
ha_url: http://homeassistant.local:8123
location_tag: ken
check_interval: 60
```