homelab-codex-ws

oskar/homelab-codex-ws

Commit graph

Author

SHA1

Message

Date

Oskar Kapala

c255a021d1

fix(observer): quarantine malformed event files to prevent processing wedge

Was: malformed event (bad JSON / truncated / corrupted bytes) wedged the
node's checkpoint forever — every cycle re-tried, logged, never advanced
past the bad file; all subsequent good events for that node lost.

Now: first parse failure -> atomic os.replace to STATE_DIR/observer_failed_events/<node>/
with collision handling. Checkpoint advances, downstream events flow.
Move failures themselves are logged but don't crash the loop.

Complementary to yesterday's atomic_write_json fix (state files);
this addresses the same race-pattern on event files instead.

Regression test asserts: bad event quarantined to failed_events dir,
removed from hot path, subsequent good event processed (node online),
checkpoint moves to good event.

2026-06-12 11:22:56 +02:00
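
The quarantine behaviour described in c255a021d1 could look roughly like the sketch below. It is not the code from the commit: the function names (quarantine_event, load_event), the numeric-suffix collision scheme, and the logger name are assumptions made for illustration. Only the os.replace call and the STATE_DIR/observer_failed_events/<node>/ layout come from the commit message. The exists-then-replace check assumes a single observer process writes to the quarantine directory.

```python
import json
import logging
import os
from pathlib import Path
from typing import Optional

log = logging.getLogger("observer")  # hypothetical logger name


def quarantine_event(event_path: Path, state_dir: Path, node: str) -> Optional[Path]:
    """Move a malformed event file out of the hot path; return its new location."""
    failed_dir = state_dir / "observer_failed_events" / node
    try:
        failed_dir.mkdir(parents=True, exist_ok=True)
        target = failed_dir / event_path.name
        suffix = 0
        # Collision handling (assumed scheme): never overwrite an earlier
        # quarantined file, append .1, .2, ... instead.
        while target.exists():
            suffix += 1
            target = failed_dir / f"{event_path.name}.{suffix}"
        # os.replace is atomic when source and target share a filesystem.
        os.replace(event_path, target)
        return target
    except OSError:
        # A failed move is logged but must not crash the processing loop.
        log.exception("could not quarantine %s", event_path)
        return None


def load_event(event_path: Path, state_dir: Path, node: str) -> Optional[dict]:
    """Parse one event file; quarantine it on the first parse failure."""
    try:
        return json.loads(event_path.read_bytes().decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        log.warning("malformed event %s, quarantining", event_path)
        quarantine_event(event_path, state_dir, node)
        return None
```

A caller that receives None from load_event would advance the checkpoint past the file and continue with the next event, which is what keeps later good events flowing.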

Oskar Kapala

d7e0d3162f

fix(ha-diag-agent): remove host port mapping for 8087

Port 8087 conflicted with zigbee2mqtt on piha (8087:8080 mapping active
for 7+ days), preventing ha-diag-agent from starting.

Grep across the full repo confirms no external consumer (no nginx/npm
proxy, no Prometheus scrape, no control-plane reference) uses this port.
The Docker healthcheck runs inside the container network namespace and
does not require a host-side mapping. Internal FastAPI binding on 8087
is unchanged.

Removed: ports section from docker-compose.yml and service.yaml.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-06-11 19:46:28 +02:00

Oskar Kapala

ab8895d28b

feat(ha-diag-agent): scaffold service with HA REST client and event emitter

- new per-host service, follows node-agent pattern
- 7 new HA event types defined (routing in supervisor — Phase 5)
- HeartbeatCheck as pipeline validator (pings /api/, emits ha_websocket_dead)
- service.yaml + host configs for piha (ken) and chelsty-infra (chelsty)
- test scaffolding with aiohttp/aiosqlite mocks (15/15 passing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-05-29 12:26:34 +02:00
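
As an illustration of the HeartbeatCheck idea in ab8895d28b (ping /api/, emit ha_websocket_dead on failure), here is a minimal stdlib-only sketch. The function name heartbeat_check, the injected ping and emit callables, and the 5-second timeout are assumptions; the actual service uses its own HA REST client and event emitter, which are not shown in the commit message.

```python
import asyncio
from typing import Awaitable, Callable


async def heartbeat_check(
    ping: Callable[[], Awaitable[int]],  # hypothetical: returns the HTTP status of GET /api/
    emit: Callable[[str], None],         # hypothetical: hands an event type to the emitter
    timeout: float = 5.0,                # assumed default, not from the commit
) -> bool:
    """Run one heartbeat; emit 'ha_websocket_dead' unless HA answers with 200."""
    try:
        status = await asyncio.wait_for(ping(), timeout)
    except (asyncio.TimeoutError, OSError):
        # Timeouts and connection errors count as a failed heartbeat.
        status = None
    if status == 200:
        return True
    emit("ha_websocket_dead")
    return False
```

Injecting ping and emit keeps the check testable without a running Home Assistant instance, in the same spirit as the mocked test scaffolding the commit mentions.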

3 commits