Was: malformed event (bad JSON / truncated / corrupted bytes) wedged the node's checkpoint forever — every cycle re-tried, logged, never advanced past the bad file; all subsequent good events for that node lost. Now: first parse failure -> atomic os.replace to STATE_DIR/observer_failed_events/<node>/ with collision handling. Checkpoint advances, downstream events flow. Move failures themselves are logged but don't crash the loop. Complementary to yesterday's atomic_write_json fix (state files); this addresses the same race-pattern on event files instead. Regression test asserts: bad event quarantined to failed_events dir, removed from hot path, subsequent good event processed (node online), checkpoint moves to good event. |
||
|---|---|---|
| .. | ||
| runtime | ||
| capabilities.yaml | ||
| host.yaml | ||
| README.md | ||
| services.yaml | ||
PIHA - Infrastructure + Automation Node
Role
- Core network services.
- Home automation (Home Assistant).
- Monitoring and logging.
Configured Services
- Home Assistant
- Mosquitto (MQTT)
- Zigbee2MQTT
Runtime Data
/opt/homelab/data/homeassistant