Two independent fixes for the false-alarm storm caused by race-condition
reads of truncated world state files:
1. Atomic writes: _atomic_write_json (write→fsync→os.replace) replaces
all bare open('w')+json.dump calls in supervisor and executor, so the
action-file pipeline is never visible in a half-written state.
2. Resilient loader: _load_actual_state now returns False when any world
state file fails to parse (empty or truncated mid-write). reconcile()
skips the entire drift check on False instead of treating {} as "all
services missing". actual_state retains its last-known-good values so
a single bad cycle does not wipe accumulated context.
Before: parse error → raw[key]={} → all desired services missing →
wall of redeploy actions → drift_resolved_auto churn on next cycle.
After: parse error → WARNING logged → cycle skipped → no actions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
||
|---|---|---|
| .. | ||
| agent-system | ||
| brain-watchdog | ||
| control-plane | ||
| forgejo | ||
| ha-diag-agent | ||
| mosquitto | ||
| node-agent | ||
| node_exporter | ||
| npm | ||
| ollama | ||
| planner-agent | ||
| stability-agent | ||
| zigbee2mqtt | ||
| .gitkeep | ||