homelab-codex-ws/services
Oskar Kapala 5e97b4e448 fix(supervisor): atomic writes + skip cycle on unreadable world state
Two independent fixes for the false-alarm storm caused by race-condition
reads of truncated world state files:

1. Atomic writes: _atomic_write_json (write→fsync→os.replace) replaces
   all bare open('w')+json.dump calls in supervisor and executor, so the
   action-file pipeline is never visible in a half-written state.

2. Resilient loader: _load_actual_state now returns False when any world
   state file fails to parse (empty or truncated mid-write). reconcile()
   skips the entire drift check on False instead of treating {} as "all
   services missing". actual_state retains its last-known-good values so
   a single bad cycle does not wipe accumulated context.

   Before: parse error → raw[key]={} → all desired services missing →
     wall of redeploy actions → drift_resolved_auto churn on next cycle.
   After:  parse error → WARNING logged → cycle skipped → no actions.

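The resilient-loader behaviour in item 2 can be sketched as below. The commit message names `_load_actual_state`, `reconcile()`, `actual_state` and `raw[key]`. Everything else is a hypothetical stand-in for illustration: the `Supervisor` class, the key-to-filename mapping, the `desired` set, the `"services"` key and the redeploy action shape. Treating an unreadable file (`OSError`) like a parse error is also an assumption.

```python
import json
import logging
from pathlib import Path
from typing import Any

log = logging.getLogger("supervisor")


class Supervisor:
    """Hypothetical minimal supervisor illustrating skip-on-unreadable-state."""

    def __init__(self, state_dir: Path, files: dict[str, str], desired: set[str]) -> None:
        self.state_dir = state_dir
        self.files = files      # hypothetical: key -> world-state filename
        self.desired = desired  # hypothetical: service names that should be running
        self.actual_state: dict[str, Any] = {}  # last-known-good snapshot

    def _load_actual_state(self) -> bool:
        raw: dict[str, Any] = {}
        for key, name in self.files.items():
            try:
                raw[key] = json.loads((self.state_dir / name).read_text(encoding="utf-8"))
            except (OSError, json.JSONDecodeError) as exc:
                # Empty or truncated mid-write: do NOT fall back to raw[key] = {}.
                log.warning("world state %s unreadable (%s); skipping cycle", name, exc)
                return False
        # Commit the new snapshot only after every file parsed cleanly.
        self.actual_state = raw
        return True

    def reconcile(self) -> list[dict[str, str]]:
        if not self._load_actual_state():
            # Skip the drift check; actual_state keeps its last-known-good values.
            return []
        running = set(self.actual_state.get("services", {}))
        return [{"action": "redeploy", "service": s} for s in sorted(self.desired - running)]
```

Assigning `actual_state` only once, after the whole loop succeeds, means one truncated file cannot wipe the previous snapshot. Assigning per key inside the loop would leave a partially updated state behind.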
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-03 12:26:59 +02:00
agent-system     fix(dashboard): read last_update from JSON content, not file mtime  2026-05-31 22:10:50 +02:00
brain-watchdog   test(brain-watchdog): add pytest suite covering import and check() logic  2026-06-01 20:38:24 +02:00
control-plane    fix(supervisor): atomic writes + skip cycle on unreadable world state  2026-06-03 12:26:59 +02:00
forgejo          Add node capability model  2026-05-11 20:46:50 +02:00
ha-diag-agent    feat(control-plane): shadow_mode for HA event auto-actions + deploy docs  2026-05-29 17:12:33 +02:00
mosquitto        Implement filesystem-first runtime event system  2026-05-12 13:38:25 +02:00
node-agent       Fix ghost service keys from hash-prefixed Docker container names  2026-05-27 15:41:13 +02:00
node_exporter    Fix pending actions: node_exporter, zigbee2mqtt, chelsty-ha monitoring  2026-05-27 15:10:48 +02:00
npm              Add node capability model  2026-05-11 20:46:50 +02:00
ollama           Add node capability model  2026-05-11 20:46:50 +02:00
planner-agent    fix+debug(planner-agent): use base_url (not api_base) for litellm.acompletion, add print [TEMP]  2026-05-28 13:07:58 +02:00
stability-agent  Fix stability agent fleet deploy scripts  2026-05-17 21:09:06 +02:00
zigbee2mqtt      docs: compress CLAUDE.md + fix zigbee2mqtt coordinator docs  2026-05-29 14:17:23 +02:00
.gitkeep         Add infrastructure standards and deployment conventions  2026-05-07 21:16:03 +02:00