Previously _cancel_resolved_pending_actions() only cancelled actions where
the service became healthy. This left orphaned actions when a service was
removed from services.yaml or marked monitor:false.
Add Case 1: if the action's svc_key is no longer in desired_state (either
removed entirely or skipped due to monitor:false), cancel with reason
service_removed_from_desired_state.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When a service becomes healthy (node-agent emits service_healthy → observer
updates services.json), any previously queued redeploy/container_restart
action is stale. Without cleanup, the queue accumulates old actions that
require manual rejection.
_cancel_resolved_pending_actions() runs after each reconcile cycle:
- Reads all pending/*.json with type=redeploy or container_restart
- If the service is now healthy in actual_state, moves action to cancelled/
with reason=drift_resolved_auto
- Only pending actions are touched; approved/running are left to the operator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- observer: store trigger_type on incidents for supervisor routing
- supervisor: route containers_not_running/mqtt_unreachable to container_restart instead of redeploy
- supervisor: fix node alias normalization via NODE_ALIAS_MAP
- supervisor: fix pending action dedup (scan by content not filename)
- executor: implement container_restart via SSH docker restart with retry
- control-plane override: configure NODE_ALIAS_MAP for production
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Timestamp in reconcile-{ts}-{node}-{service} meant dedup guard never fired.
Switch to reconcile-{node}-{service} and check pending/approved/running states.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>