observer: service_healthy resolves active incidents
service_healthy is a positive health confirmation. If the service has an active incident (e.g. from earlier service_unhealthy events), that incident should be resolved once the service is confirmed healthy.

Previously only service_recovered resolved incidents; service_healthy set status=healthy but left the incident open, so status='degraded' persisted.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
parent 46ae92b5c1
commit 28e9534765
@@ -268,8 +268,11 @@ class Observer:
             # Positive confirmation from node-agent that a managed container
             # is running. This keeps services.json populated so the supervisor
             # can correctly detect drift (absent entry = never reported = unknown,
-            # not the same as confirmed missing). No incident resolution needed.
+            # not the same as confirmed missing).
+            # Also resolve any active incident — if a service that had been
+            # unhealthy/crashing is now confirmed healthy, the incident is over.
             self.world_state["services"][svc_key]["status"] = "healthy"
+            self._resolve_incident(svc_key, timestamp)
         elif etype in ["service_unhealthy", "healthcheck_failed"]:
             self.world_state["services"][svc_key]["status"] = "unhealthy"
             self._handle_incident(svc_key, event)
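
The behavior change can be illustrated with a small, self-contained sketch. This is not the project's Observer: MiniObserver, the "incidents" map, the event field names ("type", "service", "timestamp"), and overall_status() are all assumptions made for illustration. Only the event type names, the world_state["services"][svc_key]["status"] assignment, and the _handle_incident / _resolve_incident method names come from the diff; their bodies here are guesses at plausible behavior.

```python
from typing import Any, Dict


class MiniObserver:
    """Hypothetical, trimmed-down stand-in for the Observer in the diff."""

    def __init__(self) -> None:
        # "incidents" is an assumed structure; the real layout is not shown.
        self.world_state: Dict[str, Dict[str, Any]] = {
            "services": {},
            "incidents": {},
        }

    def _handle_incident(self, svc_key: str, event: Dict[str, Any]) -> None:
        # Open a new incident unless one is already active for this service.
        incident = self.world_state["incidents"].get(svc_key)
        if incident is None or incident["resolved_at"] is not None:
            self.world_state["incidents"][svc_key] = {
                "opened_at": event["timestamp"],
                "resolved_at": None,
            }

    def _resolve_incident(self, svc_key: str, timestamp: str) -> None:
        # Does nothing when no incident is active, so repeated healthy
        # reports are harmless.
        incident = self.world_state["incidents"].get(svc_key)
        if incident is not None and incident["resolved_at"] is None:
            incident["resolved_at"] = timestamp

    def overall_status(self) -> str:
        # Assumed rule: any unresolved incident means "degraded".
        incidents = self.world_state["incidents"].values()
        active = any(i["resolved_at"] is None for i in incidents)
        return "degraded" if active else "healthy"

    def process(self, event: Dict[str, Any]) -> None:
        etype = event["type"]
        svc_key = event["service"]
        timestamp = event["timestamp"]
        self.world_state["services"].setdefault(svc_key, {})

        if etype == "service_healthy":
            self.world_state["services"][svc_key]["status"] = "healthy"
            # The line this commit adds: a healthy report closes the incident.
            self._resolve_incident(svc_key, timestamp)
        elif etype in ["service_unhealthy", "healthcheck_failed"]:
            self.world_state["services"][svc_key]["status"] = "unhealthy"
            self._handle_incident(svc_key, event)


obs = MiniObserver()
obs.process({"type": "service_unhealthy", "service": "web", "timestamp": "t1"})
status_while_unhealthy = obs.overall_status()
obs.process({"type": "service_healthy", "service": "web", "timestamp": "t2"})
status_after_healthy = obs.overall_status()
```

In this sketch, removing the _resolve_incident call from the service_healthy branch reproduces the old behavior: the service's own status flips to "healthy" while the open incident keeps overall_status() at "degraded".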