Find a file
Oskar Kapala fb7828b52b supervisor: auto-cancel pending actions when drift is resolved
When a service becomes healthy (node-agent emits service_healthy → observer
updates services.json), any previously queued redeploy/container_restart
action is stale. Without cleanup, the queue accumulates old actions that
require manual rejection.

_cancel_resolved_pending_actions() runs after each reconcile cycle:
- Reads all pending/*.json with type=redeploy or container_restart
- If the service is now healthy in actual_state, moves action to cancelled/
  with reason=drift_resolved_auto
- Only pending actions are touched; approved/running are left to the operator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 14:58:55 +02:00
backups/zigbee Add Zigbee coordinator backup 2026-05-14 18:24:26 +02:00
docs docs(chelsty): update docs and topology for site/node split 2026-05-20 14:23:57 +02:00
dotfiles add shared zshrc 2026-05-10 20:52:44 +02:00
hosts vps/node-agent: add network_mode: host for control-plane health probe 2026-05-27 14:52:32 +02:00
inventory ops: align vps desired state with control-plane architecture, remove legacy agent-system references 2026-05-21 11:40:55 +02:00
scripts Fix service health tracking: emit service_healthy, control-plane endpoint check, cleanup checkpoint migration 2026-05-27 14:49:56 +02:00
services supervisor: auto-cancel pending actions when drift is resolved 2026-05-27 14:58:55 +02:00
.codex Document current homelab state 2026-04-15 17:37:25 +02:00
.gitignore Add infrastructure standards and deployment conventions 2026-05-07 21:16:03 +02:00
CLAUDE.md docs(CLAUDE.md): update node model and override path convention 2026-05-20 15:27:46 +02:00
codex_context Add session context state 2026-04-20 22:10:39 +02:00
codex_context.yaml add shared context lock 2026-05-05 17:25:50 +02:00
deploy_agent.py Add deploy escalation output 2026-04-22 22:08:26 +02:00
ollama_client.py Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
README.md docs: uzupelnij dokumentacje pod katem agentow AI 2026-05-20 12:06:23 +02:00
start-aider.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
start-codex.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
sync-context.sh add shared context lock 2026-05-05 17:25:50 +02:00
tech-debt.md docs: add tech-debt.md, forgejo_runner temp disabled 2026-05-21 10:37:42 +02:00
update-context.md Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00

Homelab Codex

GitOps-lite orchestration for a distributed homelab environment.

Architecture

The homelab consists of several nodes connected via a Tailscale internal mesh.

Host Role Description
SATURN Primary Node Development, orchestration, and git source of truth (commit node).
SOLARIA Compute Node GPU, inference, and heavy compute workloads.
PIHA Infra Node Core infrastructure services, automation, and monitoring.
VPS Edge Node Public ingress, reverse proxy, and edge services.

Repository Structure

Getting Started

  1. Standardization: Follow the Infrastructure Standards.
  2. Deployment: See Deployment Conventions for how to roll out changes.
  3. SATURN: Remember that SATURN is the only node where commits should be made.

Documentation Index


Note: This repository documents the state of the homelab. Runtime state lives outside the repository in /opt/homelab.