Oskar Kapala 4e8968f9c7 Fix service health tracking: emit service_healthy, control-plane endpoint check, cleanup checkpoint migration
- node_agent: emit service_healthy for all running managed containers so
  observer populates services.json (previously empty → supervisor flooded
  action queue with missing_service redeploys for healthy services)
- node_agent: VPS-only _check_control_plane_health() probes the HTTP
  endpoint to emit service_healthy/unhealthy for the 'control-plane' logical
  service (multi-container stack, container names don't match service name)
- node_agent: fix _cleanup_control_plane_fs() to read new node_checkpoints
  format from observer checkpoint (was reading old last_processed_file key,
  always found nothing, never cleaned up old events)
- observer: handle service_healthy event type → sets service status healthy
  without resolving incidents (unlike service_recovered which also resolves)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 14:49:56 +02:00
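The observer change in the commit above separates two event types: service_healthy only marks a service healthy, while service_recovered also resolves its incident. Below is a minimal sketch of that distinction. It assumes a simple in-memory state, with a status map standing in for services.json and a set of open incidents. `ObserverState`, `handle_event`, `missing_services`, and the handling of `service_unhealthy` are hypothetical and are not the repository's code.

```python
from dataclasses import dataclass, field


@dataclass
class ObserverState:
    # Stand-in for services.json: service name -> status (assumed shape).
    services: dict[str, str] = field(default_factory=dict)
    # Services with an open incident (assumed representation).
    open_incidents: set[str] = field(default_factory=set)


def handle_event(state: ObserverState, event: dict) -> None:
    """Apply one node-agent event to the observer state (hypothetical)."""
    kind = event["type"]
    service = event["service"]
    if kind == "service_healthy":
        # Routine health report: mark healthy, leave incidents untouched.
        state.services[service] = "healthy"
    elif kind == "service_recovered":
        # Recovery: mark healthy and also resolve the open incident.
        state.services[service] = "healthy"
        state.open_incidents.discard(service)
    elif kind == "service_unhealthy":
        # Assumption: an unhealthy report opens an incident.
        state.services[service] = "unhealthy"
        state.open_incidents.add(service)
    # Other event types are ignored in this sketch.


def missing_services(state: ObserverState, desired: set[str]) -> set[str]:
    """Desired services absent from the status map (hypothetical helper)."""
    return desired - set(state.services)
```

With an empty status map, every desired service counts as missing. That matches the flood of missing_service redeploys the commit describes before node_agent started emitting service_healthy for running containers.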
backups/zigbee Add Zigbee coordinator backup 2026-05-14 18:24:26 +02:00
docs docs(chelsty): update docs and topology for site/node split 2026-05-20 14:23:57 +02:00
dotfiles add shared zshrc 2026-05-10 20:52:44 +02:00
hosts fix(node-agent): correct VPS_EVENTS_HOST to actual VPS Tailscale IP 2026-05-27 14:07:27 +02:00
inventory ops: align vps desired state with control-plane architecture, remove legacy agent-system references 2026-05-21 11:40:55 +02:00
scripts Fix service health tracking: emit service_healthy, control-plane endpoint check, cleanup checkpoint migration 2026-05-27 14:49:56 +02:00
services Fix service health tracking: emit service_healthy, control-plane endpoint check, cleanup checkpoint migration 2026-05-27 14:49:56 +02:00
.codex Document current homelab state 2026-04-15 17:37:25 +02:00
.gitignore Add infrastructure standards and deployment conventions 2026-05-07 21:16:03 +02:00
CLAUDE.md docs(CLAUDE.md): update node model and override path convention 2026-05-20 15:27:46 +02:00
codex_context Add session context state 2026-04-20 22:10:39 +02:00
codex_context.yaml add shared context lock 2026-05-05 17:25:50 +02:00
deploy_agent.py Add deploy escalation output 2026-04-22 22:08:26 +02:00
ollama_client.py Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
README.md docs: fill in documentation with AI agents in mind 2026-05-20 12:06:23 +02:00
start-aider.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
start-codex.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
sync-context.sh add shared context lock 2026-05-05 17:25:50 +02:00
tech-debt.md docs: add tech-debt.md, forgejo_runner temp disabled 2026-05-21 10:37:42 +02:00
update-context.md Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00

Homelab Codex

GitOps-lite orchestration for a distributed homelab environment.

Architecture

The homelab consists of the following nodes, connected over a private Tailscale mesh network.

Host      Role           Description
SATURN    Primary Node   Development, orchestration, and git source of truth (commit node).
SOLARIA   Compute Node   GPU, inference, and heavy compute workloads.
PIHA      Infra Node     Core infrastructure services, automation, and monitoring.
VPS       Edge Node      Public ingress, reverse proxy, and edge services.

Repository Structure

Getting Started

  1. Standardization: Follow the Infrastructure Standards.
  2. Deployment: See Deployment Conventions for how to roll out changes.
  3. SATURN: Make commits only on SATURN, the git source of truth.
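The SATURN-only commit rule can also be checked mechanically. Below is a minimal sketch of the check a git pre-commit hook could run. It assumes SATURN's short hostname is `saturn`. `COMMIT_NODE`, `is_commit_node`, and `main` are hypothetical names, not part of this repository.

```python
#!/usr/bin/env python3
import socket
import sys

COMMIT_NODE = "saturn"  # assumption: SATURN's actual short hostname


def is_commit_node(hostname: str) -> bool:
    # Compare only the short hostname, ignoring any domain suffix.
    return hostname.split(".", 1)[0].lower() == COMMIT_NODE


def main() -> int:
    """Return 0 if committing is allowed on this host, 1 otherwise."""
    host = socket.gethostname()
    if is_commit_node(host):
        return 0
    print(f"refusing to commit on {host!r}: commits belong on SATURN",
          file=sys.stderr)
    return 1
```

To use it, save it as `.git/hooks/pre-commit`, make it executable, and end the file with `sys.exit(main())`. Git aborts the commit when a pre-commit hook exits non-zero. Hooks live only in the local clone, so this guards against mistakes rather than enforcing policy across nodes.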

Documentation Index


Note: This repository documents the state of the homelab. Runtime state lives outside the repository in /opt/homelab.