Oskar Kapala 931fd46e62 fix(onboard): propagate dry-run into steps via run() helper
DRY_RUN now uses 1/0 instead of "true"/"false" across all onboard scripts.

common.sh: add run() — wraps mutations; prints "[dry-run] would: ..." when
  DRY_RUN=1. Exported via `export -f run` so child bash processes inherit it.
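A minimal sketch of what such a run() helper could look like, assuming only what the message states (the "[dry-run] would: ..." prefix, the 1/0 convention, and `export -f run`). The actual body in common.sh is not shown here and may differ.

```shell
#!/usr/bin/env bash
# Illustrative dry-run wrapper; not the actual common.sh implementation.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    # Simulation: report the command instead of executing it.
    printf '[dry-run] would: %s\n' "$*"
    return 0
  fi
  # Live mode: execute the command with its arguments intact.
  "$@"
}

# Make the function visible to child bash processes (bash-specific).
export -f run
```

With DRY_RUN=1 exported, a step calling `run tailscale up` would only print the command; with DRY_RUN unset or 0 it would execute it.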

onboard.sh: remove the `--dry-run → dryrun "Would execute" → continue` bypass.
  Steps now always execute; DRY_RUN=1 is exported so each step's own run()
  calls handle simulation. The orchestrator no longer needs to know step internals.

remote.sh: update DRY_RUN checks to [ "${DRY_RUN:-0}" = 1 ] for consistency.

00-access.sh: remove all if/else DRY_RUN blocks; replace with:
  - Mutations (ssh-copy-id, curl install, tailscale up) wrapped in run()
  - Probes (SSH BatchMode test, command -v, _ts_state) run unconditionally
    so dry-run reports real current state ("key present → skip" vs "would: ...")
  - Stage 3 verify runs always; SSH failure is die in live mode, warn in
    dry-run (Tailscale not yet joined is expected on a fresh node)
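The probe/mutation split described for 00-access.sh might be sketched as follows. The helper names `have_cmd`, `ensure_cmd`, and `verify_failed` are hypothetical, and the `warn`/`die` bodies are assumed; only the behaviour (probes always run, mutations go through run(), verification is fatal only in live mode) comes from the message.

```shell
#!/usr/bin/env bash
# Assumed helpers: names follow the commit message, bodies are illustrative.
run()  { if [ "${DRY_RUN:-0}" = 1 ]; then printf '[dry-run] would: %s\n' "$*"; else "$@"; fi; }
warn() { printf 'warn: %s\n' "$*" >&2; }
die()  { printf 'fatal: %s\n' "$*" >&2; exit 1; }

# Probe: read-only, so it runs unconditionally in live and dry-run mode.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# Mutation guarded by a probe (hypothetical helper).
# Usage: ensure_cmd <command-name> <install command...>
ensure_cmd() {
  local name=$1
  shift
  if have_cmd "$name"; then
    echo "$name present: skip"
  else
    run "$@"
  fi
}

# Verification failure: fatal in live mode, only a warning in dry-run.
verify_failed() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    warn "$*"
  else
    die "$*"
  fi
}
```

Because the probe runs for real, a dry run reports the node's actual state: an installed tool yields "present: skip", a missing one yields the "[dry-run] would: ..." line.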

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-06-08 15:01:09 +02:00
.claude/skills feat(skills): worktree-aware skill for Claude Code 2026-06-03 17:41:35 +02:00
backups/zigbee Add Zigbee coordinator backup 2026-05-14 18:24:26 +02:00
docs docs: add planner-agent docs and session summary 2026-05-27 2026-05-27 22:35:59 +02:00
dotfiles add shared zshrc 2026-05-10 20:52:44 +02:00
hosts feat(onboard): add 00-access step + update lustro node.yaml 2026-06-08 14:43:16 +02:00
inventory feat(piha): brain-watchdog — external watchdog for control-plane 2026-06-01 17:54:36 +02:00
scripts fix(onboard): propagate dry-run into steps via run() helper 2026-06-08 15:01:09 +02:00
services fix(stability-agent): run as uid 1000 with docker group access 2026-06-03 18:20:54 +02:00
.codex Document current homelab state 2026-04-15 17:37:25 +02:00
.gitignore chore: gitignore *.egg-info, remove committed egg-info 2026-05-29 12:26:57 +02:00
CLAUDE.md docs(claude): multi-agent worktree mode section 2026-06-03 17:41:35 +02:00
codex_context Add session context state 2026-04-20 22:10:39 +02:00
codex_context.yaml add shared context lock 2026-05-05 17:25:50 +02:00
deploy_agent.py Add deploy escalation output 2026-04-22 22:08:26 +02:00
ollama_client.py Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
README.md docs: add planner-agent docs and session summary 2026-05-27 2026-05-27 22:35:59 +02:00
start-aider.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
start-codex.sh Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00
sync-context.sh add shared context lock 2026-05-05 17:25:50 +02:00
tech-debt.md docs: add tech-debt.md, forgejo_runner temp disabled 2026-05-21 10:37:42 +02:00
update-context.md Initial shared homelab agent workspace 2026-05-03 19:37:40 +02:00

Homelab Codex

GitOps-lite orchestration for a distributed homelab environment.

Architecture

The homelab consists of several nodes connected via a Tailscale internal mesh.

| Host | Role | Description |
| --- | --- | --- |
| SATURN | Primary Node | Development, orchestration, and git source of truth (commit node). |
| SOLARIA | Compute Node | GPU, inference, and heavy compute workloads. |
| PIHA | Infra Node | Core infrastructure services, automation, and monitoring. |
| VPS | Edge Node | Public ingress, reverse proxy, and edge services. |

Agent System

The homelab uses a multi-agent orchestration model with human-in-the-loop for destructive actions:

| Agent | Node | Role |
| --- | --- | --- |
| stability-agent | all nodes | Per-node watchdog: monitors Docker, disk, Tailscale, MQTT; emits events |
| node-agent | all nodes | Publishes container health events to Redis pub/sub |
| observer | VPS | Synthesizes world state from events into /opt/homelab/world/*.json |
| supervisor | VPS | Detects drift between desired and actual state; writes pending actions |
| planner-agent | SOLARIA | LLM-powered diagnosis: listens to Redis, proposes remediation actions |
| executor | VPS | Executes actions only after operator approval |
| operator-ui + telegram-bot | VPS / PIHA | Operator reviews and approves/rejects pending actions |

Action approval flow: pending/ → operator approves → approved/ → executor runs.
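The approval step can be pictured as a file move between two directories. The sketch below is an assumption-heavy illustration: the root directory argument, the `<id>.json` file naming, and the `approve_action` function are hypothetical and are not taken from the operator-ui or executor code.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the pending/ -> approved/ hand-off.
# Usage: approve_action <actions-root> <action-id>
approve_action() {
  local root=$1 id=$2
  local src="$root/pending/$id.json"

  # Refuse to approve something that was never proposed.
  if [ ! -f "$src" ]; then
    echo "no pending action: $id" >&2
    return 1
  fi

  mkdir -p "$root/approved"
  # A rename within one filesystem is atomic, so a poller watching
  # approved/ would never observe a half-written file.
  mv "$src" "$root/approved/$id.json"
}
```

Under these assumptions the executor would only ever read from approved/, so nothing runs until an operator has moved the action there.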

Repository Structure

Getting Started

  1. Standardization: Follow the Infrastructure Standards.
  2. Deployment: See Deployment Conventions for how to roll out changes.
  3. SATURN: Remember that SATURN is the only node where commits should be made.

Documentation Index


Note: This repository documents the state of the homelab. Runtime state lives outside the repository in /opt/homelab.