Oskar Kapala 931fd46e62 fix(onboard): propagate dry-run into steps via run() helper	2026-06-08 15:01:09 +02:00

DRY_RUN now uses 1/0 instead of "true"/"false" across all onboard scripts.

common.sh: add run(), which wraps mutations and prints "[dry-run] would: ..." when DRY_RUN=1. Exported via `export -f run` so child bash processes inherit it.

onboard.sh: remove the `--dry-run → dryrun "Would execute" → continue` bypass. Steps now always execute; DRY_RUN=1 is exported so each step's own run() calls handle simulation. The orchestrator no longer needs to know step internals.

remote.sh: update DRY_RUN checks to [ "${DRY_RUN:-0}" = 1 ] for consistency.

00-access.sh: remove all if/else DRY_RUN blocks; replace with:
- Mutations (ssh-copy-id, curl install, tailscale up) wrapped in run()
- Probes (SSH BatchMode test, command -v, _ts_state) run unconditionally so dry-run reports real current state ("key present → skip" vs "would: ...")
- Stage 3 verify runs always; SSH failure is die in live mode, warn in dry-run (Tailscale not yet joined is expected on a fresh node)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
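
The dry-run pattern described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the contents of common.sh: `run`, `DRY_RUN`, the `[dry-run] would: ...` prefix, and `export -f run` come from the commit message, while `verify_fail` and its message format are hypothetical names made up for the example.

```shell
#!/usr/bin/env bash
# Sketch only: the real common.sh is not shown here and may differ.

# run CMD [ARGS...]: perform a mutation, or just describe it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '[dry-run] would: %s\n' "$*"
    return 0
  fi
  "$@"
}

# export -f is bash-specific; it lets child bash processes inherit run().
if [ -n "${BASH_VERSION:-}" ]; then
  export -f run
fi

# verify_fail MSG (hypothetical name): fatal in live mode, a warning in dry-run.
verify_fail() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'warn: %s\n' "$*" >&2
    return 0
  fi
  printf 'fatal: %s\n' "$*" >&2
  return 1
}
```

With `export DRY_RUN=1`, a call such as `run touch /tmp/marker` prints `[dry-run] would: touch /tmp/marker` and creates nothing; with DRY_RUN unset or 0 the command runs as given. Probes such as `command -v tailscale` would be called directly, without run(), so they report the real state in both modes.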
.claude/skills	feat(skills): worktree-aware skill for Claude Code	2026-06-03 17:41:35 +02:00
backups/zigbee	Add Zigbee coordinator backup	2026-05-14 18:24:26 +02:00
docs	docs: add planner-agent docs and session summary 2026-05-27	2026-05-27 22:35:59 +02:00
dotfiles	add shared zshrc	2026-05-10 20:52:44 +02:00
hosts	feat(onboard): add 00-access step + update lustro node.yaml	2026-06-08 14:43:16 +02:00
inventory	feat(piha): brain-watchdog — external watchdog for control-plane	2026-06-01 17:54:36 +02:00
scripts	fix(onboard): propagate dry-run into steps via run() helper	2026-06-08 15:01:09 +02:00
services	fix(stability-agent): run as uid 1000 with docker group access	2026-06-03 18:20:54 +02:00
.codex	Document current homelab state	2026-04-15 17:37:25 +02:00
.gitignore	chore: gitignore *.egg-info, remove committed egg-info	2026-05-29 12:26:57 +02:00
CLAUDE.md	docs(claude): multi-agent worktree mode section	2026-06-03 17:41:35 +02:00
codex_context	Add session context state	2026-04-20 22:10:39 +02:00
codex_context.yaml	add shared context lock	2026-05-05 17:25:50 +02:00
deploy_agent.py	Add deploy escalation output	2026-04-22 22:08:26 +02:00
ollama_client.py	Initial shared homelab agent workspace	2026-05-03 19:37:40 +02:00
README.md	docs: add planner-agent docs and session summary 2026-05-27	2026-05-27 22:35:59 +02:00
start-aider.sh	Initial shared homelab agent workspace	2026-05-03 19:37:40 +02:00
start-codex.sh	Initial shared homelab agent workspace	2026-05-03 19:37:40 +02:00
sync-context.sh	add shared context lock	2026-05-05 17:25:50 +02:00
tech-debt.md	docs: add tech-debt.md, forgejo_runner temp disabled	2026-05-21 10:37:42 +02:00
update-context.md	Initial shared homelab agent workspace	2026-05-03 19:37:40 +02:00

README.md

Homelab Codex

GitOps-lite orchestration for a distributed homelab environment.

Architecture

The homelab consists of four nodes connected over an internal Tailscale mesh.

Host	Role	Description
SATURN	Primary Node	Development, orchestration, and git source of truth (commit node).
SOLARIA	Compute Node	GPU, inference, and heavy compute workloads.
PIHA	Infra Node	Core infrastructure services, automation, and monitoring.
VPS	Edge Node	Public ingress, reverse proxy, and edge services.

Agent System

The homelab uses a multi-agent orchestration model with human-in-the-loop approval for destructive actions:

Agent	Node	Role
stability-agent	all nodes	Per-node watchdog — monitors Docker, disk, Tailscale, MQTT; emits events
node-agent	all nodes	Publishes container health events to Redis pub/sub
observer	VPS	Synthesizes world state from events into `/opt/homelab/world/*.json`
supervisor	VPS	Detects drift between desired and actual state; writes `pending` actions
planner-agent	SOLARIA	LLM-powered diagnosis — listens to Redis, proposes remediation actions
executor	VPS	Executes actions only after operator approval
operator-ui + telegram-bot	VPS / PIHA	Operator reviews and approves/rejects pending actions

Action approval flow: pending/ → operator approves → approved/ → executor runs.
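
As a rough illustration of this flow, approval can be modelled as moving an action file from pending/ to approved/, with the executor only ever reading approved/. This is a sketch under assumptions: the `ACTIONS_DIR` variable, the `.json` file naming, and the `approve` and `list_runnable` function names are hypothetical and not taken from the repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: ACTIONS_DIR layout and file names are assumptions.
ACTIONS_DIR="${ACTIONS_DIR:-$(mktemp -d)}"
mkdir -p "$ACTIONS_DIR/pending" "$ACTIONS_DIR/approved"

# approve ID: operator approval modelled as a move from pending/ to approved/.
approve() {
  src="$ACTIONS_DIR/pending/$1.json"
  if [ ! -f "$src" ]; then
    printf 'no pending action: %s\n' "$1" >&2
    return 1
  fi
  mv "$src" "$ACTIONS_DIR/approved/"
}

# list_runnable: print the IDs the executor may run (approved/ only).
list_runnable() {
  for f in "$ACTIONS_DIR"/approved/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f" .json
  done
}
```

An action that has not been approved never appears in `list_runnable`, so nothing in pending/ can be executed by accident.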

Repository Structure

docs/: Infrastructure Standards and Deployment Conventions.
hosts/: Host-specific configurations and service assignments.
services/: Reusable Docker Compose service definitions.
scripts/: Deployment and management scripts.

Getting Started

Standardization: Follow the Infrastructure Standards.
Deployment: See Deployment Conventions for how to roll out changes.
SATURN: Make commits only on SATURN; it is the git source of truth.

Documentation Index

Infrastructure Standards
Agent Operating Procedures (For AI/Non-Human Agents)
Deployment Conventions
Hardware
Networking
Services
Node Capabilities
Action Model

Note: This repository documents the state of the homelab. Runtime state lives outside the repository in /opt/homelab.