DRY_RUN now uses 1/0 instead of "true"/"false" across all onboard scripts.
common.sh: add run() — wraps mutations; prints "[dry-run] would: ..." when
DRY_RUN=1. Exported via `export -f run` so child bash processes inherit it.
onboard.sh: remove the `--dry-run → dryrun "Would execute" → continue` bypass.
Steps now always execute; DRY_RUN=1 is exported so each step's own run()
calls handle simulation. The orchestrator no longer needs to know step internals.
remote.sh: update DRY_RUN checks to [ "${DRY_RUN:-0}" = 1 ] for consistency.
00-access.sh: remove all if/else DRY_RUN blocks; replace with:
- Mutations (ssh-copy-id, curl install, tailscale up) wrapped in run()
- Probes (SSH BatchMode test, command -v, _ts_state) run unconditionally
so dry-run reports real current state ("key present → skip" vs "would: ...")
- Stage 3 verify runs always; SSH failure is die in live mode, warn in
dry-run (Tailscale not yet joined is expected on a fresh node)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
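
The pattern described in this commit message can be sketched as follows. This is a minimal illustration, not the actual contents of common.sh or 00-access.sh: `run()` follows the behavior described above, while `ensure_marker` and the marker file path are hypothetical stand-ins for a real step such as the ssh-copy-id stage.

```bash
#!/usr/bin/env bash
# Illustrative sketch only; the real common.sh and 00-access.sh may differ.

# run(): execute a mutation, or only describe it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '[dry-run] would: %s\n' "$*"
        return 0
    fi
    "$@"
}
export -f run   # lets child bash processes call run()

# Hypothetical step: the probe always runs, only the mutation is wrapped.
ensure_marker() {
    local marker=$1
    if [ -e "$marker" ]; then        # probe: reports real current state
        echo "marker present -> skip"
    else
        run touch "$marker"          # mutation: simulated under DRY_RUN=1
    fi
}

export DRY_RUN=1
ensure_marker "${TMPDIR:-/tmp}/onboard-demo.$$"
```

Because the probe is never skipped, a dry run on a node that already has the marker reports "skip" rather than a generic "would execute", which is the point of moving the DRY_RUN decision out of the orchestrator and into each step.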
| Name |
|---|
| .claude/skills |
| backups/zigbee |
| docs |
| dotfiles |
| hosts |
| inventory |
| scripts |
| services |
| .codex |
| .gitignore |
| CLAUDE.md |
| codex_context |
| codex_context.yaml |
| deploy_agent.py |
| ollama_client.py |
| README.md |
| start-aider.sh |
| start-codex.sh |
| sync-context.sh |
| tech-debt.md |
| update-context.md |
Homelab Codex
GitOps-lite orchestration for a distributed homelab environment.
Architecture
The homelab consists of several nodes connected via a Tailscale internal mesh.
| Host | Role | Description |
|---|---|---|
| SATURN | Primary Node | Development, orchestration, and git source of truth (commit node). |
| SOLARIA | Compute Node | GPU, inference, and heavy compute workloads. |
| PIHA | Infra Node | Core infrastructure services, automation, and monitoring. |
| VPS | Edge Node | Public ingress, reverse proxy, and edge services. |
Agent System
The homelab uses a multi-agent orchestration model with human-in-the-loop for destructive actions:
| Agent | Node | Role |
|---|---|---|
| stability-agent | all nodes | Per-node watchdog — monitors Docker, disk, Tailscale, MQTT; emits events |
| node-agent | all nodes | Publishes container health events to Redis pub/sub |
| observer | VPS | Synthesizes world state from events into /opt/homelab/world/*.json |
| supervisor | VPS | Detects drift between desired and actual state; writes pending actions |
| planner-agent | SOLARIA | LLM-powered diagnosis — listens to Redis, proposes remediation actions |
| executor | VPS | Executes actions only after operator approval |
| operator-ui + telegram-bot | VPS / PIHA | Operator reviews and approves/rejects pending actions |
Action approval flow: pending/ → operator approves → approved/ → executor runs.
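
As a rough sketch of that flow, the script below models approval as moving a file from pending/ to approved/, with the executor reading only approved/. Everything beyond the two directory names is an assumption made for illustration: the temporary base directory, the one-file-per-action layout, the action names, and the `propose`, `approve`, and `execute_approved` helpers are hypothetical, and the executor here only prints what it would run.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical layout: one file per action under pending/ and approved/.
ACTIONS_DIR="$(mktemp -d)"
mkdir -p "$ACTIONS_DIR/pending" "$ACTIONS_DIR/approved"

# Supervisor/planner side: write a pending action (name and body are made up).
propose() {
    printf '%s\n' "$2" > "$ACTIONS_DIR/pending/$1"
}

# Operator side: approving is a move from pending/ to approved/.
approve() {
    mv -- "$ACTIONS_DIR/pending/$1" "$ACTIONS_DIR/approved/$1"
}

# Executor side: only ever reads approved/; here it just prints the action.
execute_approved() {
    local f
    for f in "$ACTIONS_DIR"/approved/*; do
        [ -e "$f" ] || continue      # glob matched nothing
        echo "executing: $(basename "$f"): $(cat "$f")"
        rm -- "$f"
    done
}

propose restart-mqtt "restart container mqtt"
propose prune-images "docker image prune"
approve restart-mqtt
execute_approved
```

In this sketch the unapproved `prune-images` action stays in pending/ and is never seen by the executor, which mirrors the human-in-the-loop guarantee for destructive actions.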
Repository Structure
- docs/: Infrastructure Standards and Deployment Conventions.
- hosts/: Host-specific configurations and service assignments.
- services/: Reusable Docker Compose service definitions.
- scripts/: Deployment and management scripts.
Getting Started
- Standardization: Follow the Infrastructure Standards.
- Deployment: See Deployment Conventions for how to roll out changes.
- SATURN: Remember that SATURN is the only node where commits should be made.
Documentation Index
- Infrastructure Standards
- Agent Operating Procedures (For AI/Non-Human Agents)
- Deployment Conventions
- Hardware
- Networking
- Services
- Node Capabilities
- Action Model
Note: This repository documents the state of the homelab. Runtime state lives outside the repository in /opt/homelab.