VPS Control Plane

The VPS Control Plane is the orchestration brain of the homelab platform. It runs on the Hetzner VPS and provides observability, automated reconciliation, and a web-based operator interface.

Architecture

The control plane consists of four core services running as a Docker Compose stack:

  1. Observer: Synthesizes world state from events.
  2. Supervisor: Detects drift between desired and actual state.
  3. Executor: Executes approved actions from the queue.
  4. Operator UI: Web interface for system monitoring and action approval.

All services adhere to filesystem-first semantics, using /opt/homelab/ as the primary data exchange and persistence layer.
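As a minimal sketch of what preparing that runtime tree could look like, the helper below creates the subdirectories that this document mentions elsewhere (events, world, state, logs). The function name init_runtime_dirs is hypothetical, and the real bootstrap script may create more directories or set ownership differently.

```shell
# Hypothetical helper: create the runtime tree under a given root.
# The subdirectory names are taken from paths mentioned in this document;
# the actual bootstrap script may differ.
init_runtime_dirs() (
    root="${1:-/opt/homelab}"
    for sub in events world state logs; do
        mkdir -p "$root/$sub" || exit 1
    done
)
```

Calling init_runtime_dirs with no argument targets /opt/homelab; passing a scratch path makes it safe to try out. Because it relies on mkdir -p, running it twice is harmless.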

Deployment Flow

1. Prerequisites

  • Target VPS node must be onboarded (Tailscale active, Docker installed).
  • Repository cloned to /home/oskar/homelab-codex-ws.
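The prerequisites above can be checked with a small preflight sketch. The helper names (require_cmd, require_dir, preflight) are hypothetical. Note that it only confirms the docker and tailscale binaries are installed and the clone exists; it does not prove that Tailscale is actually connected.

```shell
# Hypothetical preflight helpers; names are illustrative only.

# Succeeds if the named command is on PATH.
require_cmd() {
    command -v "$1" >/dev/null 2>&1 || {
        echo "missing command: $1" >&2
        return 1
    }
}

# Succeeds if the given path is an existing directory.
require_dir() {
    [ -d "$1" ] || {
        echo "missing directory: $1" >&2
        return 1
    }
}

# Checks the prerequisites listed in this document.
# Optional argument: path of the repository clone.
preflight() {
    require_cmd docker &&
    require_cmd tailscale &&
    require_dir "${1:-/home/oskar/homelab-codex-ws}"
}
```

Running preflight before the bootstrap script gives an early, readable failure instead of a partial bootstrap.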

2. Bootstrap

Run the bootstrap script to initialize the runtime filesystem and start the stack:

./scripts/bootstrap/vps-control-plane.sh

3. Verification

Verify the stack is healthy:

cd services/control-plane
docker compose ps
curl http://localhost:8080/summary
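Containers can take a few seconds to start answering requests, so a single curl right after bootstrap may fail even when the stack is fine. A minimal retry sketch follows; the retry helper is hypothetical and not part of the repository.

```shell
# Hypothetical helper: run a command until it succeeds or attempts run out.
# usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() (
    attempts="$1"
    delay="$2"
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && exit 0
        i=$((i + 1))
        # Sleep only if another attempt is still to come.
        [ "$i" -le "$attempts" ] && sleep "$delay"
    done
    exit 1
)
```

For example, retry 10 3 curl -fsS http://localhost:8080/summary would try the summary endpoint up to ten times, three seconds apart, and fail on HTTP errors thanks to curl's -f flag.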

Operational Workflows

Action Approval

  1. Access the Operator UI (via Tailscale IP or Nginx Proxy Manager).
  2. Navigate to Action Queue.
  3. Review Pending actions recommended by the Supervisor.
  4. Click Approve to move actions to the execution queue.

Recovery Flow

If the control plane fails, work through these steps from services/control-plane:

  1. Check logs: docker compose logs -f.
  2. Restart stack: docker compose restart.
  3. Rebuild world state: Delete /opt/homelab/state/observer_checkpoint.json and restart the observer service.
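Step 3 can be sketched as a small helper that removes the checkpoint file and reports what it did. The function name reset_observer_checkpoint is hypothetical; the checkpoint path is the one given above.

```shell
# Hypothetical helper: delete the observer checkpoint under a runtime root.
# Optional argument: runtime root (defaults to /opt/homelab).
reset_observer_checkpoint() (
    root="${1:-/opt/homelab}"
    checkpoint="$root/state/observer_checkpoint.json"
    if [ -e "$checkpoint" ]; then
        rm -- "$checkpoint" || exit 1
        echo "removed $checkpoint"
    else
        # Nothing to delete is not an error.
        echo "no checkpoint at $checkpoint"
    fi
)
```

After it runs, restart the observer, for example with docker compose restart observer. That assumes the Compose service is actually named observer, which this document does not confirm.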

Upgrade Flow

  1. Pull the latest changes from Git.
  2. Run bootstrap script again: ./scripts/bootstrap/vps-control-plane.sh.
    • This will rebuild images and restart containers with new code.

Rollback Semantics

Because the runtime is filesystem-first and append-only, a rollback changes only the desired state held in the repository:

  1. Roll back the repository state to a previous commit.
  2. Restart the control plane stack.
  3. The supervisor will detect drift against the older (rolled-back) desired state and recommend actions to restore it.
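Step 1 can be sketched with plain Git. The helper below is hypothetical: it verifies that the target commit exists before checking it out in detached-HEAD mode, so a mistyped hash fails loudly instead of leaving the repository in an unexpected state. Whether the project prefers a detached checkout, a revert commit, or a branch reset is not stated here, so treat the choice as an assumption.

```shell
# Hypothetical helper: check out a previous commit of the repository.
# usage: rollback_repo REPO_PATH COMMIT
rollback_repo() (
    repo="$1"
    commit="$2"
    # Refuse to continue if the commit cannot be resolved.
    git -C "$repo" rev-parse --verify --quiet "$commit^{commit}" >/dev/null || {
        echo "unknown commit: $commit" >&2
        exit 1
    }
    # Detached checkout: leaves branches untouched.
    git -C "$repo" checkout --quiet --detach "$commit"
)
```

A call such as rollback_repo /home/oskar/homelab-codex-ws COMMIT, with COMMIT replaced by the chosen hash, would then be followed by restarting the stack as in step 2.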

Runtime Safety

  • Read-only Mounts: Most services mount the repository as :ro to prevent accidental mutations.
  • Least-Privilege: UI, Observer, and Supervisor run as the non-root homelab user (UID 1000).
  • Filesystem Isolation: Clear separation between /repo (code/inventory) and /opt/homelab (runtime state).
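These guarantees can be spot-checked from a shell inside a container. The two helpers below are a hypothetical sketch: one compares the current UID with an expected value, the other tries to create a probe file and treats failure as "not writable".

```shell
# Hypothetical in-container checks; function names are illustrative.

# Succeeds if the current process runs with the expected numeric UID.
running_as_uid() {
    [ "$(id -u)" = "$1" ]
}

# Succeeds if a file cannot be created in the given directory
# (read-only mount, missing directory, or insufficient permissions).
is_not_writable() (
    probe="$1/.write_probe.$$"
    # The inner subshell keeps a failed redirection from aborting the function.
    if ( : > "$probe" ) 2>/dev/null; then
        rm -f "$probe"
        exit 1
    fi
    exit 0
)
```

Inside the UI, Observer, or Supervisor container, running_as_uid 1000 && is_not_writable /repo would be expected to succeed if the mounts and user match the description above. Services outside that list may legitimately differ.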

Integration

Nginx Proxy Manager

Configure a proxy host in NPM that forwards to http://control-plane-ui:8080. Enable WebSocket support on the proxy host if the UI uses WebSockets.

Log Locations

  • Container logs: docker compose logs
  • Runtime events: /opt/homelab/events/YYYY-MM-DD/
  • World state: /opt/homelab/world/
  • Diagnostics: /opt/homelab/logs/
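Since runtime events are partitioned by date, a small helper can build the directory path for a given day. This is a sketch with a hypothetical function name; it defaults to today's date in the host's local time zone, which is an assumption, because this document does not say whether the services use local time or UTC for directory names.

```shell
# Hypothetical helper: print the events directory for a given day.
# usage: events_dir_for [YYYY-MM-DD] [RUNTIME_ROOT]
events_dir_for() (
    # Default day: today in the local time zone (assumption, see above).
    day="${1:-$(date +%Y-%m-%d)}"
    root="${2:-/opt/homelab}"
    printf '%s/events/%s\n' "$root" "$day"
)
```

For example, ls "$(events_dir_for)" lists today's event files, and passing an explicit date as the first argument targets another day.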