# VPS Control Plane
The VPS Control Plane is the orchestration brain of the homelab platform. It runs on the Hetzner VPS and provides observability, automated reconciliation, and a web-based operator interface.
## Architecture
The control plane consists of four core services running as a Docker Compose stack:
1. **Observer**: Synthesizes world state from events.
2. **Supervisor**: Detects drift between desired and actual state and recommends corrective actions.
3. **Executor**: Executes approved actions from the queue.
4. **Operator UI**: Web interface for system monitoring and action approval.
All services adhere to **filesystem-first** semantics, using `/opt/homelab/` as the primary data exchange and persistence layer.
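
The runtime directories referenced in this document are `events/`, `world/`, `state/`, and `logs/` under `/opt/homelab/`. The sketch below shows how such a filesystem-first layout can be created idempotently. `init_runtime_layout` is a hypothetical helper and not the repository's bootstrap script, which may create additional paths or set ownership differently.

```bash
# Hypothetical helper (not the actual bootstrap script): create the runtime
# directories this document refers to under a given root.
init_runtime_layout() {
  local root="${1:-/opt/homelab}"
  local dir
  for dir in events world state logs; do
    # mkdir -p succeeds if the directory already exists, so re-runs are safe.
    mkdir -p "$root/$dir" || return 1
  done
}
```

Because every service reads and writes under a single root, pointing the helper at a scratch directory (for example `init_runtime_layout /tmp/homelab-test`) is enough to exercise the layout without touching a live host.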
## Deployment Flow
### 1. Prerequisites
- Target VPS node must be onboarded (Tailscale active, Docker with the Compose plugin installed).
- Repository cloned to `/home/oskar/homelab-codex-ws`.
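
A quick way to confirm these prerequisites before bootstrapping is a preflight check. The sketch below is illustrative: `require_commands` and `preflight` are hypothetical helpers, not part of the repository, and the default repository path is taken from the list above. `docker compose version` and `tailscale status` are standard subcommands of those CLIs.

```bash
# Hypothetical preflight helpers (not part of the repository).
# require_commands: report every listed command that is not on PATH.
require_commands() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# preflight: check the prerequisites listed above on the VPS.
preflight() {
  local repo="${1:-/home/oskar/homelab-codex-ws}"
  require_commands docker tailscale git || return 1
  # `docker compose` is the Compose v2 CLI plugin; this fails if it is absent.
  docker compose version >/dev/null || return 1
  # Assumed to exit non-zero when Tailscale is stopped or logged out.
  tailscale status >/dev/null || return 1
  [ -d "$repo/.git" ] || { echo "repository not found at $repo" >&2; return 1; }
}
```

Running `preflight` with no arguments checks the default clone location; every missing command is reported, not just the first.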
### 2. Bootstrap
From the repository root, run the bootstrap script to initialize the runtime filesystem and start the stack:
```bash
./scripts/bootstrap/vps-control-plane.sh
```
### 3. Verification
From the repository root, check that the containers are running and that the `/summary` endpoint responds:
```bash
cd services/control-plane
docker compose ps
# -f makes curl exit non-zero on HTTP error responses
curl -fsS http://localhost:8080/summary
```
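
Right after a bootstrap the containers may still be starting, so a single `curl` can fail even though the stack will come up. The sketch below polls the `/summary` endpoint until it answers without an HTTP error. `wait_for_summary` is a hypothetical helper; the default URL matches the verification command above, while the retry count and delay are assumptions.

```bash
# Hypothetical helper: poll the control plane until /summary answers.
# Arguments: URL, number of attempts, seconds to sleep between attempts.
wait_for_summary() {
  local url="${1:-http://localhost:8080/summary}"
  local attempts="${2:-30}"
  local delay="${3:-2}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: HTTP status >= 400 counts as failure; -sS: quiet but show errors;
    # --max-time bounds each attempt so a hung connection cannot stall the loop.
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "control plane healthy: $url"
      return 0
    fi
    sleep "$delay"
  done
  echo "control plane not healthy after ${attempts} attempts: $url" >&2
  return 1
}
```

Called with no arguments, it polls `http://localhost:8080/summary` up to 30 times, two seconds apart, and returns non-zero if the endpoint never answers, which makes it usable in scripts such as `wait_for_summary && docker compose ps`.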
## Operational Workflows
### Action Approval
1. Access the Operator UI (via Tailscale IP or Nginx Proxy Manager).
2. Navigate to **Action Queue**.
3. Review **Pending** actions recommended by the Supervisor.
4. Click **Approve** to move actions to the execution queue, where the Executor picks them up.
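
The Operator UI is the supported way to approve actions. Purely to illustrate why a filesystem-first queue makes approval cheap and crash-safe, the sketch below assumes, hypothetically, that each pending action is one JSON file in `queue/pending/` and that approval is a rename into `queue/approved/`. The real queue layout and file format are not documented here and may differ; `list_pending_actions` and `approve_action` are hypothetical names.

```bash
# Hypothetical layout (an assumption, not the documented queue format):
#   $root/queue/pending/<action-id>.json   awaiting operator review
#   $root/queue/approved/<action-id>.json  ready for the Executor
list_pending_actions() {
  local root="${1:-/opt/homelab}"
  local f
  for f in "$root"/queue/pending/*.json; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f" .json
  done
}

approve_action() {
  local root="$1" id="$2"
  local src="$root/queue/pending/$id.json"
  local dst="$root/queue/approved/$id.json"
  [ -f "$src" ] || { echo "no pending action: $id" >&2; return 1; }
  mkdir -p "$root/queue/approved" || return 1
  # mv within one filesystem is a rename: the file appears in approved/
  # in a single step, so a reader never sees a half-written action.
  mv "$src" "$dst"
}
```

The design point is that approval is a single rename rather than a rewrite of shared state, so an interrupted approval leaves the action either pending or approved, never both.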
### Recovery Flow
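If the control plane fails, run the following from `services/control-plane`:
1. Check the logs: `docker compose logs -f`.
2. Restart the stack: `docker compose restart`.
3. Rebuild world state: delete `/opt/homelab/state/observer_checkpoint.json` and restart the observer service, so that it rebuilds world state from events instead of resuming from the checkpoint.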
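
A cautious variant of the checkpoint deletion is to move the file aside, so a copy remains for diagnosis. This is a suggestion rather than part of the documented procedure; `reset_observer_checkpoint` is a hypothetical helper, and the checkpoint path is the one given above.

```bash
# Hypothetical helper: move the observer checkpoint to a timestamped backup
# so the observer starts without it. Prints the backup path on success.
reset_observer_checkpoint() {
  local root="${1:-/opt/homelab}"
  local checkpoint="$root/state/observer_checkpoint.json"
  if [ ! -f "$checkpoint" ]; then
    echo "no checkpoint at $checkpoint; nothing to reset" >&2
    return 0
  fi
  local backup
  backup="$checkpoint.$(date -u +%Y%m%dT%H%M%SZ).bak"
  mv "$checkpoint" "$backup" || return 1
  echo "$backup"
}
```

After running it, restart the observer from `services/control-plane`, for example with `docker compose restart observer`, assuming the Compose service is named `observer`.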
### Upgrade Flow
1. Pull the latest changes from git.
2. Re-run the bootstrap script: `./scripts/bootstrap/vps-control-plane.sh`.
   - This rebuilds the images and restarts the containers with the new code.
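
The upgrade steps amount to a fast-forward pull followed by the bootstrap script. A minimal sketch, assuming the clone location from the prerequisites; `upgrade_control_plane` is a hypothetical wrapper, not a script in the repository.

```bash
# Hypothetical wrapper for the upgrade flow.
upgrade_control_plane() {
  local repo="${1:-/home/oskar/homelab-codex-ws}"
  # --ff-only refuses to create a merge commit if local history has diverged,
  # so an upgrade never silently mixes local edits into the deployed tree.
  git -C "$repo" pull --ff-only || return 1
  # Run the bootstrap from the repository root in a subshell.
  (cd "$repo" && ./scripts/bootstrap/vps-control-plane.sh)
}
```

If the pull is rejected because history has diverged, nothing is rebuilt, and the operator can reconcile the branch before upgrading.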
### Rollback Semantics
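Because the runtime is filesystem-first and append-only, a rollback changes only the desired state in the repository; runtime history under `/opt/homelab/` is left in place: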
1. Roll back the repository state to a previous commit.
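2. Restart the control plane stack. If the rolled-back commit also changes service code, re-run the bootstrap script instead, because `docker compose restart` does not rebuild images.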
3. The supervisor will detect drift against the older (rolled-back) desired state and recommend actions to restore it.
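
The repository part of a rollback can be scripted roughly as follows. `rollback_to` is a hypothetical helper: it refuses to run over uncommitted changes and then checks out the target commit as a detached HEAD. The commit to roll back to is supplied by the operator.

```bash
# Hypothetical helper: check out an earlier commit of the repository.
rollback_to() {
  local repo="$1" commit="$2"
  local dirty
  dirty="$(git -C "$repo" status --porcelain)" || return 1
  # Refuse to roll back over uncommitted local edits.
  if [ -n "$dirty" ]; then
    echo "working tree is dirty; commit or stash first" >&2
    return 1
  fi
  git -C "$repo" checkout --quiet --detach "$commit"
}
```

For example, `rollback_to /home/oskar/homelab-codex-ws <commit>`, then restart the stack as described above. A detached checkout is easy to undo by checking the branch out again; if the rollback should be permanent, `git revert` keeps the branch history linear so later `git pull` runs keep working.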
## Runtime Safety
- **Read-only Mounts**: Most services mount the repository as `:ro` to prevent accidental mutations.
- **Least-Privilege**: UI, Observer, and Supervisor run as non-root `homelab` user (UID 1000).
- **Filesystem Isolation**: Clear separation between `/repo` (code/inventory) and `/opt/homelab` (runtime state).
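
Services running as the non-root `homelab` user (UID 1000) can only write to `/opt/homelab` if the runtime tree's permissions allow it. As a rough check, the sketch below lists every path under the runtime root that is not owned by a given UID. It assumes ownership by UID 1000 is the intended setup; if the tree relies on group permissions instead, a different check is needed. `check_runtime_ownership` is a hypothetical helper.

```bash
# Hypothetical check: print paths under the runtime root not owned by UID.
# Returns 0 if everything matches, 1 if offenders were found, 2 on find errors.
check_runtime_ownership() {
  local root="${1:-/opt/homelab}" uid="${2:-1000}"
  local offenders
  offenders="$(find "$root" ! -uid "$uid" -print)" || return 2
  if [ -n "$offenders" ]; then
    printf '%s\n' "$offenders"
    return 1
  fi
  return 0
}
```

Running `check_runtime_ownership` on the VPS prints nothing and succeeds when the whole tree belongs to UID 1000; any printed path is a likely source of permission errors in the non-root containers.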
## Integration
### Nginx Proxy Manager
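Configure a proxy host in NPM that forwards to `http://control-plane-ui:8080`. NPM can resolve the `control-plane-ui` hostname only if its container is attached to the same Docker network as the control plane stack. Enable **Websockets Support** on the proxy host if the UI uses WebSockets.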
### Log Locations
- Container logs: `docker compose logs` (run from `services/control-plane`)
- Runtime events: `/opt/homelab/events/YYYY-MM-DD/`
- World state: `/opt/homelab/world/`
- Diagnostics: `/opt/homelab/logs/`
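
To inspect one day's runtime events from the shell, a helper can build the `events/YYYY-MM-DD/` path and list its contents. A minimal sketch; `events_dir` and `list_events` are hypothetical names, and whether the day directories use UTC or local dates is an assumption (the default below uses UTC).

```bash
# Hypothetical helpers for browsing runtime events.
# events_dir: print the event directory for a root and day (default: today, UTC).
events_dir() {
  local root="${1:-/opt/homelab}" day="${2:-$(date -u +%F)}"
  printf '%s/events/%s\n' "$root" "$day"
}

# list_events: list that day's event files, most recently modified first.
list_events() {
  local dir
  dir="$(events_dir "$@")"
  [ -d "$dir" ] || { echo "no events directory: $dir" >&2; return 1; }
  ls -1t "$dir"
}
```

For example, `list_events /opt/homelab 2024-01-02` lists that day's event files, and `list_events` alone shows today's.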