# VPS Control Plane
The VPS Control Plane is the orchestration brain of the homelab platform. It runs on the Hetzner VPS and provides observability, automated reconciliation, and a web-based operator interface.
## Architecture
The control plane consists of four core services running as a Docker Compose stack:
1. **Observer**: Synthesizes world state from events.
2. **Supervisor**: Detects drift between desired and actual state.
3. **Executor**: Executes approved actions from the queue.
4. **Operator UI**: Web interface for system monitoring and action approval.
All services adhere to **filesystem-first** semantics, using `/opt/homelab/` as the primary data exchange and persistence layer.
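
Because every service exchanges data through this directory tree, a missing directory is an early sign of a broken bootstrap. The following is a minimal sketch of a layout check, assuming the subdirectories `events`, `world`, `state`, and `logs` (the ones this document mentions) are the relevant ones. `check_runtime_layout` is a hypothetical helper, not something shipped in the repository.

```bash
# Sketch: report which expected runtime directories are missing.
# The list of subdirectories is an assumption based on the paths named in
# this document; the real layout may contain more.
check_runtime_layout() {
  local root="${1:-/opt/homelab}"
  local missing=0 sub
  for sub in events world state logs; do
    if [ ! -d "$root/$sub" ]; then
      echo "missing: $root/$sub"
      missing=1
    fi
  done
  # Non-zero exit status when at least one directory is absent.
  return "$missing"
}
```

Calling `check_runtime_layout` with no argument inspects `/opt/homelab`; passing another path makes it easy to try against a scratch directory first.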
## Deployment Flow
### 1. Prerequisites
- Target VPS node must be onboarded (Tailscale active, Docker installed).
- Repository cloned to `/home/oskar/homelab-codex-ws`.
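
The prerequisites can be checked with a small pre-flight script. This is a sketch only: `have_cmd` and `check_prereqs` are hypothetical helper names, and finding a binary on `PATH` does not prove that the service behind it is running.

```bash
# Sketch: confirm that prerequisite tools are available on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Checks the given tools, or tailscale and docker when called without arguments.
check_prereqs() {
  local rc=0 tool
  [ "$#" -gt 0 ] || set -- tailscale docker
  for tool in "$@"; do
    if ! have_cmd "$tool"; then
      echo "missing prerequisite: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

After `check_prereqs` succeeds, `tailscale status` and `docker info` are reasonable follow-up checks that both daemons are actually up.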
### 2. Bootstrap
Run the local deployment script on the VPS to initialize the runtime filesystem and start the stack:
```bash
cd services/control-plane
bash deploy-local.sh
```
### 3. Verification
Verify that the stack is healthy, either by running the deployment script over SSH or by checking container status directly on the VPS:
```bash
# Check status via deploy script
./scripts/deploy/deploy-control-plane.sh --ssh
# Manual status check on VPS
docker ps --filter "name=control-plane"
```
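
The manual check can be turned into a pass/fail signal by filtering the container status column. The sketch below assumes that a healthy container reports a status beginning with `Up`; `report_not_up` is a hypothetical helper that reads `name<TAB>status` lines on standard input.

```bash
# Sketch: print containers whose status does not start with "Up".
# Exits non-zero if any such container exists, or if no input lines arrive
# (which usually means the name filter matched nothing).
report_not_up() {
  awk -F '\t' '
    $2 !~ /^Up/ { print $1 ": " $2; bad = 1 }
    END {
      if (NR == 0) { print "no containers matched"; exit 1 }
      exit bad
    }'
}
```

On the VPS it could be fed from `docker ps -a --filter "name=control-plane" --format '{{.Names}}\t{{.Status}}' | report_not_up`, where `-a` includes stopped containers so that an exited service is reported instead of silently omitted.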
## Operational Workflows
### Action Approval
1. Access the Operator UI (via Tailscale IP or Nginx Proxy Manager).
2. Navigate to **Action Queue**.
3. Review **Pending** actions recommended by the Supervisor.
4. Click **Approve** to move actions to the execution queue.
### Recovery Flow
If the control plane fails:
1. Check container logs with `docker logs <container>`, or `docker compose logs` for the whole stack.
2. Restart the stack with the local deployment script: `bash deploy-local.sh`.
3. If world state still needs rebuilding, delete `/opt/homelab/state/observer_checkpoint.json` and redeploy.
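
For the checkpoint step, a cautious variant is to move the file aside rather than delete it, so it can be restored if the rebuild goes wrong. This is a sketch under that assumption: keeping a backup is an addition, not part of the documented procedure, and `reset_observer_checkpoint` is a hypothetical helper.

```bash
# Sketch: set the Observer checkpoint aside with a timestamped suffix.
# Assumption: leaving a *.bak file next to the checkpoint does not disturb
# the Observer; move the backup elsewhere if it does.
reset_observer_checkpoint() {
  local state_dir="${1:-/opt/homelab/state}"
  local checkpoint="$state_dir/observer_checkpoint.json"
  local backup
  if [ ! -f "$checkpoint" ]; then
    echo "no checkpoint at $checkpoint; nothing to reset"
    return 0
  fi
  backup="$checkpoint.$(date +%Y%m%dT%H%M%S).bak"
  mv "$checkpoint" "$backup" || return 1
  echo "moved checkpoint to $backup"
}
```

Run `reset_observer_checkpoint` and then redeploy with `bash deploy-local.sh` as described above.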
### Upgrade Flow
To deploy updates from the SOLARIA/control host:
```bash
./scripts/deploy/deploy-control-plane.sh --ssh
```
### Rollback Semantics
Since the runtime is filesystem-first and append-only:
1. Roll back the repository state to a previous commit.
2. Restart the control plane stack.
3. The Supervisor detects drift against the older (rolled-back) desired state and recommends actions to restore it.
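
The first two steps might be scripted as follows. This is one possible sketch, not the project's own tooling: it assumes the repository path from the prerequisites, assumes `deploy-local.sh` lives in `services/control-plane` as in the Bootstrap step, and uses a detached checkout as the rollback mechanism (a `git revert` or a branch reset would be equally valid choices). `rollback_control_plane` is a hypothetical name.

```bash
# Sketch: check out an earlier commit, then restart the stack from it.
rollback_control_plane() {
  local commit="${1:-}"
  local repo="${2:-/home/oskar/homelab-codex-ws}"
  if [ -z "$commit" ]; then
    echo "usage: rollback_control_plane <commit> [repo-path]" >&2
    return 2
  fi
  # Detached HEAD keeps branch pointers untouched, so the rollback is easy to undo.
  git -C "$repo" checkout --detach "$commit" || return 1
  # Restart in a subshell so the caller's working directory is unchanged.
  (cd "$repo/services/control-plane" && bash deploy-local.sh)
}
```

A call such as `rollback_control_plane <commit>` leaves the runtime data under `/opt/homelab` untouched; only the desired state read from the repository changes.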
## Runtime Safety
- **Read-only Mounts**: Most services mount the repository as `:ro` to prevent accidental mutations.
- **Least-Privilege**: UI, Observer, and Supervisor run as the non-root `homelab` user (UID 1000).
- **Filesystem Isolation**: Clear separation between `/repo` (code/inventory) and `/opt/homelab` (runtime state).
## Integration
### Nginx Proxy Manager
Configure a proxy host in NPM that points to `http://control-plane-ui:8080`. Enable WebSocket support on the proxy host if the UI uses WebSockets.
### Log Locations
- Container logs: `docker compose logs`
- Runtime events: `/opt/homelab/events/YYYY-MM-DD/`
- World state: `/opt/homelab/world/`
- Diagnostics: `/opt/homelab/logs/`
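
Since runtime events are partitioned by date, a small helper can build the directory path for a given day. The sketch below assumes the directory name is the plain `YYYY-MM-DD` date with no extra prefix, and that "today" follows the host's local clock; `events_dir_for` is a hypothetical helper.

```bash
# Sketch: print the events directory for a given day (default: today).
# Assumption: directories are named exactly YYYY-MM-DD under <root>/events.
events_dir_for() {
  local day="${1:-$(date +%Y-%m-%d)}"
  local root="${2:-/opt/homelab}"
  echo "$root/events/$day"
}
```

For example, `ls -lt "$(events_dir_for)" | head` lists the most recently written event files for the current day.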