# Deployment Conventions
This document describes the GitOps-lite deployment process for the homelab.
## Principles
1. **Git as Source of Truth**: All infrastructure definitions (Docker Compose, configurations) are stored in Git.
2. **Unidirectional Flow**: Changes flow from **SATURN** (commit node) to execution nodes.
3. **Lightweight**: No complex orchestrators (no Kubernetes). Use `docker compose` and simple shell scripts.
4. **Tailscale Mesh**: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
## Staged Deployment Framework
The homelab uses a staged deployment framework located at `scripts/deploy/deploy.sh`. This script is designed to be resumable, stage-aware, and observable.
### Deployment Stages
1. **prepare**: Pulls the latest changes from Git, validates inventory, and prepares the local environment.
2. **deploy**: Executes `docker compose` commands for all assigned services.
3. **verify**: Checks the health and connectivity of deployed services.
4. **diagnose**: Performs deep checks and resource analysis if something goes wrong.
5. **rollback**: Reverts to a previous known-good state.
6. **resume**: Automatically continues from the last successful stage.
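
As a rough illustration of how `resume` could pick the next stage, the sketch below assumes that only `prepare`, `deploy`, and `verify` form the forward path, while `diagnose`, `rollback`, and `resume` are on-demand commands. The source does not state this split, and `STAGES` and `next_stage` are hypothetical names, not taken from `deploy.sh`.

```sh
# Hypothetical sketch, not the real deploy.sh.
# Assumption: prepare -> deploy -> verify is the linear forward path.
STAGES="prepare deploy verify"

# Print the stage that should run after the last successful stage "$1".
# With an empty argument (nothing has succeeded yet) print the first stage.
# After the final stage, or for an unknown stage name, print nothing.
next_stage() {
    last="$1"
    take_next=0
    if [ -z "$last" ]; then
        take_next=1
    fi
    for stage in $STAGES; do
        if [ "$take_next" -eq 1 ]; then
            printf '%s\n' "$stage"
            return 0
        fi
        if [ "$stage" = "$last" ]; then
            take_next=1
        fi
    done
    return 0
}
```

Under these assumptions, `next_stage deploy` prints `verify`, which is the stage a `resume` run would execute next.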
### State Tracking and Logging
- **State**: Local node state is tracked in `/opt/homelab/state/deploy/current_stage`.
- **Logs**: Detailed execution logs are stored in `/opt/homelab/logs/deploy/deploy_<timestamp>.log`.
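
The state file and log naming described above might be handled along these lines. This is a minimal sketch: the helper names (`record_stage`, `last_stage`, `new_log_path`), the plain-text one-line state format, and the timestamp format are all assumptions, since the source only gives the paths.

```sh
# Hypothetical helpers; only the two directory paths come from this document.
STATE_DIR="${STATE_DIR:-/opt/homelab/state/deploy}"
LOG_DIR="${LOG_DIR:-/opt/homelab/logs/deploy}"

# Record the name of the stage that just completed.
# Write to a temporary file and rename it so the state file is never half-written.
record_stage() {
    mkdir -p "$STATE_DIR"
    printf '%s\n' "$1" > "$STATE_DIR/current_stage.tmp"
    mv "$STATE_DIR/current_stage.tmp" "$STATE_DIR/current_stage"
}

# Print the recorded stage, or nothing if no state file exists yet.
last_stage() {
    if [ -f "$STATE_DIR/current_stage" ]; then
        cat "$STATE_DIR/current_stage"
    fi
}

# Print a fresh log path of the form deploy_<timestamp>.log.
# The timestamp format (YYYYmmddTHHMMSS) is an assumption.
new_log_path() {
    mkdir -p "$LOG_DIR"
    printf '%s\n' "$LOG_DIR/deploy_$(date +%Y%m%dT%H%M%S).log"
}
```

A stage runner could call `record_stage verify` after a successful health check and redirect its output to `"$(new_log_path)"`.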
### Operational Semantics
Deployment is **hybrid**: orchestration is centralized, while execution is local to each node.
- **SATURN** acts as the orchestrator and source of truth.
- **Nodes** execute the deployment locally using the `deploy.sh` script.
- A human operator must trigger and confirm each deployment.
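
One way the trigger-and-confirm step could look when run from SATURN is sketched below. It assumes the node is reachable over SSH by its Tailscale hostname and that the repository is checked out at `REPO_DIR` on the node. Both the `REPO_DIR` default and the `confirm` and `trigger_node` helpers are hypothetical and not part of the documented framework.

```sh
# Hypothetical sketch of a human-confirmed remote trigger.
SSH="${SSH:-ssh}"
REPO_DIR="${REPO_DIR:-/opt/homelab/repo}"   # assumed checkout path on the node

# Ask a yes/no question on stderr; succeed only on "y" or "Y".
confirm() {
    printf '%s [y/N] ' "$1" >&2
    read -r answer || return 1
    [ "$answer" = "y" ] || [ "$answer" = "Y" ]
}

# Run one deploy.sh stage on a node, but only after the operator confirms.
trigger_node() {
    node="$1"
    stage="$2"
    if ! confirm "Run '$stage' on $node?"; then
        echo "skipped $node" >&2
        return 1
    fi
    $SSH "$node" "cd '$REPO_DIR' && scripts/deploy/deploy.sh '$stage'"
}
```

For example, `trigger_node solaria verify` would prompt once and then run the `verify` stage on that host, assuming `solaria` resolves on the tailnet.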
### Recovery Workflow
If a deployment fails:
1. Run `deploy.sh diagnose` to identify the issue.
2. Use the `recover-node` AI prompt to analyze logs and get recommendations.
3. Either fix the issue and run `deploy.sh resume`, or use `deploy.sh rollback`.
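
The workflow above could be wrapped in a small helper so the operator always runs `diagnose` before choosing a path. This is only a sketch: `recover` is a hypothetical function, the prompt wording is invented, and step 2 (the `recover-node` prompt) stays manual.

```sh
# Hypothetical wrapper around the documented deploy.sh subcommands.
DEPLOY="${DEPLOY:-scripts/deploy/deploy.sh}"

recover() {
    # Step 1: collect diagnostics. A non-zero exit is tolerated here,
    # since diagnose is being run on a node that already failed.
    "$DEPLOY" diagnose || echo "diagnose reported problems; check the deploy log" >&2

    # Step 2 is manual: review the log and the recover-node recommendations.
    printf 'After reviewing, choose resume / rollback / abort: ' >&2
    read -r choice || return 1

    # Step 3: continue or revert, depending on the operator's decision.
    case "$choice" in
        resume|rollback) "$DEPLOY" "$choice" ;;
        *) echo "no action taken" >&2; return 1 ;;
    esac
}
```

Any answer other than `resume` or `rollback` leaves the node untouched, which keeps the default safe.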
## Onboarding New Nodes
Refer to `inventory/templates/how_to_add_new_node.yaml` for a detailed guide on adding new hardware to the mesh. The general flow is:
1. Define the node in `hosts/` and `inventory/topology.yaml` on SATURN.
2. Bootstrap the node (Docker, Tailscale, Git).
3. Run the staged deployment framework starting with `prepare`.
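
Before running `prepare` on a freshly bootstrapped node, a quick preflight check can confirm that the three bootstrap tools are actually on `PATH`. The `check_tools` helper below is a hypothetical illustration and is not described in the onboarding guide.

```sh
# Hypothetical preflight helper: report every required tool missing from PATH.
# Returns 0 if all tools are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not executed here): check_tools docker tailscale git
```

The check only proves the binaries exist; it does not verify that the Docker daemon is running or that the node has joined the tailnet.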
## Host-Local Overrides
If a service requires host-specific configuration (e.g., unique device paths for GPUs on SOLARIA):
1. Create a `docker-compose.override.yml` in `/opt/homelab/config/<service>/`.
2. The deployment script should pass this file to `docker compose` as an additional `-f` argument whenever it exists.
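
Because the override lives outside the service's own directory, Compose will not pick it up automatically, so it has to be passed with an explicit `-f`. The sketch below shows one way to do that. The `compose_for` helper, its argument order, and the idea of passing the base file path as a parameter are assumptions; only the override location comes from this document.

```sh
# Hypothetical sketch: add the host-local override only when it exists.
CONFIG_ROOT="${CONFIG_ROOT:-/opt/homelab/config}"
COMPOSE="${COMPOSE:-docker compose}"

# Usage: compose_for <service> <base-compose-file> <compose args...>
compose_for() {
    service="$1"
    base="$2"
    shift 2
    override="$CONFIG_ROOT/$service/docker-compose.override.yml"
    if [ -f "$override" ]; then
        # Later -f files are merged over earlier ones.
        set -- -f "$base" -f "$override" "$@"
    else
        set -- -f "$base" "$@"
    fi
    # $COMPOSE is intentionally unquoted so "docker compose" splits into two words.
    $COMPOSE "$@"
}
```

A call such as `compose_for <service> <base-file> up -d` then behaves the same on every host, with the override applied only where the file is present. Note that Compose resolves relative paths against the directory of the first `-f` file, so paths inside the override should be absolute or written with that in mind.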
## Secrets Management
- **Do NOT commit secrets to Git.**
- Place secrets in `/opt/homelab/config/<service>/.env` on the target host.
- The deployment script should ensure Docker Compose loads this file, for example through the `--env-file` option or the service's `env_file` attribute.
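
A minimal sketch of the `.env` handling follows, assuming the script passes the file with `docker compose --env-file`. The `compose_with_env` helper is hypothetical, and tightening the file mode to `600` is an extra precaution that the source does not mention.

```sh
# Hypothetical sketch: hand the host-local .env file to Compose when present.
CONFIG_ROOT="${CONFIG_ROOT:-/opt/homelab/config}"
COMPOSE="${COMPOSE:-docker compose}"

# Usage: compose_with_env <service> <compose args...>
compose_with_env() {
    service="$1"
    shift
    env_file="$CONFIG_ROOT/$service/.env"
    if [ -f "$env_file" ]; then
        # Assumed precaution: keep secrets readable by the owner only.
        chmod 600 "$env_file"
        set -- --env-file "$env_file" "$@"
    fi
    # $COMPOSE is intentionally unquoted so "docker compose" splits into two words.
    $COMPOSE "$@"
}
```

Keep in mind that `--env-file` supplies values for variable interpolation in the Compose file. To inject variables directly into a container, the service definition also needs an `environment` or `env_file` entry.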