homelab-codex-ws/docs/deployment.md

# Deployment Conventions
This document describes the GitOps-lite deployment process for the homelab.
## Principles
1. **Git as Source of Truth**: All infrastructure definitions (Docker Compose, configurations) are stored in Git.
2. **Unidirectional Flow**: Changes flow from **SATURN** (commit node) to execution nodes.
3. **Lightweight**: No complex orchestrators (no Kubernetes). Use `docker compose` and simple shell scripts.
4. **Tailscale Mesh**: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
5. **Host Autonomy**: Services that must operate during WAN or Git outages keep their runtime dependencies on the execution node or local LAN.
2026-05-11 20:46:50 +02:00
## Staged Deployment Framework
The homelab uses a staged deployment framework located at `scripts/deploy/deploy.sh`. This script is designed to be resumable, stage-aware, and observable.
### Deployment Stages
1. **prepare**: Pulls the latest changes from Git, validates inventory, and prepares the local environment.
2. **deploy**: Executes `docker compose` commands for all assigned services.
3. **verify**: Checks the health and connectivity of deployed services.
4. **diagnose**: Performs deep checks and resource analysis if something goes wrong.
5. **rollback**: Reverts to a previous known-good state.
6. **resume**: Continues from the stage after the last one that completed successfully.
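
The stage ordering behind `resume` can be sketched as a small lookup. This is an illustration, not the contents of `deploy.sh`: the helper name `next_stage` is hypothetical, and it assumes that `prepare`, `deploy`, and `verify` form the linear pipeline while `diagnose` and `rollback` are invoked on demand.

```shell
# Hypothetical helper: print the stage that follows the last successful one.
# Assumes prepare -> deploy -> verify is the linear pipeline.
next_stage() {
    case "$1" in
        "")      echo prepare ;;    # nothing recorded yet: start from the top
        prepare) echo deploy ;;
        deploy)  echo verify ;;
        verify)  echo complete ;;   # pipeline already finished
        *)       echo "unknown stage: $1" >&2; return 1 ;;
    esac
}
```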
### State Tracking and Logging
- **State**: Local node state is tracked in `/opt/homelab/state/deploy/current_stage`.
- **Logs**: Detailed execution logs are stored in `/opt/homelab/logs/deploy/deploy_<timestamp>.log`.
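
As a rough sketch of how a script could maintain these two paths, the helpers below write and read the state file and build a log file name. The function names, the overridable `HOMELAB_ROOT` variable, the timestamp format, and the idea that `current_stage` stores the name of the last completed stage are all assumptions made for illustration.

```shell
# Base directory from this document; overridable for dry runs (assumption).
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# Record the last stage that completed successfully.
record_stage() {
    mkdir -p "$HOMELAB_ROOT/state/deploy"
    printf '%s\n' "$1" > "$HOMELAB_ROOT/state/deploy/current_stage"
}

# Print the recorded stage, or nothing if no deployment has run yet.
read_stage() {
    cat "$HOMELAB_ROOT/state/deploy/current_stage" 2>/dev/null || true
}

# Print a log path of the form deploy_<timestamp>.log.
# The timestamp format is an assumption.
new_log_path() {
    mkdir -p "$HOMELAB_ROOT/logs/deploy"
    printf '%s/logs/deploy/deploy_%s.log\n' "$HOMELAB_ROOT" "$(date +%Y%m%dT%H%M%S)"
}
```

Setting `HOMELAB_ROOT` to a scratch directory allows the helpers to be tried without touching `/opt/homelab`.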
### Operational Semantics
Deployment is **hybrid**:
- **SATURN** acts as the orchestrator and source of truth.
- **Nodes** execute the deployment locally using the `deploy.sh` script.
- A human operator must trigger and confirm every deployment.
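
One way to enforce the confirmation step is a prompt that only proceeds on an explicit yes. The function name `confirm_deploy` and the prompt wording are hypothetical; the document does not specify how confirmation is collected.

```shell
# Hypothetical helper: ask the operator to confirm a deployment to node $1.
# Returns 0 only on an explicit "y" or "yes"; anything else aborts.
confirm_deploy() {
    printf 'Deploy to %s? [y/N] ' "$1" >&2
    read -r answer || return 1
    case "$answer" in
        y|Y|yes|YES) return 0 ;;
        *)           return 1 ;;
    esac
}
```

Defaulting to "no" means an empty line or a closed terminal never starts a deployment by accident.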
### Recovery Workflow
If a deployment fails:
1. Run `deploy.sh diagnose` to identify the issue.
2. Use the `recover-node` AI prompt to analyze logs and get recommendations.
3. Either fix the issue and run `deploy.sh resume`, or use `deploy.sh rollback`.
## Onboarding New Nodes
Refer to `inventory/templates/how_to_add_new_node.yaml` for a detailed guide on adding new hardware to the mesh. The general flow is:
1. Define the node in `hosts/` and `inventory/topology.yaml` on SATURN.
2. Bootstrap the node (Docker, Tailscale, Git).
3. Run the staged deployment framework starting with `prepare`.
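
Before running `prepare` on a new node, it can help to confirm that the bootstrap step left the expected tools on `PATH`. The helper below is a hypothetical pre-flight check, not part of the documented framework; the binary names `docker`, `tailscale`, and `git` are the usual ones for the tools listed in step 2.

```shell
# Hypothetical pre-flight check: print each required tool that is not on
# PATH and return non-zero if any is missing.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example on a freshly bootstrapped node:
#   missing_tools docker tailscale git || echo "bootstrap incomplete" >&2
```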
## Host-Local Overrides
If a service requires host-specific configuration (e.g., unique device paths for GPUs on SOLARIA):
1. Create a `docker-compose.override.yml` in `/opt/homelab/config/<service>/`.
2. The deployment script should include this override if it exists.
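
A minimal sketch of the override handling follows. When `-f` is passed explicitly, Docker Compose does not pick up an override file on its own, so the script has to append it. The function name, the overridable `HOMELAB_ROOT` variable, and the base compose file argument are assumptions for illustration.

```shell
# Base directory from this document; overridable for dry runs (assumption).
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# Print the -f arguments for a service: the base compose file, plus the
# host-local override when one exists.
# $1 = service name, $2 = path to the base compose file (hypothetical).
compose_file_args() {
    override="$HOMELAB_ROOT/config/$1/docker-compose.override.yml"
    printf '%s %s' -f "$2"
    if [ -f "$override" ]; then
        printf ' %s %s' -f "$override"
    fi
    printf '\n'
}

# Example (word splitting is intentional, so paths must not contain spaces):
#   docker compose $(compose_file_args <service> <base-compose-file>) up -d
```
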

For CHELSTY Home Assistant infrastructure, host-local configuration is the
authority for runtime identity, secrets, and local device endpoints:
- Home Assistant config: `/opt/homelab/config/homeassistant`
- Zigbee2MQTT config: `/opt/homelab/config/zigbee2mqtt`
- Mosquitto config: `/opt/homelab/config/mosquitto`

CHELSTY services must not require SATURN, VPS, or Forgejo to be reachable after
deployment has completed. Docker Compose definitions can still come from Git,
but Home Assistant automation, Zigbee control, and MQTT messaging must continue
locally while LTE or Tailscale connectivity is unavailable.
## Exposure Classes
Service inventory may declare one of these exposure classes:
- `local-only`: bind only to host, LAN, or container networks. This is the default for Zigbee2MQTT and Mosquitto.
- `tailscale-internal`: reachable over Tailscale only. This is appropriate for Home Assistant remote administration.
- `public`: reachable from the public internet through a deliberate ingress path, normally the VPS edge role.

Public exposure is not implied by a service existing in Git. It must be explicit
in host inventory and ingress configuration.
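
A validation step for these classes could look like the sketch below. The function names are hypothetical, and how the class is read from inventory is left out because the inventory schema is not described here. The point of the second helper is that a missing or unknown value never counts as `public`.

```shell
# Return 0 if the value is one of the three documented exposure classes.
is_valid_exposure() {
    case "$1" in
        local-only|tailscale-internal|public) return 0 ;;
        *) return 1 ;;
    esac
}

# Public ingress is allowed only when the class is explicitly "public".
# An empty or unrecognized value is treated as not public.
allows_public_ingress() {
    [ "$1" = "public" ]
}
```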
## CHELSTY Home Automation Deployment Notes
CHELSTY remains a Docker Compose execution node. No Kubernetes, Helm, Ansible,
or additional orchestration layer is required for Home Assistant infrastructure.

The SLZB-06U coordinator is network-connected over Ethernet or Wi-Fi. Compose
files and host overrides should configure Zigbee2MQTT for a TCP/network
coordinator endpoint, not a USB serial device. Avoid `/dev/ttyUSB0` mappings.
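
Zigbee2MQTT takes the adapter location from its `serial.port` setting, which accepts a `tcp://host:port` address for network adapters. A deployment check could reject anything else before the container starts. The helper name below is hypothetical, and the address in the usage comment is a placeholder, not the coordinator's real endpoint.

```shell
# Return 0 if the coordinator port looks like a network endpoint
# (tcp://host:port) rather than a local serial device path.
is_network_coordinator() {
    case "$1" in
        tcp://?*:[0-9]*) return 0 ;;
        *)               return 1 ;;
    esac
}

# Example with a placeholder address:
#   is_network_coordinator "tcp://192.0.2.10:6638" || echo "expected tcp:// endpoint" >&2
```
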

Runtime paths follow the standard layout:
- `/opt/homelab/data/homeassistant`
- `/opt/homelab/config/homeassistant`
- `/opt/homelab/logs/homeassistant`
- `/opt/homelab/data/zigbee2mqtt`
- `/opt/homelab/config/zigbee2mqtt`
- `/opt/homelab/logs/zigbee2mqtt`
- `/opt/homelab/data/mosquitto`
- `/opt/homelab/config/mosquitto`
- `/opt/homelab/logs/mosquitto`
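
The layout above is regular enough to create with a short loop. This is a sketch only: the function name and the overridable `HOMELAB_ROOT` variable are assumptions, and ownership or permissions for each directory are not covered.

```shell
# Base directory from this document; overridable for dry runs (assumption).
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# Create the data, config, and logs directories for each named service.
ensure_service_dirs() {
    for service in "$@"; do
        for kind in data config logs; do
            mkdir -p "$HOMELAB_ROOT/$kind/$service" || return 1
        done
    done
}

# Example:
#   ensure_service_dirs homeassistant zigbee2mqtt mosquitto
```
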

Recommended backup coverage:
- Home Assistant config and persistent data before upgrades or major integration changes.
- Zigbee2MQTT config, database, coordinator backup files, and Zigbee network key material.
- SLZB-06U firmware version, exported configuration, network address reservation, and coordinator state.
- Mosquitto config, ACL/password files, persistence data, and bridge configuration if enabled.
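
For the host-side items, a backup can be as simple as archiving a service's `config` and `data` directories. The sketch below assumes a hypothetical `backup_service` helper, a caller-chosen destination directory, and an overridable `HOMELAB_ROOT`. It does not cover state stored on the SLZB-06U itself, which has to be exported from the device.

```shell
# Base directory from this document; overridable for dry runs (assumption).
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# Archive a service's config and data directories into one tarball and
# print the archive path.
# $1 = service name, $2 = destination directory (hypothetical).
backup_service() {
    service="$1"
    dest="$2"
    archive="$dest/${service}_$(date +%Y%m%dT%H%M%S).tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps paths inside the archive relative to the homelab root.
    tar -czf "$archive" -C "$HOMELAB_ROOT" "config/$service" "data/$service" || return 1
    echo "$archive"
}
```

Stopping the service before archiving is likely the safer choice for files that change while the container runs, such as databases.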
## Secrets Management
- **Do NOT commit secrets to Git.**
- Secrets should be placed in `/opt/homelab/config/<service>/.env` on the target host.
- The deployment script should ensure these are sourced by Docker Compose.
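
One way the script could hand the host-local `.env` to Docker Compose is the `--env-file` option, added only when the file exists. The function name and the overridable `HOMELAB_ROOT` variable are assumptions for illustration.

```shell
# Base directory from this document; overridable for dry runs (assumption).
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# Print the --env-file arguments for a service if its host-local .env
# exists; print nothing otherwise.
env_file_args() {
    env_file="$HOMELAB_ROOT/config/$1/.env"
    if [ -f "$env_file" ]; then
        printf '%s %s\n' --env-file "$env_file"
    fi
}

# Example (word splitting is intentional, so paths must not contain spaces):
#   docker compose $(env_file_args <service>) -f <compose-file> up -d
```

Note that `--env-file` supplies values for variable substitution in the Compose file. Variables that containers need at runtime still have to be passed through `environment:` or `env_file:` in the service definition.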