# Deployment Conventions

This document describes the GitOps-lite deployment process for the homelab.

## Principles

1. **Git as Source of Truth**: All infrastructure definitions (Docker Compose, configurations) are stored in Git.
2. **Unidirectional Flow**: Changes flow from **SATURN** (commit node) to execution nodes.
3. **Lightweight**: No complex orchestrators (no Kubernetes). Use `docker compose` and simple shell scripts.
4. **Tailscale Mesh**: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
5. **Host Autonomy**: Services that must operate during WAN or Git outages keep their runtime dependencies on the execution node or local LAN.

## Staged Deployment Framework

The homelab uses a modular, staged deployment framework whose entrypoint is `scripts/deploy/deploy.sh`. The script is designed to be resumable, stage-aware, and observable, with core logic split into maintainable libraries in `scripts/lib/`.

### Runtime Architecture

The runtime consists of:
- `deploy.sh`: Orchestration entrypoint.
- `lib/log.sh`: Logging and structured output.
- `lib/state.sh`: Deployment state tracking and stage persistence.
- `lib/inventory.sh`: Reliable host and service discovery (Python-based YAML parsing).
- `lib/compose.sh`: Docker Compose operations.
- `lib/diagnostics.sh`: Post-failure analysis and summary generation.

### Deployment Stages

1. **prepare**: Pulls the latest changes from Git, validates inventory, and prepares the local environment. It tolerates network failures so that intermittently connected nodes such as CHELSTY can still deploy.
2. **validate**: Ensures all required service definitions and metadata are present.
3. **deploy**: Executes `docker compose` commands for all assigned services. Supports `.env` files and `docker-compose.override.yml` under `/opt/homelab/config/<service>/`.
4. **verify**: Executes service-specific `healthcheck.sh` scripts or checks container status.
5. **diagnose**: Automatically triggered on failure; collects container status and logs for troubleshooting.
6. **complete**: Finalizes the deployment and marks the state as finished.

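The stage order above can be sketched as a simple driver loop. This is a minimal illustration and not the actual `deploy.sh`: the `run_stage` dispatcher, the `stage_<name>` function naming, and the `key=value` output lines are assumptions made for the example.

```sh
# Hypothetical driver loop; names are illustrative, not taken from deploy.sh.
STAGES="prepare validate deploy verify complete"

# Run one stage by calling a function named stage_<name> (assumed convention).
run_stage() {
  "stage_$1"
}

# Run all stages in order; on the first failure, run diagnose and stop.
run_pipeline() {
  for stage in $STAGES; do
    echo "stage=$stage status=start"
    if ! run_stage "$stage"; then
      echo "stage=$stage status=failed"
      run_stage diagnose || true
      return 1
    fi
    echo "stage=$stage status=ok"
  done
}
```

Keeping `diagnose` out of the ordered list mirrors the description above: it runs only when another stage fails.
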
### State Tracking and Logging

- **State**: Local node state is tracked in `/opt/homelab/state/deploy/current_stage`. The last successfully processed service in the `deploy` stage is tracked in `last_service` to support granular resumption.
- **Logs**: Detailed execution logs are stored in `/opt/homelab/logs/deploy/deploy_<timestamp>.log`. Structured log entries prefixed with `[STRUCT]` provide machine-parseable event data.

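As a rough sketch of how these files might be written and read, the helpers below persist the current stage and last service and filter a log for `[STRUCT]` entries. The function names and the `STATE_DIR` variable are hypothetical, and placing `last_service` next to `current_stage` is an assumption; the real `lib/state.sh` and `lib/log.sh` may differ.

```sh
# Hypothetical helpers; STATE_DIR defaults to the path named above.
STATE_DIR="${STATE_DIR:-/opt/homelab/state/deploy}"

# Persist the stage currently being executed.
save_stage() {
  mkdir -p "$STATE_DIR" && printf '%s\n' "$1" > "$STATE_DIR/current_stage"
}

# Persist the last service that deployed successfully
# (assumes last_service lives next to current_stage).
save_last_service() {
  mkdir -p "$STATE_DIR" && printf '%s\n' "$1" > "$STATE_DIR/last_service"
}

# Print only the machine-parseable entries of a deploy log.
struct_events() {
  grep -F '[STRUCT]' "$1"
}
```

The `-F` flag makes `grep` treat `[STRUCT]` as a literal string rather than a bracket expression.
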
### Resume Semantics

If a deployment is interrupted (e.g., due to LTE disconnect on CHELSTY):
1. Rerun the script with the `--resume` flag: `scripts/deploy/deploy.sh --resume`.
2. The script identifies the last incomplete stage using deterministic markers (`/opt/homelab/state/deploy/stage_<name>_complete`) and continues from the exact failure point.
3. In the `deploy` stage, it resumes from the first service that was not successfully completed and skips those already up.
4. Repeated runs with `--resume` are safe and idempotent: completed stages are not re-executed. Omitting the flag clears the saved state and starts a fresh run.

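The marker-based resume logic can be illustrated with a few small functions. This is a sketch under stated assumptions: the function names and the `STATE_DIR` variable are hypothetical, `diagnose` is assumed not to have a marker of its own, and `remaining_services` assumes the stored service name appears in the ordered list.

```sh
# Hypothetical resume helpers built on the marker files described above.
STATE_DIR="${STATE_DIR:-/opt/homelab/state/deploy}"
STAGES="prepare validate deploy verify complete"

# Record that a stage finished successfully.
mark_stage_complete() {
  mkdir -p "$STATE_DIR" && : > "$STATE_DIR/stage_${1}_complete"
}

# Print the first stage without a completion marker; print nothing if all are done.
first_incomplete_stage() {
  for stage in $STAGES; do
    if [ ! -e "$STATE_DIR/stage_${stage}_complete" ]; then
      echo "$stage"
      return 0
    fi
  done
}

# Read an ordered service list on stdin and print the services that come
# after the one named in $1; an empty $1 means nothing was deployed yet.
remaining_services() {
  awk -v last="$1" 'last == "" || seen { print } $0 == last { seen = 1 }'
}

# A fresh run (no --resume) clears all saved markers.
clear_state() {
  rm -f "$STATE_DIR"/stage_*_complete
}
```

Because each marker is a plain file, an interrupted run leaves behind exactly the set of stages that finished, which is what makes repeated `--resume` runs safe.
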
### Diagnostics and Troubleshooting

The runtime is designed to fail predictably and provide immediate feedback:
- **Automatic Diagnostics**: If any stage fails, `collect_diagnostics` is triggered to capture system state and container logs into `/opt/homelab/logs/deploy/diagnostics_<timestamp>.txt`.
- **Deployment Summary**: Every run concludes with a concise summary showing the host status, last stage reached, and log locations.
- **Offline Resilience**: The `prepare` stage handles `git pull` failures gracefully, allowing deployment from local cache during network instability.

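The offline-resilience behaviour could look roughly like the following. This is a sketch only: `update_repo` is a hypothetical name, the `--ff-only` flag is an assumption about how the pull is performed, and the output strings are invented for illustration.

```sh
# Hypothetical prepare helper: try to update the checkout, but keep going
# with the existing working tree when the pull fails (e.g. no connectivity).
update_repo() {
  repo_dir="$1"
  if git -C "$repo_dir" pull --ff-only >/dev/null 2>&1; then
    echo "repo=updated"
  else
    echo "repo=stale reason=pull_failed"
  fi
  # Always succeed so later stages can deploy from the local checkout.
  return 0
}
```

Returning success in both branches is the design choice that lets an LTE-connected node deploy from whatever revision it already has.
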
### Operational Semantics

Deployment is **hybrid**:
- **SATURN** acts as the orchestrator and source of truth.
- **Nodes** execute the deployment locally using the `deploy.sh` script.
- A human in the loop is required to trigger and confirm deployments.

### Recovery Workflow

If a deployment fails:
1. Run `deploy.sh --stage diagnose` to identify the issue.
2. Use the `recover-node` AI prompt to analyze logs and get recommendations.
3. Fix the issue (e.g., update a secret in `.env`) and run `deploy.sh --resume`.

## Onboarding New Nodes

Refer to `inventory/templates/how_to_add_new_node.yaml` for a detailed guide on adding new hardware to the mesh. The general flow is:
1. Define the node in `hosts/` and `inventory/topology.yaml` on SATURN.
2. Bootstrap the node (Docker, Tailscale, Git).
3. Run the staged deployment framework, starting with `prepare`.

## Host-Local Overrides

If a service requires host-specific configuration (e.g., unique device paths for GPUs on SOLARIA):

1. Create a `docker-compose.override.yml` in `/opt/homelab/config/<service>/`.
2. The deployment script includes this override in the `deploy` stage when the file exists.

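One way the optional override and `.env` file might be picked up is to build the `docker compose` argument list conditionally. The sketch below is not the real `lib/compose.sh`: `compose_args` and `CONFIG_ROOT` are hypothetical names, and the location of the base compose file is left to the caller because this document does not specify it.

```sh
# Hypothetical helper: print the docker compose arguments for one service,
# one per line. CONFIG_ROOT defaults to the path used in this document.
CONFIG_ROOT="${CONFIG_ROOT:-/opt/homelab/config}"

compose_args() {
  base_file="$1"   # path to the service's compose file from Git
  service="$2"
  cfg="$CONFIG_ROOT/$service"
  # Host-local secrets and settings, if present.
  if [ -f "$cfg/.env" ]; then
    printf '%s\n' --env-file "$cfg/.env"
  fi
  printf '%s\n' -f "$base_file"
  # Later -f files override earlier ones, so the host override goes last.
  if [ -f "$cfg/docker-compose.override.yml" ]; then
    printf '%s\n' -f "$cfg/docker-compose.override.yml"
  fi
}
```

A caller could then run something like `docker compose $(compose_args "$base_file" mosquitto) up -d`. That form relies on word splitting, so it assumes paths without whitespace.
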
For CHELSTY Home Assistant infrastructure, host-local configuration is the
authority for runtime identity, secrets, and local device endpoints:

- Home Assistant config: `/opt/homelab/config/homeassistant`
- Zigbee2MQTT config: `/opt/homelab/config/zigbee2mqtt`
- Mosquitto config: `/opt/homelab/config/mosquitto`

CHELSTY services must not require SATURN, VPS, or Forgejo to be reachable after
deployment has completed. Docker Compose definitions can still come from Git,
but Home Assistant automation, Zigbee control, and MQTT messaging must continue
locally while LTE or Tailscale connectivity is unavailable.

## Exposure Classes

Service inventory may declare one of these exposure classes:

- `local-only`: bind only to host, LAN, or container networks. This is the default for Zigbee2MQTT and Mosquitto.
- `tailscale-internal`: reachable over Tailscale only. This is appropriate for Home Assistant remote administration.
- `public`: reachable from the public internet through a deliberate ingress path, normally the VPS edge role.

Public exposure is not implied by a service existing in Git. It must be explicit
in host inventory and ingress configuration.

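A validation step could reject any exposure value outside the three classes above. The function below is a hypothetical sketch; how the value is read from inventory is not shown and is not specified in this document.

```sh
# Hypothetical validator for the exposure classes listed above.
is_valid_exposure() {
  case "$1" in
    local-only|tailscale-internal|public) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_valid_exposure "public"` succeeds, while a misspelled or empty value returns a non-zero status.
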
## CHELSTY Home Automation Deployment Notes

CHELSTY remains a Docker Compose execution node. No Kubernetes, Helm, Ansible,
or additional orchestration layer is required for Home Assistant infrastructure.

The SLZB-06U coordinator is network-connected over Ethernet or WiFi. Compose
files and host overrides should configure Zigbee2MQTT for a TCP/network
coordinator endpoint, not a USB serial device. Avoid `/dev/ttyUSB0` mappings.

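A simple lint can catch accidental USB serial mappings before deployment. This is a hypothetical check, not part of the documented framework; `no_usb_serial` is an invented name, and the pattern only covers `/dev/ttyUSB*` paths as mentioned above.

```sh
# Hypothetical lint: succeed only if the given compose or config files contain
# no USB serial device mapping such as /dev/ttyUSB0.
no_usb_serial() {
  if grep -n '/dev/ttyUSB' "$@"; then
    echo "error: USB serial mapping found; use a network coordinator endpoint" >&2
    return 1
  fi
  return 0
}
```

On a match, `grep -n` prints the offending lines with their line numbers, which makes the failure easy to locate.
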
Runtime paths follow the standard layout:

- `/opt/homelab/data/homeassistant`
- `/opt/homelab/config/homeassistant`
- `/opt/homelab/logs/homeassistant`
- `/opt/homelab/data/zigbee2mqtt`
- `/opt/homelab/config/zigbee2mqtt`
- `/opt/homelab/logs/zigbee2mqtt`
- `/opt/homelab/data/mosquitto`
- `/opt/homelab/config/mosquitto`
- `/opt/homelab/logs/mosquitto`

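The layout above follows a `<root>/<kind>/<service>` pattern, so the directories can be created with a short loop. The helper below is a sketch; `ensure_layout` and the `HOMELAB_ROOT` variable are hypothetical names.

```sh
# Hypothetical bootstrap helper: create the data, config and logs directories
# for each service under a root that defaults to /opt/homelab.
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

ensure_layout() {
  for service in "$@"; do
    for kind in data config logs; do
      mkdir -p "$HOMELAB_ROOT/$kind/$service" || return 1
    done
  done
}
```

For CHELSTY this would be called as `ensure_layout homeassistant zigbee2mqtt mosquitto`.
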
Recommended backup coverage:

- Home Assistant config and persistent data before upgrades or major integration changes.
- Zigbee2MQTT config, database, coordinator backup files, and Zigbee network key material.
- SLZB-06U firmware version, exported configuration, network address reservation, and coordinator state.
- Mosquitto config, ACL/password files, persistence data, and bridge configuration if enabled.

## Secrets Management

- **Do NOT commit secrets to Git.**
- Secrets should be placed in `/opt/homelab/config/<service>/.env` on the target host.
- The deployment script should ensure Docker Compose loads this file.
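
When a host-local `.env` file is first created, it can help to restrict it to its owner. The sketch below assumes a mode of `600` as the policy, which this document does not mandate; `init_env_file` is a hypothetical name.

```sh
# Hypothetical helper: create a service .env file if it is missing and make
# sure only the owner can read or write it (mode 600 is an assumed policy).
init_env_file() {
  env_file="$1"
  mkdir -p "$(dirname "$env_file")" || return 1
  if [ ! -e "$env_file" ]; then
    # The subshell keeps the restrictive umask from leaking to the caller.
    ( umask 077 && : > "$env_file" ) || return 1
  fi
  chmod 600 "$env_file"
}
```

An existing file keeps its contents; only its permissions are tightened.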