Deployment Conventions

This document describes the GitOps-lite deployment process for the homelab.

Principles

  1. Git as Source of Truth: All infrastructure definitions (Docker Compose, configurations) are stored in Git.
  2. Unidirectional Flow: Changes flow from SATURN (commit node) to execution nodes.
  3. Lightweight: No complex orchestrators (no Kubernetes). Use docker compose and simple shell scripts.
  4. Tailscale Mesh: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
  5. Host Autonomy: Services that must operate during WAN or Git outages keep their runtime dependencies on the execution node or local LAN.

Staged Deployment Framework

The homelab uses a staged deployment framework located at scripts/deploy/deploy.sh. This script is designed to be resumable, stage-aware, and observable.

Deployment Stages

  1. prepare: Pulls the latest changes from Git, validates inventory, and prepares the local environment. It is tolerant of network failures to support intermittently connected nodes like CHELSTY.
  2. validate: Ensures all required service definitions and metadata are present.
  3. deploy: Executes docker compose commands for all assigned services. Supports .env files and docker-compose.override.yml under /opt/homelab/config/<service>/.
  4. verify: Executes service-specific healthcheck.sh scripts or checks container status.
  5. diagnose: Automatically triggered on failure; collects container status and logs for troubleshooting.
  6. complete: Finalizes the deployment and marks the state as finished.
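
The stage order above can be pictured as a plain loop. The following is a minimal sketch, not the actual contents of deploy.sh: it assumes each stage is a shell function named stage_<name> (hypothetical names) that returns non-zero on failure, and it runs diagnose only when another stage fails.

```shell
# Hypothetical sketch of a stage runner; the stage_<name> functions are
# illustrative and are not taken from the real deploy.sh.

# Stages that run in order on the happy path. "diagnose" is deliberately
# absent because it only runs after a failure.
STAGES="prepare validate deploy verify complete"

# run_stages START_STAGE
# Runs every stage from START_STAGE onward and stops at the first failure.
run_stages() {
  local start="$1" started=0 stage
  for stage in $STAGES; do
    if [ "$stage" = "$start" ]; then started=1; fi
    if [ "$started" -eq 0 ]; then continue; fi
    echo "stage: $stage"
    if ! "stage_$stage"; then
      echo "stage failed: $stage" >&2
      stage_diagnose
      return 1
    fi
  done
  # An unknown start stage means nothing ran; report that as a failure.
  [ "$started" -eq 1 ]
}
```

With real stage functions defined, run_stages prepare would walk the whole sequence, while run_stages verify would only re-check and finalize.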

State Tracking and Logging

  • State: Local node state is tracked in /opt/homelab/state/deploy/current_stage. The last successfully processed service in the deploy stage is tracked in last_service to support granular resumption.
  • Logs: Detailed execution logs are stored in /opt/homelab/logs/deploy/deploy_<timestamp>.log. Structured log entries prefixed with [STRUCT] provide machine-parseable event data.
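
As an illustration of how the state file and the structured log lines might be inspected, here is a minimal sketch. The two directories come from the text above; the HOMELAB_STATE_DIR and HOMELAB_LOG_DIR overrides, the helper names, and the assumption that log timestamps sort lexicographically are all hypothetical.

```shell
# Paths are taken from this document. The environment overrides are an
# assumption that makes the helpers usable outside a real node.
STATE_DIR="${HOMELAB_STATE_DIR:-/opt/homelab/state/deploy}"
LOG_DIR="${HOMELAB_LOG_DIR:-/opt/homelab/logs/deploy}"

# current_stage
# Prints the recorded stage, or "none" when no state file exists yet.
current_stage() {
  if [ -f "$STATE_DIR/current_stage" ]; then
    cat "$STATE_DIR/current_stage"
  else
    echo "none"
  fi
}

# struct_events
# Prints the [STRUCT] lines of the newest deploy log.
# Assumes the <timestamp> part of the file name sorts lexicographically.
struct_events() {
  local latest
  latest="$(ls "$LOG_DIR"/deploy_*.log 2>/dev/null | sort | tail -n 1)"
  if [ -z "$latest" ]; then return 1; fi
  grep -F '[STRUCT]' "$latest"
}
```

The layout of the text after the [STRUCT] prefix is not specified here, so the sketch only filters lines and does not try to parse them.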

Resume Semantics

If a deployment is interrupted (e.g., due to LTE disconnect on CHELSTY):

  1. Rerun the script with the --resume flag: scripts/deploy/deploy.sh --resume.
  2. The script reads the last incomplete stage from current_stage and continues from there.
  3. In the deploy stage, it uses last_service to resume from the first service that did not complete successfully.
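
The per-service resume step can be sketched as a small filter. This is an illustration under assumptions, not the real implementation: the function name is hypothetical, and it assumes services are deployed in the same order on every run so that everything up to and including last_service can be skipped.

```shell
# services_to_deploy LAST_SERVICE SERVICE...
# Prints the services that still need deploying, one per line.
# LAST_SERVICE is the content of the last_service state file, or "" if absent.
# Assumption: the service order is stable between runs.
services_to_deploy() {
  local last="$1" skipping=0 svc
  shift
  # Only skip when the recorded service is actually in the list; an
  # unknown or empty value falls back to deploying everything.
  for svc in "$@"; do
    if [ -n "$last" ] && [ "$svc" = "$last" ]; then skipping=1; fi
  done
  for svc in "$@"; do
    if [ "$skipping" -eq 1 ]; then
      # Skip everything up to and including the last completed service.
      if [ "$svc" = "$last" ]; then skipping=0; fi
      continue
    fi
    echo "$svc"
  done
}
```

For example, services_to_deploy mosquitto mosquitto zigbee2mqtt homeassistant prints zigbee2mqtt and homeassistant, because mosquitto is recorded as already done.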

Operational Semantics

Deployment is hybrid:

  • SATURN acts as the orchestrator and source of truth.
  • Nodes execute the deployment locally using the deploy.sh script.
  • A human must trigger and confirm each deployment.

Recovery Workflow

If a deployment fails:

  1. Run deploy.sh --stage diagnose to identify the issue.
  2. Use the recover-node AI prompt to analyze logs and get recommendations.
  3. Fix the issue (e.g., update a secret in .env) and run deploy.sh --resume.

Onboarding New Nodes

Refer to inventory/templates/how_to_add_new_node.yaml for a detailed guide on adding new hardware to the mesh. The general flow is:

  1. Define node in hosts/ and inventory/topology.yaml on SATURN.
  2. Bootstrap the node (Docker, Tailscale, Git).
  3. Run the staged deployment framework starting with prepare.

Host-Local Overrides

If a service requires host-specific configuration (e.g., unique device paths for GPUs on SOLARIA):

  1. Create a docker-compose.override.yml in /opt/homelab/config/<service>/.
  2. The deploy stage includes this override when the file exists.
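
The override handling described above can be sketched as follows. This is a hedged illustration rather than the code in deploy.sh: the compose_up name, the services/<service>/docker-compose.yml repository layout, and the DOCKER variable are assumptions. The -f and --env-file options are standard docker compose options.

```shell
# DOCKER can be overridden (for example DOCKER=echo) for a dry run.
DOCKER="${DOCKER:-docker}"

# compose_up SERVICE REPO_DIR [CONFIG_ROOT]
# Starts one service, layering host-local files on top of the Git definition.
compose_up() {
  local service="$1" repo_dir="$2" config_root="${3:-/opt/homelab/config}"
  local config_dir="$config_root/$service"
  # Base definition from Git. The services/<service>/ layout is an assumption.
  set -- -f "$repo_dir/services/$service/docker-compose.yml"
  # Host-local override, only if the operator created one.
  if [ -f "$config_dir/docker-compose.override.yml" ]; then
    set -- "$@" -f "$config_dir/docker-compose.override.yml"
  fi
  # Host-local environment file, only if present.
  if [ -f "$config_dir/.env" ]; then
    set -- "$@" --env-file "$config_dir/.env"
  fi
  $DOCKER compose "$@" up -d
}
```

Running it with DOCKER=echo prints the command line instead of executing it, which is a cheap way to confirm which files would be picked up on a given host.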

For CHELSTY Home Assistant infrastructure, host-local configuration is the authority for runtime identity, secrets, and local device endpoints:

  • Home Assistant config: /opt/homelab/config/homeassistant
  • Zigbee2MQTT config: /opt/homelab/config/zigbee2mqtt
  • Mosquitto config: /opt/homelab/config/mosquitto

CHELSTY services must not require SATURN, VPS, or Forgejo to be reachable after deployment has completed. Docker Compose definitions can still come from Git, but Home Assistant automation, Zigbee control, and MQTT messaging must continue locally while LTE or Tailscale connectivity is unavailable.

Exposure Classes

Service inventory may declare one of these exposure classes:

  • local-only: bind only to host, LAN, or container networks. This is the default for Zigbee2MQTT and Mosquitto.
  • tailscale-internal: reachable over Tailscale only. This is appropriate for Home Assistant remote administration.
  • public: reachable from the public internet through a deliberate ingress path, normally the VPS edge role.

Public exposure is not implied by a service existing in Git. It must be explicit in host inventory and ingress configuration.
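
A validation step for these classes could look like the sketch below. The function names are hypothetical and how the class is read from inventory is left out; the point is only that anything other than an explicit public value is never treated as public.

```shell
# is_valid_exposure CLASS
# Succeeds only for the three classes named in this document.
is_valid_exposure() {
  case "$1" in
    local-only|tailscale-internal|public) return 0 ;;
    *) return 1 ;;
  esac
}

# needs_public_ingress CLASS
# Succeeds only when the inventory explicitly says "public".
# A missing or misspelled class never counts as public.
needs_public_ingress() {
  [ "$1" = "public" ]
}
```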

CHELSTY Home Automation Deployment Notes

CHELSTY remains a Docker Compose execution node. No Kubernetes, Helm, Ansible, or additional orchestration layer is required for Home Assistant infrastructure.

The SLZB-06U coordinator is network-connected over Ethernet or Wi-Fi. Compose files and host overrides should configure Zigbee2MQTT for a TCP/network coordinator endpoint, not a USB serial device. Avoid /dev/ttyUSB0 mappings.
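
One way to catch a stray serial mapping before deploying is a simple text check over the Compose and Zigbee2MQTT files. This is a sketch with a hypothetical function name; it only looks for /dev/ttyUSB* and /dev/ttyACM* paths and does not verify that a network endpoint (Zigbee2MQTT documents the tcp://host:port form for network adapters) is actually configured.

```shell
# uses_usb_serial FILE...
# Succeeds (exit 0) if any given file references a local serial device such
# as /dev/ttyUSB0 or /dev/ttyACM0, printing file:line:text for each match.
uses_usb_serial() {
  # The extra /dev/null argument makes grep print file names even when
  # only one file is passed.
  grep -nE '/dev/tty(USB|ACM)[0-9]+' "$@" /dev/null
}
```

A pre-deploy check could fail the validate stage when uses_usb_serial succeeds for any CHELSTY Zigbee2MQTT file.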

Runtime paths follow the standard layout:

  • /opt/homelab/data/homeassistant
  • /opt/homelab/config/homeassistant
  • /opt/homelab/logs/homeassistant
  • /opt/homelab/data/zigbee2mqtt
  • /opt/homelab/config/zigbee2mqtt
  • /opt/homelab/logs/zigbee2mqtt
  • /opt/homelab/data/mosquitto
  • /opt/homelab/config/mosquitto
  • /opt/homelab/logs/mosquitto
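
The layout above follows one pattern, /opt/homelab/<kind>/<service>, so the directories can be created with a short loop. A minimal sketch with a hypothetical helper name; ownership and permissions are not addressed here.

```shell
# ensure_service_dirs SERVICE [ROOT]
# Creates the data, config, and logs directories for one service.
# ROOT defaults to /opt/homelab and is only a parameter to allow dry runs.
ensure_service_dirs() {
  local service="$1" root="${2:-/opt/homelab}" kind
  for kind in data config logs; do
    mkdir -p "$root/$kind/$service"
  done
}
```

For the CHELSTY stack this would be called once each for homeassistant, zigbee2mqtt, and mosquitto.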

Recommended backup coverage:

  • Home Assistant config and persistent data before upgrades or major integration changes.
  • Zigbee2MQTT config, database, coordinator backup files, and Zigbee network key material.
  • SLZB-06U firmware version, exported configuration, network address reservation, and coordinator state.
  • Mosquitto config, ACL/password files, persistence data, and bridge configuration if enabled.
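
The file-based part of this coverage can be sketched as a tar archive of a service's config and data directories. The helper name and archive naming are assumptions. The sketch does not stop the container first, so whether a live copy is consistent depends on the service, and it does not cover SLZB-06U firmware or settings, which live on the coordinator itself.

```shell
# backup_service SERVICE DEST_DIR [ROOT]
# Archives config/<service> and data/<service> into DEST_DIR and prints the
# archive path. ROOT defaults to /opt/homelab.
backup_service() {
  local service="$1" dest="$2" root="${3:-/opt/homelab}" stamp archive
  stamp="$(date +%Y%m%dT%H%M%S)"
  archive="$dest/${service}_${stamp}.tar.gz"
  mkdir -p "$dest"
  # -C keeps the paths inside the archive relative to the homelab root.
  tar -czf "$archive" -C "$root" "config/$service" "data/$service" && echo "$archive"
}
```

For example, backup_service zigbee2mqtt /some/backup/dir before an upgrade would capture the Zigbee2MQTT config and database in one file, provided they live under the standard paths.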

Secrets Management

  • Do NOT commit secrets to Git.
  • Secrets should be placed in /opt/homelab/config/<service>/.env on the target host.
  • The deployment script should ensure that Docker Compose loads this file when it runs the service.