2026-05-11 20:56:47 +02:00
47 changed files with 1373 additions and 20 deletions
--- a/docs/capabilities.md
+++ b/docs/capabilities.md
@ -0,0 +1,85 @@
 # Node Capability Model
 This document defines the capability model for the homelab infrastructure. The goal is to provide a declarative way to describe what each node can do, its constraints, and its suitability for various workloads.
 ## Overview
 Capabilities are defined per host in `hosts/<hostname>/capabilities.yaml`. This metadata allows infrastructure tooling and future AI agents to reason about workload placement, recovery, and compatibility without hardcoding logic into the orchestration system.
 ## Schema Definition
 The `capabilities.yaml` file follows this structure:
 ```yaml
 capabilities:
  hardware:
    cpu:
      arch: <string>          # e.g., x86_64, arm64
      cores: <int>
      threads: <int>
    memory:
      total_gb: <int>
    acceleration:
      type: <string>          # e.g., none, cuda, tpu, vaapi
      model: <string>         # e.g., "NVIDIA RTX 3060", "Coral Edge TPU"
  virtualization:
    supported: <boolean>
    type: <string>            # e.g., kvm, docker-only
  storage:
    persistence: <string>     # ephemeral, persistent, redundant
    type: <string>            # ssd, hdd, nvme, sd-card
    capacity_gb: <int>
  networking:
    reachability: <string>    # public, tailscale-only, lan-only
    ingress_suitability: <boolean>
    bandwidth: <string>       # e.g., "1Gbps", "100Mbps", "LTE"
  runtime:
    container_engine: <string> # docker, podman, containerd
    os: <string>              # debian, ubuntu, alpine, nixos
  operational:
    power_constraint: <string> # low-power, mains, battery-backed
    connectivity: <string>     # stable, intermittent
    availability_target: <string> # high, medium, best-effort
  deployment:
    suitability: [<string>]    # list of workload types (e.g., ai, database, edge, web)
    restricted: <boolean>      # if true, only specific workloads are allowed
 ```
 ## Placement Reasoning Examples
 ### AI Workloads
 A service requiring `cuda` acceleration will be matched against nodes where `capabilities.hardware.acceleration.type == "cuda"`.
 *   **Target:** `solaria`
 ### Public Ingress
 A service requiring public exposure will be matched against nodes where `capabilities.networking.ingress_suitability == true`.
 *   **Target:** `vps`
 ### Low-Power Staging
 Staging workloads that should not consume significant power, or that tolerate intermittent connectivity, are matched against nodes where `capabilities.operational.power_constraint == "low-power"`.
 *   **Target:** `chelsty`
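The three placement rules above can be sketched as a lookup over a small table. This is only an illustration: the table hard-codes values copied from the `hosts/*/capabilities.yaml` files in this change instead of parsing YAML, and the function names `capability_table` and `nodes_where` are hypothetical.

```shell
#!/bin/sh
# Columns: node, acceleration.type, ingress_suitability, power_constraint.
# Values are copied by hand from hosts/*/capabilities.yaml (assumption: a real
# tool would read the YAML files instead of this table).
capability_table() {
  cat <<'EOF'
chelsty none false low-power
piha none false mains
saturn none false mains
solaria cuda false mains
vps none true mains
EOF
}

# nodes_where <column-number> <value>: print nodes whose column equals value.
nodes_where() {
  capability_table | awk -v col="$1" -v want="$2" '$col == want { print $1 }'
}

nodes_where 2 cuda        # AI workloads
nodes_where 3 true        # public ingress
nodes_where 4 low-power   # low-power staging
```

With the values above, the three calls select `solaria`, `vps`, and `chelsty` respectively, matching the targets listed in each example.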
 ## Recovery Reasoning Examples
 ### Failover Strategy
 If `saturn` (the primary orchestrator) fails:
 1.  Identify nodes with `roles: [control]` or `roles: [infra]`.
 2.  Rank the candidates by `capabilities.operational.availability_target`, preferring `high` and falling back to `medium`.
 3.  Propose migration of critical infra services to `piha`.
 ### Storage-Bound Services
 If a node with `persistence: persistent` fails, the agent must first look for another node with `persistence: persistent` and a compatible `storage.type` before attempting recovery. If the only available target is an `ephemeral` node, the agent must warn about potential data loss.
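A minimal sketch of that check follows. It assumes that "compatible" means the same `storage.type`, which the document does not define, and that any non-`ephemeral` target is acceptable for persistence. The function name `storage_verdict` is hypothetical.

```shell
#!/bin/sh
# storage_verdict <source-storage-type> <target-persistence> <target-storage-type>
# Assumption: storage types are "compatible" only when they are identical.
storage_verdict() {
  if [ "$2" = "ephemeral" ]; then
    echo "warn: target is ephemeral, potential data loss"
  elif [ "$1" != "$3" ]; then
    echo "review: storage type changes from $1 to $3"
  else
    echo "ok"
  fi
}

storage_verdict ssd persistent ssd
storage_verdict sd-card redundant nvme
storage_verdict ssd ephemeral ssd
```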
 ## Future Usage by AI Agents
 Future autonomous agents will use this metadata to:
 1.  **Evaluate Suitability:** Match service requirements (from `service.yaml`) against node capabilities.
 2.  **Generate Plans:** Create step-by-step deployment or migration plans based on hardware compatibility.
 3.  **Validate Topology:** Ensure that a proposed multi-node setup doesn't violate networking or operational constraints (e.g., don't put a DB on an intermittent node).
 4.  **Propose Failover:** Automatically suggest the best alternative node during an outage.
--- a/docs/deployment.md
+++ b/docs/deployment.md
@ -8,23 +8,46 @@ This document describes the GitOps-lite deployment process for the homelab.
 2.  **Unidirectional Flow**: Changes flow from **SATURN** (commit node) to execution nodes.
 3.  **Lightweight**: No complex orchestrators (no Kubernetes). Use `docker compose` and simple shell scripts.
 4.  **Tailscale Mesh**: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
 5.  **Host Autonomy**: Services that must operate during WAN or Git outages keep their runtime dependencies on the execution node or local LAN.
-## Deployment Process
+## Staged Deployment Framework
-### 1. Preparation (on SATURN)
+The homelab uses a staged deployment framework located at `scripts/deploy/deploy.sh`. This script is designed to be resumable, stage-aware, and observable.
-   Modify or create service definitions in `services/`.
+### Deployment Stages
 -   Assign services to hosts by creating/updating `hosts/<hostname>/services.txt` (or similar mapping).
 -   Commit and push changes to the Forgejo instance.
-### 2. Deployment (on Execution Node)
+1.  **prepare**: Pulls the latest changes from Git, validates inventory, and prepares the local environment.
 2.  **deploy**: Executes `docker compose` commands for all assigned services.
 3.  **verify**: Checks the health and connectivity of deployed services.
 4.  **diagnose**: Performs deep checks and resource analysis if something goes wrong.
 5.  **rollback**: Reverts to a previous known-good state.
 6.  **resume**: Automatically continues from the last successful stage.
-Execution nodes run a deployment script (e.g., via cron or manual trigger) that:
+### State Tracking and Logging
-1.  Performs a `git pull` from the source of truth.
+-   **State**: Local node state is tracked in `/opt/homelab/state/deploy/current_stage`.
-2.  Identifies services assigned to this host.
+-   **Logs**: Detailed execution logs are stored in `/opt/homelab/logs/deploy/deploy_<timestamp>.log`.
-3.  Symlinks or copies `services/<service>/docker-compose.yml` to `/opt/homelab/services/`.
+
-4.  Runs `docker compose up -d --remove-orphans`.
+### Operational Semantics
 Deployment is **hybrid**:
 -   **SATURN** acts as the orchestrator and source of truth.
 -   **Nodes** execute the deployment locally using the `deploy.sh` script.
 -   Human-in-the-loop is required for triggering and confirming deployments.
 ### Recovery Workflow
 If a deployment fails:
 1.  Run `deploy.sh diagnose` to identify the issue.
 2.  Use the `recover-node` AI prompt to analyze logs and get recommendations.
 3.  Either fix the issue and run `deploy.sh resume`, or use `deploy.sh rollback`.
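The following sketch shows one way a state file could drive `resume`. It is not the contents of `deploy.sh`: it assumes that `current_stage` holds the name of the last stage that completed successfully, and the `STATE_FILE` override and the function names are hypothetical.

```shell
#!/bin/sh
# Default path taken from the State Tracking section; overridable for testing.
STATE_FILE="${STATE_FILE:-/opt/homelab/state/deploy/current_stage}"

# record_stage <stage>: persist the last stage that completed successfully.
record_stage() {
  mkdir -p "$(dirname "$STATE_FILE")" && printf '%s\n' "$1" > "$STATE_FILE"
}

# next_stage: print the stage a resume would run next.
# An empty or unknown state restarts from prepare.
next_stage() {
  last_stage=""
  if [ -f "$STATE_FILE" ]; then
    last_stage="$(cat "$STATE_FILE")"
  fi
  case "$last_stage" in
    prepare) echo deploy ;;
    deploy)  echo verify ;;
    verify)  echo done ;;
    *)       echo prepare ;;
  esac
}
```

Recording only successful stages keeps a resume idempotent: a stage that failed halfway is simply run again.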
 ## Onboarding New Nodes
 Refer to `inventory/templates/how_to_add_new_node.yaml` for a detailed guide on adding new hardware to the mesh. The general flow is:
 1.  Define node in `hosts/` and `inventory/topology.yaml` on SATURN.
 2.  Bootstrap the node (Docker, Tailscale, Git).
 3.  Run the staged deployment framework starting with `prepare`.
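The initial run on a new node can be sketched as a loop that stops at the first failing stage. The wrapper name `run_stages` is hypothetical; only the stage names and the script path come from this change.

```shell
#!/bin/sh
# run_stages <deploy-command>: run prepare, deploy, verify in order and stop
# at the first stage that exits non-zero.
run_stages() {
  for stage in prepare deploy verify; do
    echo "running stage: $stage"
    if ! "$1" "$stage"; then
      echo "stage failed: $stage" >&2
      return 1
    fi
  done
}

# Example, using the path from the onboarding template:
# run_stages "$HOME/homelab-codex-ws/scripts/deploy/deploy.sh"
```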
 ## Host-Local Overrides
@ -33,6 +56,57 @@ If a service requires host-specific configuration (e.g., unique device paths for
 1.  Create a `docker-compose.override.yml` in `/opt/homelab/config/<service>/`.
 2.  The deployment script should include this override if it exists.
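A possible shape for the override handling is sketched below. The helper name `compose_file_flags` and the `CONFIG_ROOT` variable are hypothetical, and the sketch assumes paths without spaces because the result is word-split when passed to `docker compose`.

```shell
#!/bin/sh
CONFIG_ROOT="${CONFIG_ROOT:-/opt/homelab/config}"

# compose_file_flags <service> <base-compose-file>
# Prints the -f flags for `docker compose`, adding the host-local override
# only when the file exists.
compose_file_flags() {
  override="$CONFIG_ROOT/$1/docker-compose.override.yml"
  if [ -f "$override" ]; then
    printf '%s\n' "-f $2 -f $override"
  else
    printf '%s\n' "-f $2"
  fi
}

# Example (the base path is a placeholder):
# docker compose $(compose_file_flags mosquitto /path/to/docker-compose.yml) up -d
```

Passing the override explicitly matters because `docker compose` stops auto-loading `docker-compose.override.yml` once any `-f` flag is given.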
 For CHELSTY Home Assistant infrastructure, host-local configuration is the
 authority for runtime identity, secrets, and local device endpoints:
 - Home Assistant config: `/opt/homelab/config/homeassistant`
 - Zigbee2MQTT config: `/opt/homelab/config/zigbee2mqtt`
 - Mosquitto config: `/opt/homelab/config/mosquitto`
 CHELSTY services must not require SATURN, VPS, or Forgejo to be reachable after
 deployment has completed. Docker Compose definitions can still come from Git,
 but Home Assistant automation, Zigbee control, and MQTT messaging must continue
 locally while LTE or Tailscale connectivity is unavailable.
 ## Exposure Classes
 Service inventory may declare one of these exposure classes:
 - `local-only`: bind only to host, LAN, or container networks. This is the default for Zigbee2MQTT and Mosquitto.
 - `tailscale-internal`: reachable over Tailscale only. This is appropriate for Home Assistant remote administration.
 - `public`: reachable from the public internet through a deliberate ingress path, normally the VPS edge role.
 Public exposure is not implied by a service existing in Git. It must be explicit
 in host inventory and ingress configuration.
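As a sketch, a deployment script could validate the declared class before acting on it. Both function names are hypothetical; the three class names are the ones listed above.

```shell
#!/bin/sh
# check_exposure <class>: succeed only for the three documented classes.
check_exposure() {
  case "$1" in
    local-only|tailscale-internal|public) return 0 ;;
    *) echo "unknown exposure class: $1" >&2; return 1 ;;
  esac
}

# requires_explicit_ingress <class>: true only for `public`, which must be
# backed by an explicit ingress definition.
requires_explicit_ingress() {
  [ "$1" = "public" ]
}
```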
 ## CHELSTY Home Automation Deployment Notes
 CHELSTY remains a Docker Compose execution node. No Kubernetes, Helm, Ansible,
 or additional orchestration layer is required for Home Assistant infrastructure.
 The SLZB-06U coordinator is network-connected over Ethernet or WiFi. Compose
 files and host overrides should configure Zigbee2MQTT for a TCP/network
 coordinator endpoint, not a USB serial device. Avoid `/dev/ttyUSB0` mappings.
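A simple guard against accidental USB mappings could look like the sketch below. The function name is hypothetical, and matching `/dev/ttyACM*` in addition to `/dev/ttyUSB*` is an assumption about which USB serial device names are worth catching.

```shell
#!/bin/sh
# has_usb_serial_mapping <file>: succeed, and print the offending lines with
# line numbers, if the file references a USB serial device.
has_usb_serial_mapping() {
  grep -nE '/dev/tty(USB|ACM)[0-9]+' "$1"
}

# Example: fail a check when a Compose file or override maps a USB device.
# if has_usb_serial_mapping docker-compose.yml; then exit 1; fi
```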
 Runtime paths follow the standard layout:
 - `/opt/homelab/data/homeassistant`
 - `/opt/homelab/config/homeassistant`
 - `/opt/homelab/logs/homeassistant`
 - `/opt/homelab/data/zigbee2mqtt`
 - `/opt/homelab/config/zigbee2mqtt`
 - `/opt/homelab/logs/zigbee2mqtt`
 - `/opt/homelab/data/mosquitto`
 - `/opt/homelab/config/mosquitto`
 - `/opt/homelab/logs/mosquitto`
 Recommended backup coverage:
 - Home Assistant config and persistent data before upgrades or major integration changes.
 - Zigbee2MQTT config, database, coordinator backup files, and Zigbee network key material.
 - SLZB-06U firmware version, exported configuration, network address reservation, and coordinator state.
 - Mosquitto config, ACL/password files, persistence data, and bridge configuration if enabled.
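The file-based part of this coverage can be sketched with `tar`. The function name `backup_service` and the `HOMELAB_ROOT` variable are hypothetical, and the sketch does not stop containers first, so it assumes the service is quiesced or tolerates a live copy.

```shell
#!/bin/sh
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# backup_service <service> <destination-dir>: archive config and data for one
# service and print the path of the archive that was written.
backup_service() {
  stamp="$(date +%Y%m%d-%H%M%S)"
  archive="$2/$1-$stamp.tar.gz"
  mkdir -p "$2" &&
    tar -czf "$archive" -C "$HOMELAB_ROOT" "config/$1" "data/$1" &&
    echo "$archive"
}

# Example: backup_service zigbee2mqtt /path/to/local/backups
```

Coordinator firmware versions and exported SLZB-06U configuration are not files under `/opt/homelab`, so they still need to be captured separately.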
 ## Secrets Management
 -   **Do NOT commit secrets to Git.**
--- a/docs/lifecycle.md
+++ b/docs/lifecycle.md
@ -0,0 +1,51 @@
 # Service Lifecycle and Recovery
 This document defines the lifecycle of a service in the homelab and the procedures for operational recovery.
 ## Service Lifecycle
 1.  **Onboarding**: 
    - Create `services/<service>/` directory.
    - Define `docker-compose.yml`, `service.yaml`, `README.md`, `env.example`, and `healthcheck.sh`.
    - Register service in `inventory/topology.yaml` or relevant host configs.
 2.  **Provisioning**:
    - Ensure `/opt/homelab/data/<service>` exists.
    - Ensure `/opt/homelab/config/<service>` exists and contains required secrets/configs.
    - Setup environment variables from `env.example` into `/opt/homelab/config/<service>/.env`.
 3.  **Deployment**:
    - `docker compose pull`
    - `docker compose up -d`
 4.  **Verification**:
    - Run `healthcheck.sh`.
    - Verify ports are reachable according to `service.yaml`.
 5.  **Maintenance**:
    - Periodic updates via `docker compose pull`.
    - Log monitoring via `docker compose logs -f`.
 6.  **Decommissioning**:
    - `docker compose down`.
    - Archive `/opt/homelab/data/<service>` if necessary.
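The provisioning step lends itself to a small idempotent helper. The sketch below is an assumption about how it might be scripted; `provision_service` and `HOMELAB_ROOT` are hypothetical names.

```shell
#!/bin/sh
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# provision_service <path-to-service-dir-in-repo>
# Creates the data and config directories and seeds .env from env.example.
provision_service() {
  service="$(basename "$1")"
  env_file="$HOMELAB_ROOT/config/$service/.env"
  mkdir -p "$HOMELAB_ROOT/data/$service" "$HOMELAB_ROOT/config/$service" || return 1
  # Never overwrite an existing .env: it may already hold real secrets.
  if [ ! -f "$env_file" ]; then
    cp "$1/env.example" "$env_file" && chmod 600 "$env_file"
  fi
}
```

The seeded `.env` still contains only template values; real secrets have to be filled in on the host afterwards.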
 ## Operational Recovery
 ### 1. Container Failure
 If a service is unhealthy:
 - Check `docker compose logs`.
 - Restart: `docker compose restart`.
 - Recreate: `docker compose up -d --force-recreate`.
 ### 2. Node Failure
 If a host node fails:
 - Services with `owner_node` matching the failed node must be recovered on a backup node or the node must be restored.
 - Persistence data must be restored from backups to `/opt/homelab/data/<service>`.
 ### 3. Dependency Recovery
 If a dependency fails:
 - Services depending on it might report unhealthy status.
 - Recover the dependency first.
 - Re-verify dependent services.
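The "dependency first" rule amounts to a topological sort. As an illustration, the standard `tsort` utility can order the CHELSTY services from the `depends_on.local` entries in `hosts/chelsty/services.yaml`. The pairs are copied by hand here, and the function name is hypothetical.

```shell
#!/bin/sh
# Each line is "<dependency> <dependent>".
recovery_order() {
  tsort <<'EOF'
mosquitto zigbee2mqtt
mosquitto homeassistant
zigbee2mqtt homeassistant
EOF
}

recovery_order
```

For these pairs the only valid order is `mosquitto`, then `zigbee2mqtt`, then `homeassistant`, which is also the order in which to re-verify them.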
 ## Persistent Data Conventions
 - **Data**: `/opt/homelab/data/<service>` - Primary persistent state.
 - **Config**: `/opt/homelab/config/<service>` - Local overrides and secrets.
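 - **Backups**: Standard backup routines should target `/opt/homelab/data/`. Services whose backup sets also list configuration (for example Home Assistant, Zigbee2MQTT, and Mosquitto on CHELSTY) need `/opt/homelab/config/<service>` covered as well.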
--- a/docs/service-model.md
+++ b/docs/service-model.md
@ -0,0 +1,75 @@
 # Service Model and Healthchecks
 This document defines the normalized service model for the homelab.
 ## Service Layout
 Each service must reside in its own directory under `services/`:
 ```text
 services/<service>/
 ├── docker-compose.yml   # Docker Compose definition
 ├── service.yaml         # Service metadata and orchestration contract
 ├── README.md            # Service documentation
 ├── env.example          # Template for required environment variables
 └── healthcheck.sh       # Standardized healthcheck script
 ```
 ## Service Metadata (`service.yaml`)
 The `service.yaml` file provides a machine-readable contract for deployment and orchestration.
 ### Schema
 ```yaml
 service:
  name: <string>               # Canonical service name (kebab-case)
  owner_node: <string>         # Preferred host node
  exposure: <class>            # local-only, tailscale-internal, or public
  dependencies: [<service>]    # List of required services
  ports:
    - container: <int>
      host: <int>
      protocol: <tcp|udp>
  healthcheck:
    type: <string>             # local-only, container, http, mqtt
    endpoint: <string>         # URL or topic if applicable
    interval: <duration>
    timeout: <duration>
    retries: <int>
  restart_policy: <string>     # unless-stopped, always, etc.
  persistence:
    paths:
      - /opt/homelab/data/<service>/...
  runtime:
    directories: [<string>]    # Required host directories to be created
    env_vars: [<string>]       # List of required environment variables (keys only)
 ```
 ## Healthcheck Semantics
 The `healthcheck.sh` script should exit with status `0` when the service is healthy and `1` when it is unhealthy. It should support different modes based on the `healthcheck.type` declared in `service.yaml`.
 ### 1. Local-only
 Checks if the container is running and the process is alive within the host.
 ### 2. Container-level
 Uses `docker inspect` or `docker exec` to check internal container health.
 ### 3. HTTP
 Performs a `curl` against a specific endpoint (e.g., `/health` or `/`).
 ### 4. MQTT
 Verifies that a specific topic is being updated or responds to a ping.
 ### 5. Dependency-aware
 The healthcheck script may optionally check if its dependencies are healthy before reporting its own status.
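A minimal sketch of such a script is shown below, covering the local-only, container, and HTTP modes. It is an assumption about structure, not the repository's actual `healthcheck.sh`: the function names are hypothetical, the container modes only check that the container is running, and the MQTT and dependency-aware modes are left out.

```shell
#!/bin/sh
# healthy <type> <target>
#   local-only | container : <target> is a container name
#   http                   : <target> is a URL
healthy() {
  case "$1" in
    local-only|container)
      [ "$(docker inspect --format '{{.State.Running}}' "$2" 2>/dev/null)" = "true" ] ;;
    http)
      curl --fail --silent --max-time 5 --output /dev/null "$2" ;;
    *)
      echo "unsupported healthcheck type: $1" >&2
      return 1 ;;
  esac
}

# run_check normalizes every failure to exit status 1, as the contract requires.
run_check() {
  if healthy "$@"; then return 0; else return 1; fi
}

# Example: run_check http http://localhost:8123/
```

The normalization step matters because `curl --fail` reports HTTP errors with its own non-zero codes, not `1`.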
 ## Runtime Authority
 `/opt/homelab/config/<service>` is the source of truth for:
 - Secrets (not in Git)
 - Host-local overrides
 - Mutable configuration
 Services should mount files from this directory as needed.
--- a/docs/standards.md
+++ b/docs/standards.md
@ -19,11 +19,14 @@ This document defines the standards and conventions for the homelab GitOps-lite
 /
 ├── docs/               # Infrastructure documentation
 ├── hosts/              # Host-specific configurations
-│   ├── saturn/
+├── inventory/          # Topology and templates
-│   ├── solaria/
+├── services/           # Normalized service definitions
-│   ├── piha/
+│   └── <service>/
-│   └── vps/
+│       ├── docker-compose.yml
-├── services/           # Reusable service definitions (Docker Compose)
+│       ├── service.yaml
 │       ├── README.md
 │       ├── env.example
 │       └── healthcheck.sh
 ├── scripts/            # Management and deployment scripts
 └── README.md
 ```
@ -37,18 +40,28 @@ Runtime state must live outside the repository to keep it immutable and clean.
 ├── services/           # Active docker-compose files (deployed from git)
 ├── data/               # Persistent volume data (backed up)
 ├── config/             # Host-local overrides and secrets (not in git)
 │   └── <service>/
 │       ├── .env        # Merged environment variables
 │       └── overrides/  # Local configuration overrides
 └── logs/               # Service logs
 ```
 ## Service Standards
 1.  **Normalization**: Every service MUST follow the `services/<service>/` layout.
 2.  **Metadata**: Every service MUST have a `service.yaml` defining its operational contract.
 3.  **Healthchecks**: Every service MUST have a `healthcheck.sh` for verification.
 4.  **Secrets**: NEVER commit secrets to Git. Use `env.example` as a template and populate `/opt/homelab/config/<service>/.env` on the host.
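The layout requirement can be checked mechanically. The sketch below assumes the five files named in the service layout are all mandatory; the function name `lint_service_dir` is hypothetical.

```shell
#!/bin/sh
# lint_service_dir <dir>: report required service files that are missing.
# Returns 0 when the layout is complete and 1 otherwise.
lint_service_dir() {
  missing=0
  for f in docker-compose.yml service.yaml README.md env.example healthcheck.sh; do
    if [ ! -f "$1/$f" ]; then
      echo "missing: $1/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: for d in services/*/; do lint_service_dir "${d%/}"; done
```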
 ## Docker Compose Standards
 1.  **File Naming**: Use `docker-compose.yml`.
-2.  **Container Naming**: `service-name`.
+2.  **Container Naming**: Match the service name.
-3.  **Restarts**: Always use `restart: unless-stopped`.
+3.  **Restarts**: Use `restart: unless-stopped` by default; a different policy applies only when `service.yaml` sets `restart_policy`.
 4.  **Networking**:
    -   Use `tailscale` internal mesh for inter-host communication.
    -   Expose ports only when necessary.
-5.  **Volumes**: Use named volumes or absolute paths to `/opt/homelab/data/service-name`.
+5.  **Volumes**: Use absolute paths to `/opt/homelab/data/<service>`.
 ## Environment Variables
--- a/docs/topology.md
+++ b/docs/topology.md
@ -8,7 +8,7 @@
 | PIHA | Infrastructure and monitoring node |
 | SOLARIA | AI and compute node |
 | VPS | Public ingress and edge node |
-| CHELSTY | Virtualization and Home Assistant node |
+| CHELSTY | LTE-connected edge hypervisor and Home Assistant node |
 ## Architecture Principles
@ -21,6 +21,36 @@
 - Deployment uses lightweight shell scripts.
 - Avoid Kubernetes and heavy orchestration frameworks.
 ## CHELSTY Home Automation
 CHELSTY hosts the local home automation control plane. Because it uses an LTE
 uplink and may be intermittently connected, Home Assistant, Zigbee2MQTT, and
 Mosquitto must continue operating without SATURN, VPS, or Forgejo.
 The CHELSTY Home Assistant inventory is split across:
 - `hosts/chelsty/services.yaml`
 - `hosts/chelsty/networking.yaml`
 - `hosts/chelsty/paths.yaml`
 Service exposure is classified as:
 - `local-only`: available only on local host, LAN, or container networks.
 - `tailscale-internal`: available to approved Tailscale clients only.
 - `public`: available from the public internet through explicit ingress.
 Initial CHELSTY service intent:
 | Service | Role | Exposure | Offline required |
 |---|---|---|---|
 | homeassistant | Home automation controller | tailscale-internal | yes |
 | zigbee2mqtt | Zigbee to MQTT bridge | local-only | yes |
 | mosquitto | Local MQTT broker | local-only | yes |
 The Zigbee coordinator is an SLZB-06U network coordinator. It should be modeled
 as an Ethernet/WiFi network device consumed by Zigbee2MQTT, not as a USB dongle.
 Do not use `/dev/ttyUSB0` or other USB device mappings for this coordinator.
 ## Runtime Layout
 Runtime data should live under:
@ -32,3 +62,12 @@ with separated:
 - data
 - config
 - logs
 CHELSTY follows the same layout:
 - `/opt/homelab/data/<service>` for persistent service data.
 - `/opt/homelab/config/<service>` for host-local configuration and secrets.
 - `/opt/homelab/logs/<service>` for logs that should stay outside Git.
 Critical backup sets on CHELSTY include Home Assistant config, Zigbee2MQTT
 config and network state, Mosquitto config/data, and SLZB-06U coordinator state.
--- a/hosts/chelsty/capabilities.yaml
+++ b/hosts/chelsty/capabilities.yaml
@ -0,0 +1,40 @@
 capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 4
      threads: 4
    memory:
      total_gb: 16
    acceleration:
      type: none
  virtualization:
    supported: true
    type: kvm
  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 250
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: LTE
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: low-power
    connectivity: intermittent
    availability_target: best-effort
  deployment:
    suitability:
      - staging
      - homeassistant
      - edge
    restricted: false
--- a/hosts/chelsty/networking.yaml
+++ b/hosts/chelsty/networking.yaml
@ -0,0 +1,57 @@
 host: chelsty
 uplink:
  type: lte
  connectivity: intermittent
  public_reachability: not-assumed
 tailscale:
  enabled: true
  host_ip: 100.122.201.22
  role: internal-management
 exposure_classes:
  local-only:
    description: LAN, host, or container-network access only.
  tailscale-internal:
    description: Tailnet access only; no public ingress dependency.
  public:
    description: Public internet exposure through an explicitly defined ingress host.
 networks:
  home_automation_lan:
    purpose: Home Assistant, MQTT, Zigbee coordinator, and local device control.
    offline_required: true
    internet_required_for_core_operation: false
 devices:
  slzb-06u:
    role: zigbee-coordinator
    vendor_model: SLZB-06U
    connection_type: network
    transport:
      primary: ethernet
      secondary: wifi
      usb: false
    address:
      hostname: slzb-06u.local
      ipv4: null
      port: 6638
      protocol: tcp
    consumers:
      - zigbee2mqtt
    placement: chelsty-home-automation-lan
    operational_notes:
      - Treat the coordinator as a network appliance, not a USB dongle.
      - Do not configure /dev/ttyUSB0 or other host USB device mappings for this coordinator.
      - Prefer static DHCP or a reserved IP once the LAN addressing plan is known.
    backup:
      recommended: true
      include:
        - coordinator firmware version
        - coordinator configuration export
        - Zigbee network backup from Zigbee2MQTT
        - device IEEE address and network parameters
      notes:
        - Keep a copy of coordinator state with the Zigbee2MQTT backup set.
        - Record the reserved IP or DNS name used by Zigbee2MQTT.
--- a/hosts/chelsty/paths.yaml
+++ b/hosts/chelsty/paths.yaml
@ -0,0 +1,48 @@
 host: chelsty
 runtime_root: /opt/homelab
 conventions:
  services: /opt/homelab/services
  data: /opt/homelab/data
  config: /opt/homelab/config
  logs: /opt/homelab/logs
 services:
  homeassistant:
    data: /opt/homelab/data/homeassistant
    config: /opt/homelab/config/homeassistant
    logs: /opt/homelab/logs/homeassistant
    backup_priority: critical
  zigbee2mqtt:
    data: /opt/homelab/data/zigbee2mqtt
    config: /opt/homelab/config/zigbee2mqtt
    logs: /opt/homelab/logs/zigbee2mqtt
    backup_priority: critical
  mosquitto:
    data: /opt/homelab/data/mosquitto
    config: /opt/homelab/config/mosquitto
    logs: /opt/homelab/logs/mosquitto
    backup_priority: high
 backup_sets:
  homeassistant:
    include:
      - /opt/homelab/config/homeassistant
      - /opt/homelab/data/homeassistant
    restore_note: Restore before starting the Home Assistant container.
  zigbee2mqtt:
    include:
      - /opt/homelab/config/zigbee2mqtt
      - /opt/homelab/data/zigbee2mqtt
    restore_note: Restore before starting Zigbee2MQTT so coordinator and network state remain aligned.
  slzb-06u:
    include:
      - SLZB-06U firmware version
      - SLZB-06U exported configuration
      - Zigbee network backup generated by Zigbee2MQTT
    restore_note: Restore or reconfigure coordinator state before permitting Zigbee2MQTT to reform the network.
--- a/hosts/chelsty/services.yaml
+++ b/hosts/chelsty/services.yaml
@ -0,0 +1,108 @@
 host: chelsty
 exposure_classes:
  local-only:
    description: Reachable only from CHELSTY-local networks or container networks.
    public_ingress: false
    tailscale_required: false
  tailscale-internal:
    description: Reachable through the Tailscale mesh by approved tailnet clients.
    public_ingress: false
    tailscale_required: true
  public:
    description: Reachable from the public internet through an explicit ingress path.
    public_ingress: true
    tailscale_required: false
 operational_constraints:
  uplink: lte
  connectivity: intermittent
  offline_operation_required: true
  must_not_depend_on:
    - saturn
    - vps
    - forgejo
 services:
  homeassistant:
    role: home-automation-controller
    deployment_model: docker-compose
    exposure: tailscale-internal
    offline_required: true
    depends_on:
      local:
        - mosquitto
        - zigbee2mqtt
      external: []
    ports:
      - name: http
        container_port: 8123
        protocol: tcp
    runtime:
      config_path: /opt/homelab/config/homeassistant
      data_path: /opt/homelab/data/homeassistant
      logs_path: /opt/homelab/logs/homeassistant
    backup:
      recommended: true
      include:
        - /opt/homelab/config/homeassistant
        - /opt/homelab/data/homeassistant
      notes:
        - Back up before Home Assistant core, supervisor-equivalent, or integration upgrades.
        - Keep local restore copies on CHELSTY because LTE connectivity may be unavailable during recovery.
  zigbee2mqtt:
    role: zigbee-mqtt-bridge
    deployment_model: docker-compose
    exposure: local-only
    offline_required: true
    depends_on:
      local:
        - mosquitto
      external:
        - slzb-06u
    coordinator:
      name: slzb-06u
      connection: network
      usb_device: null
    ports:
      - name: frontend
        container_port: 8080
        protocol: tcp
        exposure: tailscale-internal
    runtime:
      config_path: /opt/homelab/config/zigbee2mqtt
      data_path: /opt/homelab/data/zigbee2mqtt
      logs_path: /opt/homelab/logs/zigbee2mqtt
    backup:
      recommended: true
      include:
        - /opt/homelab/config/zigbee2mqtt
        - /opt/homelab/data/zigbee2mqtt
      notes:
        - Include configuration.yaml, database.db, coordinator backup files, and network key material.
        - Restore Zigbee2MQTT state together with the SLZB-06U coordinator state when replacing hardware.
  mosquitto:
    role: local-mqtt-broker
    deployment_model: docker-compose
    exposure: local-only
    offline_required: true
    depends_on:
      local: []
      external: []
    ports:
      - name: mqtt
        container_port: 1883
        protocol: tcp
    runtime:
      config_path: /opt/homelab/config/mosquitto
      data_path: /opt/homelab/data/mosquitto
      logs_path: /opt/homelab/logs/mosquitto
    backup:
      recommended: true
      include:
        - /opt/homelab/config/mosquitto
        - /opt/homelab/data/mosquitto
      notes:
        - Retain ACL, password, persistence, and bridge configuration if enabled.
--- a/hosts/piha/capabilities.yaml
+++ b/hosts/piha/capabilities.yaml
@ -0,0 +1,39 @@
 capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 4
      threads: 4
    memory:
      total_gb: 4
    acceleration:
      type: none
  virtualization:
    supported: false
    type: docker-only
  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 32
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium
  deployment:
    suitability:
      - infra
      - monitoring
    restricted: false
--- a/hosts/saturn/capabilities.yaml
+++ b/hosts/saturn/capabilities.yaml
@ -0,0 +1,40 @@
 capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 8
      threads: 8
    memory:
      total_gb: 8
    acceleration:
      type: none
  virtualization:
    supported: false
    type: docker-only
  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 64
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high
  deployment:
    suitability:
      - control
      - development
      - infra
    restricted: false
--- a/hosts/solaria/capabilities.yaml
+++ b/hosts/solaria/capabilities.yaml
@ -0,0 +1,41 @@
 capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 12
      threads: 24
    memory:
      total_gb: 64
    acceleration:
      type: cuda
      model: "NVIDIA RTX 4070"
  virtualization:
    supported: true
    type: kvm
  storage:
    persistence: redundant
    type: nvme
    capacity_gb: 2000
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: ubuntu
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium
  deployment:
    suitability:
      - ai
      - compute
      - database
    restricted: false
--- a/hosts/vps/capabilities.yaml
+++ b/hosts/vps/capabilities.yaml
@ -0,0 +1,40 @@
 capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 2
      threads: 2
    memory:
      total_gb: 4
    acceleration:
      type: none
  virtualization:
    supported: false
    type: docker-only
  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 80
  networking:
    reachability: public
    ingress_suitability: true
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high
  deployment:
    suitability:
      - edge
      - ingress
      - web
    restricted: true
--- a/inventory/templates/how_to_add_new_node.yaml
+++ b/inventory/templates/how_to_add_new_node.yaml
@ -0,0 +1,29 @@
 ---
 title: How to Add a New Node to the Homelab
 description: This guide outlines the process for onboarding a new execution node into the GitOps-lite environment.
 phases:
  - phase: 1. Preparation (on SATURN)
    steps:
      - "Define Node Inventory: Create hosts/<hostname>/ directory"
      - "Add host.yaml with hardware metadata"
      - "Add networking.yaml with IP and Tailscale info"
      - "Add capabilities.yaml with node capability description"
      - "Add services.txt listing assigned services"
      - "Update inventory/topology.yaml"
      - "Commit and push changes to Forgejo"
  - phase: 2. Bootstrapping (on the New Node)
    steps:
      - "Install OS (Debian/Ubuntu recommended)"
      - "Configure SSH and user access"
      - "Install Docker, Docker Compose, Tailscale, Git"
      - "Join the tailnet"
      - "Clone repository: git clone <forgejo-url>/homelab-codex.git ~/homelab-codex-ws"
      - "Set up runtime: sudo mkdir -p /opt/homelab/{services,data,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"
  - phase: 3. Initial Deployment
    steps:
      - "Run prepare: ~/homelab-codex-ws/scripts/deploy/deploy.sh prepare"
      - "Run deploy: ~/homelab-codex-ws/scripts/deploy/deploy.sh deploy"
      - "Run verify: ~/homelab-codex-ws/scripts/deploy/deploy.sh verify"
--- a/inventory/templates/node-bootstrap-checklist.yaml
+++ b/inventory/templates/node-bootstrap-checklist.yaml
@ -0,0 +1,29 @@
 ---
 bootstrap_checklist:
  pre_flight:
    - task: "Hardware connected and powered"
      done: false
    - task: "Base OS installed (Debian/Ubuntu)"
      done: false
    - task: "Network connectivity established"
      done: false
    - task: "SSH access configured"
      done: false
  onboarding:
    - task: "Tailscale installed and authenticated"
      done: false
    - task: "Docker and Compose V2 installed"
      done: false
    - task: "Git installed"
      done: false
    - task: "Repository cloned to ~/homelab-codex-ws"
      done: false
    - task: "/opt/homelab directory structure created"
      done: false
  initial_run:
    - task: "deploy.sh prepare successful"
      done: false
    - task: "deploy.sh deploy successful"
      done: false
    - task: "deploy.sh verify successful"
      done: false
--- a/inventory/templates/node-discovery-commands.yaml
+++ b/inventory/templates/node-discovery-commands.yaml
@ -0,0 +1,18 @@
 ---
 discovery_commands:
  cpu:
    - "lscpu"
    - "cat /proc/cpuinfo"
  memory:
    - "free -h"
  storage:
    - "lsblk"
    - "df -h"
  network:
    - "ip addr"
    - "tailscale status"
  gpu:
    - "nvidia-smi"
    - "lspci | grep -i vga"
  usb:
    - "lsusb"
--- a/inventory/templates/prepare-node.yaml
+++ b/inventory/templates/prepare-node.yaml
@ -0,0 +1,13 @@
 ---
 node_preparation:
  actions:
    - name: update_system
      command: "sudo apt update && sudo apt upgrade -y"
    - name: install_dependencies
      command: "sudo apt install -y curl git docker.io docker-compose-v2 tailscale"
    - name: configure_docker_permissions
      command: "sudo usermod -aG docker $USER"
    - name: create_runtime_directories
      command: "sudo mkdir -p /opt/homelab/{services,data,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"
    - name: initialize_repo
      command: "git clone <repo_url> ~/homelab-codex-ws"
--- a/inventory/templates/prompts/create-node
+++ b/inventory/templates/prompts/create-node
@ -0,0 +1,13 @@
 ### System Prompt Addendum: Create Node
 **Context**: You are assisting in adding a new node to the homelab.
 **Task**: Generate the necessary inventory files for a new node.
 **Requirements**:
 1. Ask for: hostname, IP address, Tailscale IP, hardware specs (CPU/RAM/Storage), and intended role/services.
 2. Generate `hosts/<hostname>/host.yaml` and `hosts/<hostname>/networking.yaml`.
 3. Provide a snippet for `inventory/topology.yaml`.
 4. Recommend services based on hardware (e.g., if GPU is present, suggest inference services).
 **Output Format**: YAML blocks for each file.
 **Restriction**: Do NOT execute any shell commands. Only provide the configuration.
--- a/inventory/templates/prompts/deploy-node
+++ b/inventory/templates/prompts/deploy-node
@ -0,0 +1,16 @@
 ### System Prompt Addendum: Deploy Node
 **Context**: Orchestrating a deployment across one or more nodes.
 **Task**: Generate the deployment plan and verification checklist.
 **Requirements**:
 1. Identify which nodes need updates based on git changes.
 2. Recommend the sequence of stages (e.g., `prepare` on all, then `deploy` on edge nodes first).
 3. Generate a human-readable checklist for the operator.
 4. Define verification criteria for the `verify` stage.
 **Output Format**:
 - Deployment Plan (sequence of commands).
 - Verification Checklist.
 **Restriction**: Do NOT mutate infrastructure autonomously.
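 **Example**: A minimal sketch of a plan printer for the operator, not part of the repository. It only prints commands and executes nothing. The `print_plan` helper is hypothetical and assumes each node is reachable over SSH under its inventory name and that the repository is cloned to `~/homelab-codex-ws`.
 ```bash
 # Print (never run) one deploy.sh invocation per node for the given stage.
 print_plan() {
     local stage="$1" node
     shift
     for node in "$@"; do
         printf "ssh %s '~/homelab-codex-ws/scripts/deploy/deploy.sh %s'\n" "$node" "$stage"
     done
 }
 # Usage: print_plan prepare vps saturn; print_plan deploy vps
 ```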
--- a/inventory/templates/prompts/recover-node
+++ b/inventory/templates/prompts/recover-node
@ -0,0 +1,17 @@
 ### System Prompt Addendum: Recover Node
 **Context**: A homelab node is unresponsive or has suffered data loss.
 **Task**: Analyze logs and state to recommend recovery steps.
 **Requirements**:
 1. Request the content of `/opt/homelab/logs/deploy/` (latest log) and `/opt/homelab/state/deploy/current_stage`.
 2. Analyze the last failed stage.
 3. Recommend specific `deploy.sh` commands (e.g., `rollback` or `resume`).
 4. Provide manual recovery steps if automated stages fail.
 **Output Format**: 
 - Analysis of the failure.
 - Recommended action.
 - Documentation of the recovery process.
 **Restriction**: Do NOT auto-execute deployment.
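 **Operator Note**: A minimal sketch for locating the latest deploy log to hand over for analysis. The `latest_deploy_log` helper is hypothetical; it relies on the `deploy_YYYYmmdd_HHMMSS.log` naming used by `deploy.sh`, which sorts chronologically, and it reads files only.
 ```bash
 # Print the newest deploy log in a directory, or return 1 if there is none.
 latest_deploy_log() {
     local dir="${1:-/opt/homelab/logs/deploy}" f latest=""
     # Glob expansion is sorted, so the last match has the latest timestamp.
     for f in "$dir"/deploy_*.log; do
         if [ -e "$f" ]; then latest="$f"; fi
     done
     if [ -z "$latest" ]; then return 1; fi
     printf '%s\n' "$latest"
 }
 # Usage: cat "$(latest_deploy_log)" /opt/homelab/state/deploy/current_stage
 ```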
--- a/inventory/topology.yaml
+++ b/inventory/topology.yaml
@ -30,6 +30,20 @@ nodes:
  chelsty:
    roles:
      - remote
      - hypervisor
      - homeassistant
      - staging
    connectivity:
      uplink: lte
      intermittent: true
    home_automation:
      offline_operation_required: true
      services:
        - homeassistant
        - zigbee2mqtt
        - mosquitto
      coordinator:
        model: SLZB-06U
        connection: network
        usb: false
--- a/scripts/deploy/deploy.sh
+++ b/scripts/deploy/deploy.sh
@ -0,0 +1,110 @@
 #!/usr/bin/env bash
 # deploy.sh - Staged deployment framework for homelab nodes.
 # Usage: ./deploy.sh {prepare|deploy|verify|diagnose|rollback|resume}  (default: resume)
 set -e
 # --- Configuration ---
 RUNTIME_PATH="/opt/homelab"
 STATE_DIR="${RUNTIME_PATH}/state/deploy"
 LOG_DIR="${RUNTIME_PATH}/logs/deploy"
 REPO_PATH="${HOME}/homelab-codex-ws"
 TIMESTAMP=$(date +%Y%m%d_%H%M%S)
 LOG_FILE="${LOG_DIR}/deploy_${TIMESTAMP}.log"
 # --- Initialization ---
 mkdir -p "$STATE_DIR" "$LOG_DIR"
 # Redirection for logging
 exec > >(tee -a "$LOG_FILE") 2>&1
 log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
 }
 set_state() {
    echo "$1" > "${STATE_DIR}/current_stage"
    log "State set to: $1"
 }
 get_state() {
    if [ -f "${STATE_DIR}/current_stage" ]; then
        cat "${STATE_DIR}/current_stage"
    else
        echo "none"
    fi
 }
 # --- Stages ---
 stage_prepare() {
    log "Stage: PREPARE"
    set_state "prepare"
    # Skeleton: Pull latest changes, check dependencies, validate inventory
    log "Checking repository at $REPO_PATH..."
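    # Use git -C so a missing repository aborts under `set -e`; a failed `cd` on the left of `&&` would not.
    git -C "$REPO_PATH" pull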
    log "Preparation complete."
 }
 stage_deploy() {
    log "Stage: DEPLOY"
    set_state "deploy"
    # Skeleton: Iterate through services and run docker compose
    log "Deploying services defined for $(hostname)..."
    # Implementation detail: loop through services/ and run compose
    log "Deployment complete."
 }
 stage_verify() {
    log "Stage: VERIFY"
    set_state "verify"
    # Skeleton: Check container status, healthchecks, connectivity
    log "Verifying service health..."
    docker ps
    log "Verification complete."
 }
 stage_diagnose() {
    log "Stage: DIAGNOSE"
    # Skeleton: Check logs, resource usage, networking
    log "Running diagnostics..."
    docker stats --no-stream
    log "Diagnostics complete."
 }
 stage_rollback() {
    log "Stage: ROLLBACK"
    # Skeleton: Revert to previous git commit or previous state
    log "Rolling back changes..."
    log "Rollback complete."
 }
 stage_resume() {
    log "Stage: RESUME"
    CURRENT=$(get_state)
    log "Resuming from state: $CURRENT"
    case "$CURRENT" in
        "prepare") stage_deploy ;;
        "deploy") stage_verify ;;
        "verify") log "Last deployment was verified. Nothing to resume." ;;
        *) log "Unknown state or nothing to resume. Starting from prepare..."; stage_prepare ;;
    esac
 }
 # --- Main ---
 COMMAND=${1:-resume}
 log "--- Homelab Deployment Started (Command: $COMMAND) ---"
 case "$COMMAND" in
    prepare)  stage_prepare ;;
    deploy)   stage_deploy ;;
    verify)   stage_verify ;;
    diagnose) stage_diagnose ;;
    rollback) stage_rollback ;;
    resume)   stage_resume ;;
    *)        echo "Usage: $0 {prepare|deploy|verify|diagnose|rollback|resume}"; exit 1 ;;
 esac
 log "--- Homelab Deployment Finished ---"
--- a/services/forgejo/README.md
+++ b/services/forgejo/README.md
@ -0,0 +1,9 @@
 # Forgejo
 Forgejo is a lightweight, self-hosted software forge that is easy to install and low-maintenance.
 ## Usage
 Deployed on the `saturn` node as the git source of truth.
 Web UI is available on port 3000.
 SSH for git is available on port 222.
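 Example clone URL for the remapped SSH port (a sketch: `git` is the SSH user of the stock Forgejo image, while the `saturn` host name resolving on your network and the `<owner>/<repo>` path are assumptions; `forgejo_clone_url` is a hypothetical helper):
 ```bash
 # Build an SSH clone URL; host and port default to the values described above.
 forgejo_clone_url() {
     local owner="$1" repo="$2" host="${3:-saturn}" port="${4:-222}"
     printf 'ssh://git@%s:%s/%s/%s.git\n' "$host" "$port" "$owner" "$repo"
 }
 # Usage: git clone "$(forgejo_clone_url <owner> <repo>)"
 ```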
--- a/services/forgejo/docker-compose.yml
+++ b/services/forgejo/docker-compose.yml
@ -0,0 +1,15 @@
 services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:latest
    container_name: forgejo
    restart: unless-stopped
    environment:
      - USER_UID=1000
      - USER_GID=1000
    volumes:
      - /opt/homelab/data/forgejo/data:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - '3000:3000'
      - '222:22'
--- a/services/forgejo/env.example
+++ b/services/forgejo/env.example
@ -0,0 +1,3 @@
 USER_UID=1000
 USER_GID=1000
 # FORGEJO__database__DB_TYPE=sqlite3
--- a/services/forgejo/healthcheck.sh
+++ b/services/forgejo/healthcheck.sh
@ -0,0 +1,17 @@
 #!/bin/bash
 # Healthcheck for Forgejo
 # Check if the container is running
 if ! docker ps --filter "name=forgejo" --filter "status=running" | grep -q "forgejo"; then
    echo "[FAIL] Forgejo container is not running"
    exit 1
 fi
 # Check API health endpoint
 if ! curl -sf http://localhost:3000/api/healthz > /dev/null; then
    echo "[FAIL] Forgejo API is not responding"
    exit 1
 fi
 echo "[OK] Forgejo is healthy"
 exit 0
--- a/services/forgejo/service.yaml
+++ b/services/forgejo/service.yaml
@ -0,0 +1,28 @@
 service:
  name: forgejo
  owner_node: saturn
  exposure: private
  dependencies: []
  ports:
    - container: 3000
      host: 3000
      protocol: tcp
    - container: 22
      host: 222
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:3000/api/healthz
    interval: 1m
    timeout: 10s
    retries: 5
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/forgejo/data
  runtime:
    directories:
      - /opt/homelab/data/forgejo/data
    env_vars:
      - USER_UID
      - USER_GID
--- a/services/mosquitto/README.md
+++ b/services/mosquitto/README.md
@ -0,0 +1,9 @@
 # Mosquitto MQTT Broker
 Eclipse Mosquitto is an open source (EPL/EDL licensed) message broker that implements the MQTT protocol versions 5.0, 3.1.1 and 3.1.
 ## Usage
 Deployed on the `piha` node.
 Port 1883 for standard MQTT.
 Port 9001 for WebSockets.
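 Example reachability check for both listeners (a sketch that reuses the `/dev/tcp` probe from `healthcheck.sh` and adds a timeout; `port_open` is a hypothetical helper, and the `piha` host name resolving from where you run it is an assumption):
 ```bash
 # Succeed if a TCP connection to host:port opens within 3 seconds.
 port_open() {
     local host="$1" port="$2"
     # Probe in a child bash so `timeout` can bound a hanging connect.
     timeout 3 bash -c 'echo > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
 }
 # Usage: port_open piha 1883 && port_open piha 9001 && echo "[OK] both listeners reachable"
 ```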
--- a/services/mosquitto/docker-compose.yml
+++ b/services/mosquitto/docker-compose.yml
@ -0,0 +1,12 @@
 services:
  mosquitto:
    image: eclipse-mosquitto:latest
    container_name: mosquitto
    restart: unless-stopped
    ports:
      - '1883:1883'
      - '9001:9001'
    volumes:
      - /opt/homelab/data/mosquitto/config:/mosquitto/config
      - /opt/homelab/data/mosquitto/data:/mosquitto/data
      - /opt/homelab/data/mosquitto/log:/mosquitto/log
--- a/services/mosquitto/env.example
+++ b/services/mosquitto/env.example
@ -0,0 +1,2 @@
 # No specific environment variables required by default.
 # Mosquitto is mainly configured via /opt/homelab/data/mosquitto/config/mosquitto.conf
--- a/services/mosquitto/healthcheck.sh
+++ b/services/mosquitto/healthcheck.sh
@ -0,0 +1,17 @@
 #!/bin/bash
 # Healthcheck for Mosquitto
 # Check if the container is running
 if ! docker ps --filter "name=mosquitto" --filter "status=running" | grep -q "mosquitto"; then
    echo "[FAIL] Mosquitto container is not running"
    exit 1
 fi
 # Basic port check for 1883
 if ! (echo > /dev/tcp/localhost/1883) >/dev/null 2>&1; then
    echo "[FAIL] Mosquitto port 1883 is not reachable"
    exit 1
 fi
 echo "[OK] Mosquitto is healthy"
 exit 0
--- a/services/mosquitto/service.yaml
+++ b/services/mosquitto/service.yaml
@ -0,0 +1,29 @@
 service:
  name: mosquitto
  owner_node: piha
  exposure: private
  dependencies: []
  ports:
    - container: 1883
      host: 1883
      protocol: tcp
    - container: 9001
      host: 9001
      protocol: tcp
  healthcheck:
    type: container
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
  runtime:
    directories:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
    env_vars: []
--- a/services/npm/README.md
+++ b/services/npm/README.md
@ -0,0 +1,13 @@
 # Nginx Proxy Manager (NPM)
 Expose your services easily and securely with Nginx Proxy Manager.
 ## Features
 - Secure HTTPS via Let's Encrypt
 - Easy-to-use Web UI
 - Advanced configuration for power users
 ## Usage
 Deployed on the `vps` node for public ingress.
 Web UI is available on port 81.
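 Example check with retries (a sketch: `service.yaml` declares 3 retries at a 30s interval with a 10s timeout, while `healthcheck.sh` probes once; the hypothetical `retry` helper below is one way to apply those values by hand):
 ```bash
 # Run a command up to <attempts> times, sleeping <delay> seconds between tries.
 retry() {
     local attempts="$1" delay="$2" i
     shift 2
     for ((i = 1; i <= attempts; i++)); do
         "$@" && return 0
         if ((i < attempts)); then sleep "$delay"; fi
     done
     return 1
 }
 # Usage: retry 3 30 curl -sf --max-time 10 -o /dev/null http://localhost:81
 ```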
--- a/services/npm/env.example
+++ b/services/npm/env.example
@ -0,0 +1,2 @@
 # No environment variables required for standard NPM deployment.
 # Local overrides can be placed in /opt/homelab/config/npm/.env
--- a/services/npm/healthcheck.sh
+++ b/services/npm/healthcheck.sh
@ -0,0 +1,17 @@
 #!/bin/bash
 # Healthcheck for Nginx Proxy Manager
 # Check if the container is running
 if ! docker ps --filter "name=npm" --filter "status=running" | grep -q "npm"; then
    echo "[FAIL] NPM container is not running"
    exit 1
 fi
 # Check Web UI responsiveness (port 81)
 if ! curl -sf http://localhost:81 > /dev/null; then
    echo "[FAIL] NPM Web UI is not responding"
    exit 1
 fi
 echo "[OK] NPM is healthy"
 exit 0
--- a/services/npm/service.yaml
+++ b/services/npm/service.yaml
@ -0,0 +1,31 @@
 service:
  name: npm
  owner_node: vps
  exposure: public
  dependencies: []
  ports:
    - container: 80
      host: 80
      protocol: tcp
    - container: 81
      host: 81
      protocol: tcp
    - container: 443
      host: 443
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:81
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
  runtime:
    directories:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
    env_vars: []
--- a/services/ollama/README.md
+++ b/services/ollama/README.md
@ -0,0 +1,13 @@
 # Ollama
 Get up and running with large language models locally.
 ## Usage
 Deployed on the `solaria` node for GPU acceleration.
 API is available on port 11434.
 Example check:
 ```bash
 curl http://localhost:11434/api/tags
 ```
--- a/services/ollama/docker-compose.yml
+++ b/services/ollama/docker-compose.yml
@ -0,0 +1,16 @@
 services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - '11434:11434'
    volumes:
      - /opt/homelab/data/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
--- a/services/ollama/env.example
+++ b/services/ollama/env.example
@ -0,0 +1,2 @@
 # No specific environment variables required by default.
 # CUDA_VISIBLE_DEVICES=0
--- a/services/ollama/healthcheck.sh
+++ b/services/ollama/healthcheck.sh
@ -0,0 +1,17 @@
 #!/bin/bash
 # Healthcheck for Ollama
 # Check if the container is running
 if ! docker ps --filter "name=ollama" --filter "status=running" | grep -q "ollama"; then
    echo "[FAIL] Ollama container is not running"
    exit 1
 fi
 # Check API responsiveness
 if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
    echo "[FAIL] Ollama API is not responding"
    exit 1
 fi
 echo "[OK] Ollama is healthy"
 exit 0
--- a/services/ollama/service.yaml
+++ b/services/ollama/service.yaml
@ -0,0 +1,23 @@
 service:
  name: ollama
  owner_node: solaria
  exposure: private
  dependencies: []
  ports:
    - container: 11434
      host: 11434
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:11434/api/tags
    interval: 1m
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/ollama
  runtime:
    directories:
      - /opt/homelab/data/ollama
    env_vars: []
--- a/services/zigbee2mqtt/README.md
+++ b/services/zigbee2mqtt/README.md
@ -0,0 +1,10 @@
 # Zigbee2MQTT
 A Zigbee-to-MQTT bridge that removes the need for proprietary Zigbee bridges.
 ## Usage
 Deployed on the `piha` node.
 Requires a Zigbee adapter (e.g., Sonoff ZBDongle-E) mapped to `/dev/ttyACM0`.
 Frontend is available on port 8080.
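 Example pre-start check for the adapter (a sketch: `adapter_present` is a hypothetical helper; `ttyACM` numbering can change when several serial devices are attached, so a stable `/dev/serial/by-id/...` path may be worth checking instead):
 ```bash
 # Succeed when the given path exists and is a character device.
 adapter_present() {
     [ -c "$1" ]
 }
 # Usage: adapter_present /dev/ttyACM0 || echo "[FAIL] Zigbee adapter not found"
 ```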
--- a/services/zigbee2mqtt/docker-compose.yml
+++ b/services/zigbee2mqtt/docker-compose.yml
@ -0,0 +1,14 @@
 services:
  zigbee2mqtt:
    container_name: zigbee2mqtt
    image: koenkk/zigbee2mqtt:latest
    restart: unless-stopped
    volumes:
      - /opt/homelab/data/zigbee2mqtt/data:/app/data
      - /run/udev:/run/udev:ro
    ports:
      - '8080:8080'
    devices:
      - /dev/ttyACM0:/dev/ttyACM0
    environment:
      - TZ=Europe/Stockholm
--- a/services/zigbee2mqtt/env.example
+++ b/services/zigbee2mqtt/env.example
@ -0,0 +1,3 @@
 TZ=Europe/Stockholm
 # MQTT broker override, if applicable
 # ZIGBEE2MQTT_CONFIG_MQTT_SERVER=mqtt://mosquitto:1883
--- a/services/zigbee2mqtt/healthcheck.sh
+++ b/services/zigbee2mqtt/healthcheck.sh
@ -0,0 +1,17 @@
 #!/bin/bash
 # Healthcheck for Zigbee2MQTT
 # Check if the container is running
 if ! docker ps --filter "name=zigbee2mqtt" --filter "status=running" | grep -q "zigbee2mqtt"; then
    echo "[FAIL] Zigbee2MQTT container is not running"
    exit 1
 fi
 # Check frontend responsiveness
 if ! curl -sf http://localhost:8080 > /dev/null; then
    echo "[FAIL] Zigbee2MQTT frontend is not responding"
    exit 1
 fi
 echo "[OK] Zigbee2MQTT is healthy"
 exit 0
--- a/services/zigbee2mqtt/service.yaml
+++ b/services/zigbee2mqtt/service.yaml
@ -0,0 +1,25 @@
 service:
  name: zigbee2mqtt
  owner_node: piha
  exposure: private
  dependencies:
    - mosquitto
  ports:
    - container: 8080
      host: 8080
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:8080
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/zigbee2mqtt/data
  runtime:
    directories:
      - /opt/homelab/data/zigbee2mqtt/data
    env_vars:
      - TZ