homelab-codex-ws/docs/chelsty-runtime.md
Oskar Kapala 603e10a364 docs: session summary 2026-05-27 + update observer/control-plane/chelsty docs
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-27 16:18:31 +02:00

CHELSTY Runtime

This document describes the runtime environment and deployment flow for CHELSTY, an offline-capable home automation edge node split across two VMs.

| Node | Role | Services |
|---|---|---|
| chelsty-infra | LTE edge hypervisor | Mosquitto, Zigbee2MQTT, stability-agent, node-agent |
| chelsty-ha | Home Assistant VM | homeassistant (no node-agent; see below) |

Both nodes share an LTE uplink and must function fully offline (Zigbee, MQTT, HA automations) without any connectivity to SATURN, VPS, or Forgejo.

Runtime Layout

/opt/homelab/
├── config/          # Service-specific configs and secrets (not in Git)
│   ├── mosquitto/
│   └── zigbee2mqtt/
├── data/            # Persistent service data
│   ├── mosquitto/   # Persistence DB, password file
│   └── zigbee2mqtt/
│       └── data/    # z2m config, coordinator backup, network key
└── logs/

SLZB-06U Integration

CHELSTY uses a SMLIGHT SLZB-06U Zigbee coordinator connected over Ethernet/TCP.

  • Coordinator IP: 192.168.1.105
  • Port: 6638
  • Adapter: ezsp (deprecated; migrating to ember is recommended and requires only setting serial.adapter: ember in configuration.yaml)
  • Zigbee2MQTT config key: serial.port: tcp://192.168.1.105:6638

⚠️ Never use /dev/ttyUSB0 — the coordinator is always TCP-only on this site.
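
The TCP-only rule can be checked mechanically before a config is deployed. The following is a minimal sketch, not part of the repository: parse_serial_port is a hypothetical helper that takes a serial.port value, rejects anything that is not a tcp://host:port URL, and prints the host and port separated by a space.

```shell
# Hypothetical helper (not in the repo): validate a Zigbee2MQTT serial.port
# value and print "<host> <port>". Returns non-zero for device paths such as
# /dev/ttyUSB0 and for values without a numeric port.
parse_serial_port() {
  local value rest host port
  value=$1
  case "$value" in
    tcp://*:*) ;;
    *) echo "serial.port must be tcp://host:port, got: $value" >&2; return 1 ;;
  esac
  rest=${value#tcp://}   # strip the scheme
  host=${rest%:*}        # everything before the last colon
  port=${rest##*:}       # everything after the last colon
  case "$port" in
    ''|*[!0-9]*) echo "invalid port in: $value" >&2; return 1 ;;
  esac
  [ -n "$host" ] || { echo "missing host in: $value" >&2; return 1; }
  echo "$host $port"
}
```

The printed host and port can then be passed to a reachability probe such as nc -zv.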

Networking Constraints

Mosquitto — network_mode: host

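Mosquitto runs with network_mode: host so that it listens directly on the host's interfaces: host processes and host-networked containers reach it at localhost:1883, while bridge-networked containers such as Zigbee2MQTT need a hostname mapped to the host gateway. Do not change this.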

Zigbee2MQTT — bridge network + extra_hosts

Zigbee2MQTT runs in a bridge-networked container (needed for port mapping compatibility with docker-compose v1). To reach the host-networked Mosquitto:

# hosts/chelsty-infra/runtime/zigbee2mqtt/docker-compose.override.yml
services:
  zigbee2mqtt:
    extra_hosts:
      - "mosquitto:host-gateway"

This maps the mosquitto hostname inside the z2m container to the Docker host gateway IP, so mqtt://mosquitto:1883 reaches the host-networked Mosquitto process.

Why not network_mode: host for z2m?
chelsty-infra runs docker-compose v1 (1.29.2). In v1, network_mode: host cannot coexist with the ports: entries declared in the base docker-compose.yml; the combination fails with an InvalidArgument error. The extra_hosts approach avoids this.

Zigbee2MQTT Config Location

The configuration.yaml must be writable — z2m migrates and rewrites it on startup. It lives in the data directory:

/opt/homelab/data/zigbee2mqtt/data/configuration.yaml

This path is mounted read-write by the base docker-compose.yml:

volumes:
  - /opt/homelab/data/zigbee2mqtt/data:/app/data

Do not mount configuration.yaml as a separate :ro volume — z2m will fail with EROFS.
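
The writable requirement above can be checked on the host before starting the container. This is a minimal sketch, not part of the repository: check_z2m_config is a hypothetical function name, and the check runs as the current host user, so it does not prove what the container's own user can write.

```shell
# Hypothetical preflight check (not in the repo): confirm the z2m config
# exists, is non-empty, and is writable by the current user.
check_z2m_config() {
  local cfg
  cfg=${1:-/opt/homelab/data/zigbee2mqtt/data/configuration.yaml}
  [ -f "$cfg" ] || { echo "missing: $cfg" >&2; return 1; }
  [ -s "$cfg" ] || { echo "empty: $cfg" >&2; return 1; }
  [ -w "$cfg" ] || { echo "not writable: $cfg" >&2; return 1; }
  echo "ok: $cfg"
}
```

Called without arguments, it checks the default path used on chelsty-infra; a different path can be passed as the first argument.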

Minimal configuration.yaml

homeassistant: true
permit_join: false
mqtt:
  base_topic: zigbee2mqtt
  server: mqtt://mosquitto:1883
serial:
  port: tcp://192.168.1.105:6638
  adapter: ezsp
frontend:
  port: 8080
advanced:
  log_level: info

chelsty-ha — No node-agent

chelsty-ha does not have a node-agent deployed. Home Assistant is monitored indirectly: if MQTT goes silent on chelsty-infra, HA is likely down.

In hosts/chelsty-ha/services.yaml:

services:
  homeassistant:
    monitor: false   # No node-agent; suppresses supervisor action generation

Remove monitor: false once node-agent is bootstrapped on this VM.

Deployment Flow

Initial Bootstrap

./scripts/bootstrap/chelsty-runtime.sh

Deploy services

./scripts/deploy/deploy-node.sh chelsty-infra
./scripts/deploy/deploy-node.sh chelsty-ha

Manual (SSH) — chelsty-infra uses docker-compose v1

ssh oskar@100.122.201.22
cd ~/homelab-codex-ws/services/<service>
docker-compose -f docker-compose.yml \
  -f ../../hosts/chelsty-infra/runtime/<service>/docker-compose.override.yml \
  up -d --build --force-recreate

Note: docker compose (v2) is not available on chelsty-infra — always use docker-compose (hyphenated, v1 1.29.2).
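
To avoid typing the wrong binary or forgetting the override file, the manual steps can be wrapped in a small dry-run helper. This is a sketch under assumptions: compose_cmd is a hypothetical name, it is meant to be run from the repository root, and it only prints the command so it can be reviewed before being executed.

```shell
# Hypothetical wrapper (not in the repo): print the docker-compose v1 command
# for a service, adding the chelsty-infra override file only when it exists.
# Paths are relative to the repository root and mirror the manual steps.
compose_cmd() {
  local service base override
  service=$1
  base="services/$service/docker-compose.yml"
  override="hosts/chelsty-infra/runtime/$service/docker-compose.override.yml"
  if [ -f "$override" ]; then
    echo "docker-compose -f $base -f $override up -d --build --force-recreate"
  else
    echo "docker-compose -f $base up -d --build --force-recreate"
  fi
}
```

For example, compose_cmd zigbee2mqtt prints the two-file invocation when the override exists. The helper always emits the hyphenated docker-compose binary.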

Recovery Procedures

Mosquitto stopped

ssh oskar@100.122.201.22 "docker start mosquitto"
# Ensure the restart policy is correct (this must also run on chelsty-infra):
ssh oskar@100.122.201.22 "docker update --restart unless-stopped mosquitto"

Zigbee2MQTT won't start

  1. Check logs: docker logs zigbee2mqtt --tail 50
  2. Verify SLZB-06U reachable from host: nc -zv 192.168.1.105 6638
  3. Verify config is not empty: cat /opt/homelab/data/zigbee2mqtt/data/configuration.yaml
  4. If config missing, recreate from the minimal template above

SLZB-06U unreachable

An EHOSTUNREACH error for 192.168.1.105:6638 means the coordinator is offline or the LAN is down. Zigbee2MQTT will keep retrying, so no restart is needed once the coordinator returns.
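
While waiting for the coordinator to come back, an operator can poll it from the host instead of re-running nc by hand. The following is a minimal sketch, not part of the repository: wait_until is a hypothetical generic retry helper, and the attempt count and delay in the example are arbitrary choices.

```shell
# Hypothetical helper (not in the repo): run a probe command up to
# <attempts> times, sleeping <delay> seconds between failed attempts.
# Returns 0 as soon as the probe succeeds, 1 if every attempt fails.
wait_until() {
  local attempts delay i
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example: probe the coordinator every 10 s, up to 30 times.
# wait_until 30 10 nc -z -w 3 192.168.1.105 6638 && echo "coordinator is back"
```

The probe is passed as ordinary arguments, so the same helper works for other checks, such as the Mosquitto port.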

Critical Backup Sets

| Data | Path |
|---|---|
| HA config + DB | /opt/homelab/data/homeassistant/ on chelsty-ha |
| z2m config + coordinator backup + network key | /opt/homelab/data/zigbee2mqtt/data/ |
| Mosquitto persistence + password file | /opt/homelab/data/mosquitto/ |
| SLZB-06U coordinator state | Backup via SLZB-06U web UI at 192.168.1.105 |

⚠️ The Zigbee network key is in configuration.yaml or coordinator_backup.json — losing it requires re-pairing all devices.
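
The file-based backup sets above can be archived with a short script. This is a sketch under assumptions, not part of the repository: backup_dir is a hypothetical helper, and the destination directory in the example is made up for illustration. The resulting archive contains the Zigbee network key, so it should be stored like any other secret.

```shell
# Hypothetical backup sketch (not in the repo): archive one data directory
# into <dest>/<label>-<UTC timestamp>.tar.gz and print the archive path.
# <label> defaults to the directory's base name.
backup_dir() {
  local src dest label archive
  src=$1
  dest=$2
  [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
  label=${3:-$(basename "$src")}
  mkdir -p "$dest" || return 1
  archive="$dest/$label-$(date -u +%Y%m%dT%H%M%SZ).tar.gz"
  # -C keeps archive paths relative to the parent of <src>.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  echo "$archive"
}

# Example on chelsty-infra (the destination path is an assumption):
# backup_dir /opt/homelab/data/zigbee2mqtt/data /opt/homelab/backups zigbee2mqtt
# backup_dir /opt/homelab/data/mosquitto /opt/homelab/backups
```

The SLZB-06U coordinator state is not covered by this sketch; it is still exported through the device's web UI.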