Add node capability model #3
85
docs/capabilities.md
Normal file
|
|
@ -0,0 +1,85 @@
|
|||
# Node Capability Model
|
||||
|
||||
This document defines the capability model for the homelab infrastructure. The goal is to provide a declarative way to describe what each node can do, its constraints, and its suitability for various workloads.
|
||||
|
||||
## Overview
|
||||
|
||||
Capabilities are defined per host in `hosts/<hostname>/capabilities.yaml`. This metadata allows infrastructure tooling and future AI agents to reason about workload placement, recovery, and compatibility without hardcoding logic into the orchestration system.
|
||||
|
||||
## Schema Definition
|
||||
|
||||
The `capabilities.yaml` file follows this structure:
|
||||
|
||||
```yaml
capabilities:
  hardware:
    cpu:
      arch: <string> # e.g., x86_64, arm64
      cores: <int>
      threads: <int>
    memory:
      total_gb: <int>
    acceleration:
      type: <string> # e.g., none, cuda, tpu, vaapi
      model: <string> # e.g., "NVIDIA RTX 3060", "Coral Edge TPU"

  virtualization:
    supported: <boolean>
    type: <string> # e.g., kvm, docker-only

  storage:
    persistence: <string> # ephemeral, persistent, redundant
    type: <string> # ssd, hdd, nvme, sd-card
    capacity_gb: <int>

  networking:
    reachability: <string> # public, tailscale-only, lan-only
    ingress_suitability: <boolean>
    bandwidth: <string> # e.g., "1Gbps", "100Mbps", "LTE"

  runtime:
    container_engine: <string> # docker, podman, containerd
    os: <string> # debian, ubuntu, alpine, nixos

  operational:
    power_constraint: <string> # low-power, mains, battery-backed
    connectivity: <string> # stable, intermittent
    availability_target: <string> # high, medium, best-effort

  deployment:
    suitability: [<string>] # list of workload types (e.g., ai, database, edge, web)
    restricted: <boolean> # if true, only specific workloads are allowed
```
|
||||
|
||||
## Placement Reasoning Examples
|
||||
|
||||
### AI Workloads
|
||||
A service requiring `cuda` acceleration will be matched against nodes where `capabilities.hardware.acceleration.type == "cuda"`.
|
||||
* **Target:** `solaria`
|
||||
|
||||
### Public Ingress
|
||||
A service requiring public exposure will look for `capabilities.networking.ingress_suitability == true`.
|
||||
* **Target:** `vps`
|
||||
|
||||
### Low-Power Staging
|
||||
Staging workloads that should not consume significant power, or that tolerate intermittent connectivity, are matched on the `capabilities.operational` fields (`power_constraint`, `connectivity`).
|
||||
* **Target:** `chelsty`
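
The placement rules above can be read as equality checks against dotted paths in the capability tree. A minimal sketch in Python, assuming the YAML has already been loaded into nested dictionaries. The `NODES` literal copies a few values from the host files in this PR; `get_path` and `match_nodes` are hypothetical helper names, not part of any existing tooling.

```python
from typing import Any

# Subset of the capability values declared in this PR, written as plain dicts.
# Real tooling would load them from hosts/<hostname>/capabilities.yaml.
NODES: dict[str, dict[str, Any]] = {
    "solaria": {"capabilities": {
        "hardware": {"acceleration": {"type": "cuda"}},
        "networking": {"ingress_suitability": False},
        "operational": {"power_constraint": "mains"},
    }},
    "vps": {"capabilities": {
        "hardware": {"acceleration": {"type": "none"}},
        "networking": {"ingress_suitability": True},
        "operational": {"power_constraint": "mains"},
    }},
    "chelsty": {"capabilities": {
        "hardware": {"acceleration": {"type": "none"}},
        "networking": {"ingress_suitability": False},
        "operational": {"power_constraint": "low-power"},
    }},
}


def get_path(tree: dict[str, Any], dotted: str) -> Any:
    """Walk a dotted path such as 'capabilities.networking.reachability'."""
    node: Any = tree
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None  # a missing key never matches a requirement
        node = node[key]
    return node


def match_nodes(requirements: dict[str, Any]) -> list[str]:
    """Return sorted names of nodes whose capabilities equal every requirement."""
    return sorted(
        name
        for name, caps in NODES.items()
        if all(get_path(caps, path) == want for path, want in requirements.items())
    )
```

With this data, a `cuda` requirement resolves to `solaria`, an ingress requirement to `vps`, and a `low-power` requirement to `chelsty`, which agrees with the targets listed above.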
|
||||
|
||||
## Recovery Reasoning Examples
|
||||
|
||||
### Failover Strategy
|
||||
If `saturn` (the primary orchestrator) fails:
|
||||
1. Identify nodes with `roles: [control]` or `roles: [infra]`.
|
||||
2. Check `capabilities.operational.availability_target == "high"`.
|
||||
3. Propose migration of critical infra services to `piha`.
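
The failover steps can be sketched as a filter followed by a ranking. This is only an illustration under stated assumptions: the capability schema has no `roles` key, so the sketch uses `deployment.suitability` as a stand-in, and it ranks candidates by `availability_target` instead of requiring `high`. The node values are copied from the capabilities files in this PR; `failover_candidates` is a hypothetical name.

```python
# Lower rank means a stronger availability target.
AVAILABILITY_RANK = {"high": 0, "medium": 1, "best-effort": 2}

# Values copied from hosts/*/capabilities.yaml in this PR (flattened for brevity).
NODES = {
    "saturn": {"suitability": ["control", "development", "infra"],
               "availability_target": "high"},
    "piha": {"suitability": ["infra", "monitoring"],
             "availability_target": "medium"},
    "solaria": {"suitability": ["ai", "compute", "database"],
                "availability_target": "medium"},
    "vps": {"suitability": ["edge", "ingress", "web"],
            "availability_target": "high"},
    "chelsty": {"suitability": ["staging", "homeassistant", "edge"],
                "availability_target": "best-effort"},
}


def failover_candidates(failed: str, wanted=("control", "infra")) -> list[str]:
    """Surviving nodes that accept control or infra workloads, best availability first."""
    candidates = [
        name
        for name, node in NODES.items()
        if name != failed and set(wanted) & set(node["suitability"])
    ]
    return sorted(
        candidates,
        key=lambda n: (AVAILABILITY_RANK[NODES[n]["availability_target"]], n),
    )
```

With the values in this PR, `failover_candidates("saturn")` yields only `piha`. Note that `piha` declares `availability_target: medium`, so a strict `== "high"` filter in step 2 would leave no candidate; ranking is one way to reconcile that.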
|
||||
|
||||
### Storage-Bound Services
|
||||
If a node with `persistence: persistent` fails, the agent must first check whether another node offers `persistence: persistent` and a compatible `storage.type` before attempting recovery. If the only available target is an `ephemeral` node, the agent must warn about potential data loss.
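
A minimal sketch of that check, assuming that "compatible" means an identical `storage.type` (the document does not define compatibility, so this is a placeholder rule). The `STORAGE` values are copied from this PR; `assess_target` and its return strings are hypothetical.

```python
# Storage values copied from the capabilities files in this PR.
STORAGE = {
    "chelsty": {"persistence": "persistent", "type": "ssd"},
    "vps": {"persistence": "persistent", "type": "ssd"},
    "piha": {"persistence": "persistent", "type": "sd-card"},
    "solaria": {"persistence": "redundant", "type": "nvme"},
}


def assess_target(failed: dict[str, str], target: dict[str, str]) -> str:
    """Classify a recovery target for a service that lived on persistent storage."""
    if target["persistence"] == "ephemeral":
        return "warn: potential data loss"
    # Assumption: compatibility is modelled as an exact storage type match.
    if target["persistence"] == "persistent" and target["type"] == failed["type"]:
        return "ok"
    return "review: storage differs"
```

For a failed `chelsty`, this classifies `vps` as `ok` and `piha` as needing review because its storage is an SD card.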
|
||||
|
||||
## Future Usage by AI Agents
|
||||
|
||||
Future autonomous agents will use this metadata to:
|
||||
1. **Evaluate Suitability:** Match service requirements (from `service.yaml`) against node capabilities.
|
||||
2. **Generate Plans:** Create step-by-step deployment or migration plans based on hardware compatibility.
|
||||
3. **Validate Topology:** Ensure that a proposed multi-node setup doesn't violate networking or operational constraints (e.g., don't put a DB on an intermittent node).
|
||||
4. **Propose Failover:** Automatically suggest the best alternative node during an outage.
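
As an illustration of the topology validation step, the sketch below encodes two rules: the database example given in the list, and one reading of `restricted` (a restricted node only accepts workload types listed in its `suitability`). That reading is an assumption, and `placement_violations` is a hypothetical name.

```python
# Flattened capability values copied from this PR.
NODES = {
    "solaria": {"suitability": ["ai", "compute", "database"],
                "restricted": False, "connectivity": "stable"},
    "vps": {"suitability": ["edge", "ingress", "web"],
            "restricted": True, "connectivity": "stable"},
    "chelsty": {"suitability": ["staging", "homeassistant", "edge"],
                "restricted": False, "connectivity": "intermittent"},
}


def placement_violations(workload: str, node: dict) -> list[str]:
    """Return human-readable reasons why a workload should not land on a node."""
    problems = []
    # Assumed meaning of `restricted`: only listed workload types are allowed.
    if node["restricted"] and workload not in node["suitability"]:
        problems.append("node is restricted and does not list this workload type")
    # Rule taken from the example above: no databases on intermittent nodes.
    if workload == "database" and node["connectivity"] == "intermittent":
        problems.append("databases should not run on intermittently connected nodes")
    return problems
```

An empty list means no rule objected; it does not prove that the placement is a good one.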
|
||||
|
|
@ -8,23 +8,46 @@ This document describes the GitOps-lite deployment process for the homelab.
|
|||
2. **Unidirectional Flow**: Changes flow from **SATURN** (commit node) to execution nodes.
|
||||
3. **Lightweight**: No complex orchestrators (no Kubernetes). Use `docker compose` and simple shell scripts.
|
||||
4. **Tailscale Mesh**: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
|
||||
5. **Host Autonomy**: Services that must operate during WAN or Git outages keep their runtime dependencies on the execution node or local LAN.
|
||||
|
||||
## Deployment Process
|
||||
## Staged Deployment Framework
|
||||
|
||||
### 1. Preparation (on SATURN)
|
||||
The homelab uses a staged deployment framework located at `scripts/deploy/deploy.sh`. This script is designed to be resumable, stage-aware, and observable.
|
||||
|
||||
- Modify or create service definitions in `services/`.
|
||||
- Assign services to hosts by creating/updating `hosts/<hostname>/services.txt` (or similar mapping).
|
||||
- Commit and push changes to the Forgejo instance.
|
||||
### Deployment Stages
|
||||
|
||||
### 2. Deployment (on Execution Node)
|
||||
1. **prepare**: Pulls the latest changes from Git, validates inventory, and prepares the local environment.
|
||||
2. **deploy**: Executes `docker compose` commands for all assigned services.
|
||||
3. **verify**: Checks the health and connectivity of deployed services.
|
||||
4. **diagnose**: Performs deep checks and resource analysis if something goes wrong.
|
||||
5. **rollback**: Reverts to a previous known-good state.
|
||||
6. **resume**: Automatically continues from the last successful stage.
|
||||
|
||||
Execution nodes run a deployment script (e.g., via cron or manual trigger) that:
|
||||
### State Tracking and Logging
|
||||
|
||||
1. Performs a `git pull` from the source of truth.
|
||||
2. Identifies services assigned to this host.
|
||||
3. Symlinks or copies `services/<service>/docker-compose.yml` to `/opt/homelab/services/`.
|
||||
4. Runs `docker compose up -d --remove-orphans`.
|
||||
- **State**: Local node state is tracked in `/opt/homelab/state/deploy/current_stage`.
|
||||
- **Logs**: Detailed execution logs are stored in `/opt/homelab/logs/deploy/deploy_<timestamp>.log`.
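
How `resume` might use the state file can be sketched as follows. Two assumptions are made here that the document does not confirm: the state file stores the name of the last successful stage, and only `prepare`, `deploy`, and `verify` form the linear pipeline. The real `deploy.sh` is a shell script; Python is used only to illustrate the logic, and `next_stage` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Assumed linear pipeline; diagnose, rollback, and resume are treated as
# operator commands rather than steps in the sequence.
PIPELINE = ["prepare", "deploy", "verify"]


def next_stage(state_file: Path) -> Optional[str]:
    """Return the stage that a resume should run next, or None when done.

    Assumes the state file holds the name of the last successful stage.
    """
    if not state_file.exists():
        return PIPELINE[0]
    last = state_file.read_text().strip()
    if not last:
        return PIPELINE[0]
    if last not in PIPELINE:
        raise ValueError(f"unknown stage in state file: {last!r}")
    index = PIPELINE.index(last) + 1
    return PIPELINE[index] if index < len(PIPELINE) else None
```

On a node, this would be called with `Path("/opt/homelab/state/deploy/current_stage")`.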
|
||||
|
||||
### Operational Semantics
|
||||
|
||||
Deployment is **hybrid**:
|
||||
- **SATURN** acts as the orchestrator and source of truth.
|
||||
- **Nodes** execute the deployment locally using the `deploy.sh` script.
|
||||
- Human-in-the-loop is required for triggering and confirming deployments.
|
||||
|
||||
### Recovery Workflow
|
||||
|
||||
If a deployment fails:
|
||||
1. Run `deploy.sh diagnose` to identify the issue.
|
||||
2. Use the `recover-node` AI prompt to analyze logs and get recommendations.
|
||||
3. Either fix the issue and run `deploy.sh resume`, or use `deploy.sh rollback`.
|
||||
|
||||
## Onboarding New Nodes
|
||||
|
||||
Refer to `inventory/templates/how_to_add_new_node.yaml` for a detailed guide on adding new hardware to the mesh. The general flow is:
|
||||
1. Define node in `hosts/` and `inventory/topology.yaml` on SATURN.
|
||||
2. Bootstrap the node (Docker, Tailscale, Git).
|
||||
3. Run the staged deployment framework starting with `prepare`.
|
||||
|
||||
## Host-Local Overrides
|
||||
|
||||
|
|
@ -33,6 +56,57 @@ If a service requires host-specific configuration (e.g., unique device paths for
|
|||
1. Create a `docker-compose.override.yml` in `/opt/homelab/config/<service>/`.
|
||||
2. The deployment script should include this override if it exists.
|
||||
|
||||
For CHELSTY Home Assistant infrastructure, host-local configuration is the
|
||||
authority for runtime identity, secrets, and local device endpoints:
|
||||
|
||||
- Home Assistant config: `/opt/homelab/config/homeassistant`
|
||||
- Zigbee2MQTT config: `/opt/homelab/config/zigbee2mqtt`
|
||||
- Mosquitto config: `/opt/homelab/config/mosquitto`
|
||||
|
||||
CHELSTY services must not require SATURN, VPS, or Forgejo to be reachable after
|
||||
deployment has completed. Docker Compose definitions can still come from Git,
|
||||
but Home Assistant automation, Zigbee control, and MQTT messaging must continue
|
||||
locally while LTE or Tailscale connectivity is unavailable.
|
||||
|
||||
## Exposure Classes
|
||||
|
||||
Service inventory may declare one of these exposure classes:
|
||||
|
||||
- `local-only`: bind only to host, LAN, or container networks. This is the default for Zigbee2MQTT and Mosquitto.
|
||||
- `tailscale-internal`: reachable over Tailscale only. This is appropriate for Home Assistant remote administration.
|
||||
- `public`: reachable from the public internet through a deliberate ingress path, normally the VPS edge role.
|
||||
|
||||
Public exposure is not implied by a service existing in Git. It must be explicit
|
||||
in host inventory and ingress configuration.
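
The "explicit in both places" rule can be sketched as a small consistency check. The three class names come from the list above; `exposure_problems` and the boolean `ingress_configured` input are hypothetical, and the treatment of an undeclared class is an assumption.

```python
from typing import Optional

EXPOSURE_CLASSES = ("local-only", "tailscale-internal", "public")


def exposure_problems(declared: Optional[str], ingress_configured: bool) -> list[str]:
    """List reasons a service's exposure declaration should be rejected."""
    if declared is None:
        # Assumption: an undeclared service must never have an ingress path.
        return ["ingress is configured for an undeclared service"] if ingress_configured else []
    if declared not in EXPOSURE_CLASSES:
        return [f"unknown exposure class: {declared}"]
    if declared == "public" and not ingress_configured:
        return ["public exposure declared but no ingress path is configured"]
    if declared != "public" and ingress_configured:
        return [f"ingress is configured for a {declared} service"]
    return []
```

A service is treated as public only when the inventory says `public` and an ingress path exists; either one alone is reported as a problem.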
|
||||
|
||||
## CHELSTY Home Automation Deployment Notes
|
||||
|
||||
CHELSTY remains a Docker Compose execution node. No Kubernetes, Helm, Ansible,
|
||||
or additional orchestration layer is required for Home Assistant infrastructure.
|
||||
|
||||
The SLZB-06U coordinator is network-connected over Ethernet or WiFi. Compose
|
||||
files and host overrides should configure Zigbee2MQTT for a TCP/network
|
||||
coordinator endpoint, not a USB serial device. Avoid `/dev/ttyUSB0` mappings.
|
||||
|
||||
Runtime paths follow the standard layout:
|
||||
|
||||
- `/opt/homelab/data/homeassistant`
|
||||
- `/opt/homelab/config/homeassistant`
|
||||
- `/opt/homelab/logs/homeassistant`
|
||||
- `/opt/homelab/data/zigbee2mqtt`
|
||||
- `/opt/homelab/config/zigbee2mqtt`
|
||||
- `/opt/homelab/logs/zigbee2mqtt`
|
||||
- `/opt/homelab/data/mosquitto`
|
||||
- `/opt/homelab/config/mosquitto`
|
||||
- `/opt/homelab/logs/mosquitto`
|
||||
|
||||
Recommended backup coverage:
|
||||
|
||||
- Home Assistant config and persistent data before upgrades or major integration changes.
|
||||
- Zigbee2MQTT config, database, coordinator backup files, and Zigbee network key material.
|
||||
- SLZB-06U firmware version, exported configuration, network address reservation, and coordinator state.
|
||||
- Mosquitto config, ACL/password files, persistence data, and bridge configuration if enabled.
|
||||
|
||||
## Secrets Management
|
||||
|
||||
- **Do NOT commit secrets to Git.**
|
||||
|
|
|
|||
51
docs/lifecycle.md
Normal file
|
|
@ -0,0 +1,51 @@
|
|||
# Service Lifecycle and Recovery
|
||||
|
||||
This document defines the lifecycle of a service in the homelab and the procedures for operational recovery.
|
||||
|
||||
## Service Lifecycle
|
||||
|
||||
1. **Onboarding**:
|
||||
- Create `services/<service>/` directory.
|
||||
- Define `docker-compose.yml`, `service.yaml`, `README.md`, `env.example`, and `healthcheck.sh`.
|
||||
- Register service in `inventory/topology.yaml` or relevant host configs.
|
||||
2. **Provisioning**:
|
||||
- Ensure `/opt/homelab/data/<service>` exists.
|
||||
- Ensure `/opt/homelab/config/<service>` exists and contains required secrets/configs.
|
||||
- Copy the variables from `env.example` into `/opt/homelab/config/<service>/.env` and fill in their values.
|
||||
3. **Deployment**:
|
||||
- `docker compose pull`
|
||||
- `docker compose up -d`
|
||||
4. **Verification**:
|
||||
- Run `healthcheck.sh`.
|
||||
- Verify ports are reachable according to `service.yaml`.
|
||||
5. **Maintenance**:
|
||||
- Periodic updates via `docker compose pull`.
|
||||
- Log monitoring via `docker compose logs -f`.
|
||||
6. **Decommissioning**:
|
||||
- `docker compose down`.
|
||||
- Archive `/opt/homelab/data/<service>` if necessary.
|
||||
|
||||
## Operational Recovery
|
||||
|
||||
### 1. Container Failure
|
||||
If a service is unhealthy:
|
||||
- Check `docker compose logs`.
|
||||
- Restart: `docker compose restart`.
|
||||
- Recreate: `docker compose up -d --force-recreate`.
|
||||
|
||||
### 2. Node Failure
|
||||
If a host node fails:
|
||||
- Services with `owner_node` matching the failed node must be recovered on a backup node or the node must be restored.
|
||||
- Persistent data must be restored from backups to `/opt/homelab/data/<service>`.
|
||||
|
||||
### 3. Dependency Recovery
|
||||
If a dependency fails:
|
||||
- Services depending on it might report unhealthy status.
|
||||
- Recover the dependency first.
|
||||
- Re-verify dependent services.
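
"Recover the dependency first" amounts to a topological ordering of the dependency graph. A minimal sketch using the standard library's `graphlib` (Python 3.9+), with the local dependencies declared for CHELSTY in `hosts/chelsty/services.yaml` in this PR; `recovery_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# service -> services it depends on (local dependencies from this PR).
DEPENDS_ON = {
    "homeassistant": ["mosquitto", "zigbee2mqtt"],
    "zigbee2mqtt": ["mosquitto"],
    "mosquitto": [],
}


def recovery_order(depends_on: dict[str, list[str]]) -> list[str]:
    """Order services so that every dependency is recovered before its dependents."""
    # TopologicalSorter expects node -> predecessors, which matches depends_on.
    return list(TopologicalSorter(depends_on).static_order())
```

For the CHELSTY services this gives `mosquitto`, then `zigbee2mqtt`, then `homeassistant`. A dependency cycle raises `graphlib.CycleError`, which is a useful signal that the inventory itself needs fixing.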
|
||||
|
||||
## Persistent Data Conventions
|
||||
|
||||
- **Data**: `/opt/homelab/data/<service>` - Primary persistent state.
|
||||
- **Config**: `/opt/homelab/config/<service>` - Local overrides and secrets.
|
||||
- **Backups**: Standard backup routines should target `/opt/homelab/data/`.
|
||||
75
docs/service-model.md
Normal file
|
|
@ -0,0 +1,75 @@
|
|||
# Service Model and Healthchecks
|
||||
|
||||
This document defines the normalized service model for the homelab.
|
||||
|
||||
## Service Layout
|
||||
|
||||
Each service must reside in its own directory under `services/`:
|
||||
|
||||
```text
services/<service>/
├── docker-compose.yml   # Docker Compose definition
├── service.yaml         # Service metadata and orchestration contract
├── README.md            # Service documentation
├── env.example          # Template for required environment variables
└── healthcheck.sh       # Standardized healthcheck script
```
|
||||
|
||||
## Service Metadata (`service.yaml`)
|
||||
|
||||
The `service.yaml` file provides a machine-readable contract for deployment and orchestration.
|
||||
|
||||
### Schema
|
||||
|
||||
```yaml
service:
  name: <string> # Canonical service name (kebab-case)
  owner_node: <string> # Preferred host node
  exposure: <class> # local-only, tailscale-internal, or public
  dependencies: [<service>] # List of required services
  ports:
    - container: <int>
      host: <int>
      protocol: <tcp|udp>
  healthcheck:
    type: <string> # local-only, container, http, mqtt
    endpoint: <string> # URL or topic if applicable
    interval: <duration>
    timeout: <duration>
    retries: <int>
  restart_policy: <string> # unless-stopped, always, etc.
  persistence:
    paths:
      - /opt/homelab/data/<service>/...
  runtime:
    directories: [<string>] # Required host directories to be created
    env_vars: [<string>] # List of required environment variables (keys only)
```
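
A contract like this is easiest to rely on when it is checked mechanically. The sketch below validates a few fields of an already-parsed `service.yaml`; it assumes that all keys are nested under `service:` and covers only the name, ports, and healthcheck type. `validate_service` is a hypothetical helper, not an existing script in the repository.

```python
import re

KEBAB_CASE = re.compile(r"^[a-z0-9]+(-[a-z0-9]+)*$")
HEALTHCHECK_TYPES = ("local-only", "container", "http", "mqtt")


def validate_service(doc: dict) -> list[str]:
    """Return a list of schema problems found in a parsed service.yaml."""
    svc = doc.get("service", {})
    errors = []
    if not KEBAB_CASE.match(str(svc.get("name", ""))):
        errors.append("name must be kebab-case")
    for i, port in enumerate(svc.get("ports", [])):
        if port.get("protocol") not in ("tcp", "udp"):
            errors.append(f"ports[{i}].protocol must be tcp or udp")
        for side in ("container", "host"):
            if not isinstance(port.get(side), int):
                errors.append(f"ports[{i}].{side} must be an integer")
    hc_type = svc.get("healthcheck", {}).get("type")
    if hc_type not in HEALTHCHECK_TYPES:
        errors.append(f"healthcheck.type must be one of {', '.join(HEALTHCHECK_TYPES)}")
    return errors
```

Returning every problem at once, instead of failing on the first, keeps the feedback loop short when a new service is onboarded.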
|
||||
|
||||
## Healthcheck Semantics
|
||||
|
||||
The `healthcheck.sh` script should exit with status `0` when the service is healthy and `1` when it is unhealthy. It should support different modes based on the `healthcheck.type` declared in `service.yaml`.
|
||||
|
||||
### 1. Local-only
|
||||
Checks if the container is running and the process is alive within the host.
|
||||
|
||||
### 2. Container-level
|
||||
Uses `docker inspect` or `docker exec` to check internal container health.
|
||||
|
||||
### 3. HTTP
|
||||
Performs a `curl` against a specific endpoint (e.g., `/health` or `/`).
|
||||
|
||||
### 4. MQTT
|
||||
Verifies that a specific topic is being updated or responds to a ping.
|
||||
|
||||
### 5. Dependency-aware
|
||||
The healthcheck script may optionally check if its dependencies are healthy before reporting its own status.
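
The exit-code contract and the `retries` field can be illustrated with a small wrapper. The real check is a shell script; this Python sketch only shows the control flow. It assumes that `retries` counts total attempts and omits the `interval` sleep so that it runs instantly; `run_healthcheck` and the injected `probe` callable are hypothetical.

```python
from typing import Callable


def run_healthcheck(probe: Callable[[], bool], retries: int) -> int:
    """Return 0 if any attempt succeeds, else 1, mirroring healthcheck.sh exit codes."""
    for _ in range(max(1, retries)):
        try:
            if probe():
                return 0
        except Exception:
            # A probe that raises (e.g. connection refused) counts as a failed attempt.
            pass
    return 1
```

Any of the modes above (container, HTTP, MQTT) can be supplied as the `probe`; the wrapper only decides when to give up.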
|
||||
|
||||
## Runtime Authority
|
||||
|
||||
`/opt/homelab/config/<service>` is the source of truth for:
|
||||
- Secrets (not in Git)
|
||||
- Host-local overrides
|
||||
- Mutable configuration
|
||||
|
||||
Services should mount files from this directory as needed.
|
||||
|
|
@ -19,11 +19,14 @@ This document defines the standards and conventions for the homelab GitOps-lite
|
|||
/
|
||||
├── docs/ # Infrastructure documentation
|
||||
├── hosts/ # Host-specific configurations
|
||||
│ ├── saturn/
|
||||
│ ├── solaria/
|
||||
│ ├── piha/
|
||||
│ └── vps/
|
||||
├── services/ # Reusable service definitions (Docker Compose)
|
||||
├── inventory/ # Topology and templates
|
||||
├── services/ # Normalized service definitions
|
||||
│ └── <service>/
|
||||
│ ├── docker-compose.yml
|
||||
│ ├── service.yaml
|
||||
│ ├── README.md
|
||||
│ ├── env.example
|
||||
│ └── healthcheck.sh
|
||||
├── scripts/ # Management and deployment scripts
|
||||
└── README.md
|
||||
```
|
||||
|
|
@ -37,18 +40,28 @@ Runtime state must live outside the repository to keep it immutable and clean.
|
|||
├── services/ # Active docker-compose files (deployed from git)
|
||||
├── data/ # Persistent volume data (backed up)
|
||||
├── config/ # Host-local overrides and secrets (not in git)
|
||||
│ └── <service>/
|
||||
│ ├── .env # Merged environment variables
|
||||
│ └── overrides/ # Local configuration overrides
|
||||
└── logs/ # Service logs
|
||||
```
|
||||
|
||||
## Service Standards
|
||||
|
||||
1. **Normalization**: Every service MUST follow the `services/<service>/` layout.
|
||||
2. **Metadata**: Every service MUST have a `service.yaml` defining its operational contract.
|
||||
3. **Healthchecks**: Every service MUST have a `healthcheck.sh` for verification.
|
||||
4. **Secrets**: NEVER commit secrets to Git. Use `env.example` as a template and populate `/opt/homelab/config/<service>/.env` on the host.
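
One way to keep `env.example` and the host-local `.env` in step is to compare their keys before deploying. A minimal sketch, assuming simple `KEY=value` lines with `#` comments; quoting and `export` prefixes are not handled, and `env_keys` and `missing_keys` are hypothetical helper names.

```python
def env_keys(text: str) -> set[str]:
    """Collect variable names from KEY=value lines, ignoring blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        keys.add(line.split("=", 1)[0].strip())
    return keys


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Return keys required by env.example that the host .env does not define."""
    return sorted(env_keys(example_text) - env_keys(env_text))
```

Only key names are compared, so the check never needs to read or print secret values.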
|
||||
|
||||
## Docker Compose Standards
|
||||
|
||||
1. **File Naming**: Use `docker-compose.yml`.
|
||||
2. **Container Naming**: `service-name`.
|
||||
3. **Restarts**: Always use `restart: unless-stopped`.
|
||||
2. **Container Naming**: Match the service name.
|
||||
3. **Restarts**: Always use `restart: unless-stopped` unless specified otherwise in `service.yaml`.
|
||||
4. **Networking**:
|
||||
- Use `tailscale` internal mesh for inter-host communication.
|
||||
- Expose ports only when necessary.
|
||||
5. **Volumes**: Use named volumes or absolute paths to `/opt/homelab/data/service-name`.
|
||||
5. **Volumes**: Use absolute paths to `/opt/homelab/data/<service>`.
|
||||
|
||||
## Environment Variables
|
||||
|
||||
|
|
|
|||
|
|
@ -8,7 +8,7 @@
|
|||
| PIHA | Infrastructure and monitoring node |
|
||||
| SOLARIA | AI and compute node |
|
||||
| VPS | Public ingress and edge node |
|
||||
| CHELSTY | Virtualization and Home Assistant node |
|
||||
| CHELSTY | LTE-connected edge hypervisor and Home Assistant node |
|
||||
|
||||
## Architecture Principles
|
||||
|
||||
|
|
@ -21,6 +21,36 @@
|
|||
- Deployment uses lightweight shell scripts.
|
||||
- Avoid Kubernetes and heavy orchestration frameworks.
|
||||
|
||||
## CHELSTY Home Automation
|
||||
|
||||
CHELSTY hosts the local home automation control plane. Because it uses an LTE
|
||||
uplink and may be intermittently connected, Home Assistant, Zigbee2MQTT, and
|
||||
Mosquitto must continue operating without SATURN, VPS, or Forgejo.
|
||||
|
||||
The CHELSTY Home Assistant inventory is split across:
|
||||
|
||||
- `hosts/chelsty/services.yaml`
|
||||
- `hosts/chelsty/networking.yaml`
|
||||
- `hosts/chelsty/paths.yaml`
|
||||
|
||||
Service exposure is classified as:
|
||||
|
||||
- `local-only`: available only on local host, LAN, or container networks.
|
||||
- `tailscale-internal`: available to approved Tailscale clients only.
|
||||
- `public`: available from the public internet through explicit ingress.
|
||||
|
||||
Initial CHELSTY service intent:
|
||||
|
||||
| Service | Role | Exposure | Offline required |
|---|---|---|---|
| homeassistant | Home automation controller | tailscale-internal | yes |
| zigbee2mqtt | Zigbee to MQTT bridge | local-only | yes |
| mosquitto | Local MQTT broker | local-only | yes |
|
||||
|
||||
The Zigbee coordinator is an SLZB-06U network coordinator. It should be modeled
|
||||
as an Ethernet/WiFi network device consumed by Zigbee2MQTT, not as a USB dongle.
|
||||
Do not use `/dev/ttyUSB0` or other USB device mappings for this coordinator.
|
||||
|
||||
## Runtime Layout
|
||||
|
||||
Runtime data should live under:
|
||||
|
|
@ -32,3 +62,12 @@ with separated:
|
|||
- data
|
||||
- config
|
||||
- logs
|
||||
|
||||
CHELSTY follows the same layout:
|
||||
|
||||
- `/opt/homelab/data/<service>` for persistent service data.
|
||||
- `/opt/homelab/config/<service>` for host-local configuration and secrets.
|
||||
- `/opt/homelab/logs/<service>` for logs that should stay outside Git.
|
||||
|
||||
Critical backup sets on CHELSTY include Home Assistant config, Zigbee2MQTT
|
||||
config and network state, Mosquitto config/data, and SLZB-06U coordinator state.
|
||||
|
|
|
|||
40
hosts/chelsty/capabilities.yaml
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 4
      threads: 4
    memory:
      total_gb: 16
    acceleration:
      type: none

  virtualization:
    supported: true
    type: kvm

  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 250

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: LTE

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: low-power
    connectivity: intermittent
    availability_target: best-effort

  deployment:
    suitability:
      - staging
      - homeassistant
      - edge
    restricted: false
|
||||
57
hosts/chelsty/networking.yaml
Normal file
|
|
@ -0,0 +1,57 @@
|
|||
host: chelsty

uplink:
  type: lte
  connectivity: intermittent
  public_reachability: not-assumed

tailscale:
  enabled: true
  host_ip: 100.122.201.22
  role: internal-management

exposure_classes:
  local-only:
    description: LAN, host, or container-network access only.
  tailscale-internal:
    description: Tailnet access only; no public ingress dependency.
  public:
    description: Public internet exposure through an explicitly defined ingress host.

networks:
  home_automation_lan:
    purpose: Home Assistant, MQTT, Zigbee coordinator, and local device control.
    offline_required: true
    internet_required_for_core_operation: false

devices:
  slzb-06u:
    role: zigbee-coordinator
    vendor_model: SLZB-06U
    connection_type: network
    transport:
      primary: ethernet
      secondary: wifi
      usb: false
    address:
      hostname: slzb-06u.local
      ipv4: null
      port: 6638
      protocol: tcp
    consumers:
      - zigbee2mqtt
    placement: chelsty-home-automation-lan
    operational_notes:
      - Treat the coordinator as a network appliance, not a USB dongle.
      - Do not configure /dev/ttyUSB0 or other host USB device mappings for this coordinator.
      - Prefer static DHCP or a reserved IP once the LAN addressing plan is known.
    backup:
      recommended: true
      include:
        - coordinator firmware version
        - coordinator configuration export
        - Zigbee network backup from Zigbee2MQTT
        - device IEEE address and network parameters
      notes:
        - Keep a copy of coordinator state with the Zigbee2MQTT backup set.
        - Record the reserved IP or DNS name used by Zigbee2MQTT.
|
||||
48
hosts/chelsty/paths.yaml
Normal file
|
|
@ -0,0 +1,48 @@
|
|||
host: chelsty

runtime_root: /opt/homelab

conventions:
  services: /opt/homelab/services
  data: /opt/homelab/data
  config: /opt/homelab/config
  logs: /opt/homelab/logs

services:
  homeassistant:
    data: /opt/homelab/data/homeassistant
    config: /opt/homelab/config/homeassistant
    logs: /opt/homelab/logs/homeassistant
    backup_priority: critical

  zigbee2mqtt:
    data: /opt/homelab/data/zigbee2mqtt
    config: /opt/homelab/config/zigbee2mqtt
    logs: /opt/homelab/logs/zigbee2mqtt
    backup_priority: critical

  mosquitto:
    data: /opt/homelab/data/mosquitto
    config: /opt/homelab/config/mosquitto
    logs: /opt/homelab/logs/mosquitto
    backup_priority: high

backup_sets:
  homeassistant:
    include:
      - /opt/homelab/config/homeassistant
      - /opt/homelab/data/homeassistant
    restore_note: Restore before starting the Home Assistant container.

  zigbee2mqtt:
    include:
      - /opt/homelab/config/zigbee2mqtt
      - /opt/homelab/data/zigbee2mqtt
    restore_note: Restore before starting Zigbee2MQTT so coordinator and network state remain aligned.

  slzb-06u:
    include:
      - SLZB-06U firmware version
      - SLZB-06U exported configuration
      - Zigbee network backup generated by Zigbee2MQTT
    restore_note: Restore or reconfigure coordinator state before permitting Zigbee2MQTT to reform the network.
|
||||
108
hosts/chelsty/services.yaml
Normal file
|
|
@ -0,0 +1,108 @@
|
|||
host: chelsty

exposure_classes:
  local-only:
    description: Reachable only from CHELSTY-local networks or container networks.
    public_ingress: false
    tailscale_required: false
  tailscale-internal:
    description: Reachable through the Tailscale mesh by approved tailnet clients.
    public_ingress: false
    tailscale_required: true
  public:
    description: Reachable from the public internet through an explicit ingress path.
    public_ingress: true
    tailscale_required: false

operational_constraints:
  uplink: lte
  connectivity: intermittent
  offline_operation_required: true
  must_not_depend_on:
    - saturn
    - vps
    - forgejo

services:
  homeassistant:
    role: home-automation-controller
    deployment_model: docker-compose
    exposure: tailscale-internal
    offline_required: true
    depends_on:
      local:
        - mosquitto
        - zigbee2mqtt
      external: []
    ports:
      - name: http
        container_port: 8123
        protocol: tcp
    runtime:
      config_path: /opt/homelab/config/homeassistant
      data_path: /opt/homelab/data/homeassistant
      logs_path: /opt/homelab/logs/homeassistant
    backup:
      recommended: true
      include:
        - /opt/homelab/config/homeassistant
        - /opt/homelab/data/homeassistant
      notes:
        - Back up before Home Assistant core, supervisor-equivalent, or integration upgrades.
        - Keep local restore copies on CHELSTY because LTE connectivity may be unavailable during recovery.

  zigbee2mqtt:
    role: zigbee-mqtt-bridge
    deployment_model: docker-compose
    exposure: local-only
    offline_required: true
    depends_on:
      local:
        - mosquitto
      external:
        - slzb-06u
    coordinator:
      name: slzb-06u
      connection: network
      usb_device: null
    ports:
      - name: frontend
        container_port: 8080
        protocol: tcp
        exposure: tailscale-internal
    runtime:
      config_path: /opt/homelab/config/zigbee2mqtt
      data_path: /opt/homelab/data/zigbee2mqtt
      logs_path: /opt/homelab/logs/zigbee2mqtt
    backup:
      recommended: true
      include:
        - /opt/homelab/config/zigbee2mqtt
        - /opt/homelab/data/zigbee2mqtt
      notes:
        - Include configuration.yaml, database.db, coordinator backup files, and network key material.
        - Restore Zigbee2MQTT state together with the SLZB-06U coordinator state when replacing hardware.

  mosquitto:
    role: local-mqtt-broker
    deployment_model: docker-compose
    exposure: local-only
    offline_required: true
    depends_on:
      local: []
      external: []
    ports:
      - name: mqtt
        container_port: 1883
        protocol: tcp
    runtime:
      config_path: /opt/homelab/config/mosquitto
      data_path: /opt/homelab/data/mosquitto
      logs_path: /opt/homelab/logs/mosquitto
    backup:
      recommended: true
      include:
        - /opt/homelab/config/mosquitto
        - /opt/homelab/data/mosquitto
      notes:
        - Retain ACL, password, persistence, and bridge configuration if enabled.
|
||||
39
hosts/piha/capabilities.yaml
Normal file
|
|
@ -0,0 +1,39 @@
|
|||
capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 4
      threads: 4
    memory:
      total_gb: 4
    acceleration:
      type: none

  virtualization:
    supported: false
    type: docker-only

  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 32

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium

  deployment:
    suitability:
      - infra
      - monitoring
    restricted: false
|
||||
40
hosts/saturn/capabilities.yaml
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 8
      threads: 8
    memory:
      total_gb: 8
    acceleration:
      type: none

  virtualization:
    supported: false
    type: docker-only

  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 64

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high

  deployment:
    suitability:
      - control
      - development
      - infra
    restricted: false
|
||||
41
hosts/solaria/capabilities.yaml
Normal file
|
|
@ -0,0 +1,41 @@
|
|||
capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 12
      threads: 24
    memory:
      total_gb: 64
    acceleration:
      type: cuda
      model: "NVIDIA RTX 4070"

  virtualization:
    supported: true
    type: kvm

  storage:
    persistence: redundant
    type: nvme
    capacity_gb: 2000

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: ubuntu

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium

  deployment:
    suitability:
      - ai
      - compute
      - database
    restricted: false
|
||||
40
hosts/vps/capabilities.yaml
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 2
      threads: 2
    memory:
      total_gb: 4
    acceleration:
      type: none

  virtualization:
    supported: false
    type: docker-only

  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 80

  networking:
    reachability: public
    ingress_suitability: true
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high

  deployment:
    suitability:
      - edge
      - ingress
      - web
    restricted: true
|
||||
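Because every host carries the same schema, a question such as "which nodes may terminate public traffic?" becomes a scan over `hosts/*/capabilities.yaml`. The sketch below is illustrative only: `ingress_nodes` is a hypothetical name, and it relies on a plain `grep` for `ingress_suitability: true` instead of parsing YAML.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of hosts whose capabilities.yaml
# sets ingress_suitability to true. HOSTS_DIR is the repository's hosts/ directory.
ingress_nodes() {
    local hosts_dir="$1" file
    for file in "$hosts_dir"/*/capabilities.yaml; do
        [ -f "$file" ] || continue   # glob matched nothing
        if grep -Eq '^[[:space:]]*ingress_suitability:[[:space:]]*true[[:space:]]*$' "$file"; then
            basename "$(dirname "$file")"
        fi
    done
}
```

Run from the repository root as `ingress_nodes hosts`; with the two files shown here it would list `vps` but not `solaria`.
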
29
inventory/templates/how_to_add_new_node.yaml
Normal file
@@ -0,0 +1,29 @@
---
title: How to Add a New Node to the Homelab
description: This guide outlines the process for onboarding a new execution node into the GitOps-lite environment.

phases:
  - phase: 1. Preparation (on SATURN)
    steps:
      - "Define Node Inventory: Create hosts/<hostname>/ directory"
      - "Add host.yaml with hardware metadata"
      - "Add networking.yaml with IP and Tailscale info"
      - "Add capabilities.yaml with node capability description"
      - "Add services.txt listing assigned services"
      - "Update inventory/topology.yaml"
      - "Commit and push changes to Forgejo"

  - phase: 2. Bootstrapping (on the New Node)
    steps:
      - "Install OS (Debian/Ubuntu recommended)"
      - "Configure SSH and user access"
      - "Install Docker, Docker Compose, Tailscale, Git"
      - "Join the tailnet"
      - "Clone repository: git clone <forgejo-url>/homelab-codex.git ~/homelab-codex-ws"
      - "Set up runtime: sudo mkdir -p /opt/homelab/{services,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"

  - phase: 3. Initial Deployment
    steps:
      - "Run prepare: ~/homelab-codex-ws/scripts/deploy/deploy.sh prepare"
      - "Run deploy: ~/homelab-codex-ws/scripts/deploy/deploy.sh deploy"
      - "Run verify: ~/homelab-codex-ws/scripts/deploy/deploy.sh verify"

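The Preparation phase boils down to creating four files under `hosts/<hostname>/`. A small scaffold along these lines could save some typing; `scaffold_host` is a hypothetical helper, not something this change ships, and it creates empty placeholder files only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create hosts/<hostname>/ with empty placeholders for the
# four inventory files named in phase 1. Existing files are left untouched.
scaffold_host() {
    local repo_root="$1" host="$2" dir file
    dir="$repo_root/hosts/$host"
    mkdir -p "$dir"
    for file in host.yaml networking.yaml capabilities.yaml services.txt; do
        [ -e "$dir/$file" ] || : > "$dir/$file"
    done
    echo "$dir"
}
```

Something like `scaffold_host ~/homelab-codex-ws newnode` would print the new directory path; the files still have to be filled in and `inventory/topology.yaml` updated by hand.
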
29
inventory/templates/node-bootstrap-checklist.yaml
Normal file
@@ -0,0 +1,29 @@
---
bootstrap_checklist:
  pre_flight:
    - task: "Hardware connected and powered"
      done: false
    - task: "Base OS installed (Debian/Ubuntu)"
      done: false
    - task: "Network connectivity established"
      done: false
    - task: "SSH access configured"
      done: false
  onboarding:
    - task: "Tailscale installed and authenticated"
      done: false
    - task: "Docker and Compose V2 installed"
      done: false
    - task: "Git installed"
      done: false
    - task: "Repository cloned to ~/homelab-codex-ws"
      done: false
    - task: "/opt/homelab directory structure created"
      done: false
  initial_run:
    - task: "deploy.sh prepare successful"
      done: false
    - task: "deploy.sh deploy successful"
      done: false
    - task: "deploy.sh verify successful"
      done: false

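Since each checklist entry is a `task`/`done` pair, the outstanding work can be listed with a few lines of awk. This is a rough sketch under the assumption that `done:` always follows its `task:` line, as in the template above; `pending_tasks` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the text of every task whose "done" flag is false.
# Assumes each "- task:" line is followed by its own "done:" line.
pending_tasks() {
    awk '
        /^[ \t]*- task:/ {
            task = $0
            sub(/^[ \t]*- task:[ \t]*/, "", task)   # strip the key
            gsub(/^"|"$/, "", task)                 # strip surrounding quotes
        }
        /^[ \t]*done:[ \t]*false/ { print task }
    ' "$1"
}
```

For instance, `pending_tasks node-bootstrap-checklist.yaml | wc -l` gives a quick count of what is still open on a node.
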
18
inventory/templates/node-discovery-commands.yaml
Normal file
@@ -0,0 +1,18 @@
---
discovery_commands:
  cpu:
    - "lscpu"
    - "cat /proc/cpuinfo"
  memory:
    - "free -h"
  storage:
    - "lsblk"
    - "df -h"
  network:
    - "ip addr"
    - "tailscale status"
  gpu:
    - "nvidia-smi"
    - "lspci | grep -i vga"
  usb:
    - "lsusb"

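Not every node has every tool installed (`nvidia-smi` is absent on GPU-less hosts, `tailscale` before onboarding), so a discovery run is more useful if it skips missing commands instead of aborting. The wrapper below is a sketch of that idea; `run_if_available` and `discover_node` are hypothetical names, and the command list is a subset of the template above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command only if it is installed, with a heading
# line so the combined output stays readable.
run_if_available() {
    local cmd="$1"
    shift
    if command -v "$cmd" >/dev/null 2>&1; then
        echo "## $cmd $*"
        "$cmd" "$@"
    else
        echo "## $cmd: not installed, skipped"
    fi
}

# Collect a basic hardware and network report for the current node.
discover_node() {
    run_if_available lscpu
    run_if_available free -h
    run_if_available lsblk
    run_if_available df -h
    run_if_available ip addr
    run_if_available tailscale status
    run_if_available nvidia-smi
    run_if_available lsusb
}
```

The output of `discover_node > discovery.txt` could then serve as the raw input when filling in `capabilities.yaml`.
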
13
inventory/templates/prepare-node.yaml
Normal file
@@ -0,0 +1,13 @@
---
node_preparation:
  actions:
    - name: update_system
      command: "sudo apt update && sudo apt upgrade -y"
    - name: install_dependencies
      command: "sudo apt install -y curl git docker.io docker-compose-v2 tailscale"
    - name: configure_docker_permissions
      command: "sudo usermod -aG docker $USER"
    - name: create_runtime_directories
      command: "sudo mkdir -p /opt/homelab/{services,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"
    - name: initialize_repo
      command: "git clone <repo_url> ~/homelab-codex-ws"

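One detail worth keeping in mind for `create_runtime_directories`: the `{services,config,state,logs}` form is brace expansion, which bash performs but a plain POSIX `sh` such as dash does not. The template does not say which shell executes these command strings, so the following is a defensive sketch that behaves the same in either; `create_runtime_dirs` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create the runtime subdirectories with an explicit loop
# so the result does not depend on brace expansion being available.
create_runtime_dirs() {
    local root="${1:-/opt/homelab}" sub
    for sub in services config state logs; do
        mkdir -p "$root/$sub"
    done
}
```

On a real node this would still need `sudo` and the `chown` step from the template; the root argument exists mainly so the function can be tried against a scratch directory first.
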
13
inventory/templates/prompts/create-node
Normal file
@@ -0,0 +1,13 @@
### System Prompt Addendum: Create Node

**Context**: You are assisting in adding a new node to the homelab.
**Task**: Generate the necessary inventory files for a new node.

**Requirements**:
1. Ask for: hostname, IP address, Tailscale IP, hardware specs (CPU/RAM/Storage), and intended role/services.
2. Generate `hosts/<hostname>/host.yaml` and `hosts/<hostname>/networking.yaml`.
3. Provide a snippet for `inventory/topology.yaml`.
4. Recommend services based on hardware (e.g., if GPU is present, suggest inference services).

**Output Format**: YAML blocks for each file.
**Restriction**: Do NOT execute any shell commands. Only provide the configuration.

16
inventory/templates/prompts/deploy-node
Normal file
@@ -0,0 +1,16 @@
### System Prompt Addendum: Deploy Node

**Context**: Orchestrating a deployment across one or more nodes.
**Task**: Generate the deployment plan and verification checklist.

**Requirements**:
1. Identify which nodes need updates based on git changes.
2. Recommend the sequence of stages (e.g., `prepare` on all, then `deploy` on edge nodes first).
3. Generate a human-readable checklist for the operator.
4. Define verification criteria for the `verify` stage.

**Output Format**:
- Deployment Plan (sequence of commands).
- Verification Checklist.

**Restriction**: Do NOT mutate infrastructure autonomously.

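Requirement 1 ("identify which nodes need updates based on git changes") can be approximated from the list of changed paths alone. The sketch below reads file paths on standard input and reports which `hosts/<name>/` and `services/<name>/` directories were touched. `changed_units` is a hypothetical helper; mapping a changed service back to its node would additionally require reading `owner_node` from that service's `service.yaml`, which is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read changed file paths on stdin and print one line per
# affected host or service directory, de-duplicated.
changed_units() {
    awk -F/ '
        $1 == "hosts"    && NF >= 3 { print "host " $2 }
        $1 == "services" && NF >= 3 { print "service " $2 }
    ' | sort -u
}

# Possible use after a pull (ORIG_HEAD is set by git pull/merge):
#   git diff --name-only ORIG_HEAD HEAD | changed_units
```

Paths outside `hosts/` and `services/` (for example `docs/`) are ignored, on the assumption that they do not require a redeploy.
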
17
inventory/templates/prompts/recover-node
Normal file
@@ -0,0 +1,17 @@
### System Prompt Addendum: Recover Node

**Context**: A homelab node is unresponsive or has suffered data loss.
**Task**: Analyze logs and state to recommend recovery steps.

**Requirements**:
1. Request the content of `/opt/homelab/logs/deploy/` (latest log) and `/opt/homelab/state/deploy/current_stage`.
2. Analyze the last failed stage.
3. Recommend specific `deploy.sh` commands (e.g., `rollback` or `resume`).
4. Provide manual recovery steps if automated stages fail.

**Output Format**:
- Analysis of the failure.
- Recommended action.
- Documentation of the recovery process.

**Restriction**: Do NOT auto-execute deployment.

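The two inputs requested in requirement 1 can be gathered by the operator with a couple of small functions. This is a sketch, assuming the log naming scheme `deploy_<YYYYmmdd_HHMMSS>.log` used by `deploy.sh`; the function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the path of the newest deploy log.
# Timestamped names sort chronologically, so the last glob match is the newest.
latest_deploy_log() {
    local log_dir="${1:-/opt/homelab/logs/deploy}" file latest=""
    for file in "$log_dir"/deploy_*.log; do
        if [ -f "$file" ]; then latest="$file"; fi
    done
    if [ -n "$latest" ]; then echo "$latest"; else return 1; fi
}

# Hypothetical helper: print the recorded stage, or "none" if nothing was recorded.
current_stage() {
    local state_file="${1:-/opt/homelab/state/deploy/current_stage}"
    if [ -f "$state_file" ]; then cat "$state_file"; else echo "none"; fi
}
```

An operator might paste the output of `current_stage` and `tail -n 50 "$(latest_deploy_log)"` into the conversation, which keeps the assistant in a read-only, advisory role as the restriction requires.
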
|
|
@ -30,6 +30,20 @@ nodes:
|
|||
|
||||
chelsty:
|
||||
roles:
|
||||
- remote
|
||||
- hypervisor
|
||||
- homeassistant
|
||||
- staging
|
||||
connectivity:
|
||||
uplink: lte
|
||||
intermittent: true
|
||||
home_automation:
|
||||
offline_operation_required: true
|
||||
services:
|
||||
- homeassistant
|
||||
- zigbee2mqtt
|
||||
- mosquitto
|
||||
coordinator:
|
||||
model: SLZB-06U
|
||||
connection: network
|
||||
usb: false
|
||||
|
|
|
|||
110
scripts/deploy/deploy.sh
Executable file
@@ -0,0 +1,110 @@
#!/usr/bin/env bash
# deploy.sh - Staged deployment framework for homelab nodes.
# Usage: ./deploy.sh [stage]

set -e

# --- Configuration ---
RUNTIME_PATH="/opt/homelab"
STATE_DIR="${RUNTIME_PATH}/state/deploy"
LOG_DIR="${RUNTIME_PATH}/logs/deploy"
REPO_PATH="${HOME}/homelab-codex-ws"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
LOG_FILE="${LOG_DIR}/deploy_${TIMESTAMP}.log"

# --- Initialization ---
mkdir -p "$STATE_DIR" "$LOG_DIR"

# Redirection for logging
exec > >(tee -a "$LOG_FILE") 2>&1

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
}

set_state() {
    echo "$1" > "${STATE_DIR}/current_stage"
    log "State set to: $1"
}

get_state() {
    if [ -f "${STATE_DIR}/current_stage" ]; then
        cat "${STATE_DIR}/current_stage"
    else
        echo "none"
    fi
}

# --- Stages ---

stage_prepare() {
    log "Stage: PREPARE"
    set_state "prepare"
    # Skeleton: Pull latest changes, check dependencies, validate inventory
    log "Checking repository at $REPO_PATH..."
    git -C "$REPO_PATH" pull
    log "Preparation complete."
}

stage_deploy() {
    log "Stage: DEPLOY"
    set_state "deploy"
    # Skeleton: Iterate through services and run docker compose
    log "Deploying services defined for $(hostname)..."
    # Implementation detail: loop through services/ and run compose
    log "Deployment complete."
}

stage_verify() {
    log "Stage: VERIFY"
    set_state "verify"
    # Skeleton: Check container status, healthchecks, connectivity
    log "Verifying service health..."
    docker ps
    log "Verification complete."
}

stage_diagnose() {
    log "Stage: DIAGNOSE"
    # Skeleton: Check logs, resource usage, networking
    log "Running diagnostics..."
    docker stats --no-stream
    log "Diagnostics complete."
}

stage_rollback() {
    log "Stage: ROLLBACK"
    # Skeleton: Revert to previous git commit or previous state
    log "Rolling back changes..."
    log "Rollback complete."
}

stage_resume() {
    log "Stage: RESUME"
    CURRENT=$(get_state)
    log "Resuming from state: $CURRENT"
    case "$CURRENT" in
        "prepare") stage_deploy ;;
        "deploy") stage_verify ;;
        "verify") log "Last deployment was verified. Nothing to resume." ;;
        *) log "Unknown state or nothing to resume. Starting from prepare..."; stage_prepare ;;
    esac
}

# --- Main ---

COMMAND=${1:-resume}

log "--- Homelab Deployment Started (Command: $COMMAND) ---"

case "$COMMAND" in
    prepare) stage_prepare ;;
    deploy) stage_deploy ;;
    verify) stage_verify ;;
    diagnose) stage_diagnose ;;
    rollback) stage_rollback ;;
    resume) stage_resume ;;
    *) echo "Usage: $0 {prepare|deploy|verify|diagnose|rollback|resume}"; exit 1 ;;
esac

log "--- Homelab Deployment Finished ---"

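The `deploy` stage is still a skeleton; its comment only says to loop through the services and run compose. One way that loop might look is sketched below. Several things here are assumptions rather than facts from this change: that `hosts/<hostname>/services.txt` holds one service name per line, that each service keeps its compose file at `services/<name>/docker-compose.yml`, and the `DRY_RUN` switch and the function name `deploy_services` are invented for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the deploy loop. Assumes services.txt lists one
# service name per line; blank lines and "#" comments are ignored.
deploy_services() {
    local repo="$1" host="$2" list service compose
    list="$repo/hosts/$host/services.txt"
    if [ ! -f "$list" ]; then
        echo "no services.txt for $host" >&2
        return 1
    fi
    while IFS= read -r service || [ -n "$service" ]; do
        case "$service" in ''|'#'*) continue ;; esac
        compose="$repo/services/$service/docker-compose.yml"
        if [ ! -f "$compose" ]; then
            echo "skip $service: no docker-compose.yml" >&2
            continue
        fi
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Print the command instead of touching the node.
            echo "docker compose -f $compose up -d"
        else
            docker compose -f "$compose" up -d
        fi
    done < "$list"
}
```

Running it first as `DRY_RUN=1 deploy_services ~/homelab-codex-ws "$(hostname)"` shows what would be started without changing anything, which fits the staged, operator-driven approach of the script.
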
9
services/forgejo/README.md
Normal file
@@ -0,0 +1,9 @@
# Forgejo

Forgejo is a self-hosted lightweight software forge. Easy to install and low maintenance.

## Usage
Deployed on the `saturn` node as the git source of truth.

Web UI is available on port 3000.
SSH for git is available on port 222.

15
services/forgejo/docker-compose.yml
Normal file
@@ -0,0 +1,15 @@
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:latest
    container_name: forgejo
    restart: unless-stopped
    environment:
      - USER_UID=1000
      - USER_GID=1000
    volumes:
      - /opt/homelab/data/forgejo/data:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - '3000:3000'
      - '222:22'

3
services/forgejo/env.example
Normal file
@@ -0,0 +1,3 @@
USER_UID=1000
USER_GID=1000
# FORGEJO__database__DB_TYPE=sqlite3

17
services/forgejo/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Forgejo

# Check if the container is running
if ! docker ps --filter "name=forgejo" --filter "status=running" | grep -q "forgejo"; then
    echo "[FAIL] Forgejo container is not running"
    exit 1
fi

# Check API health endpoint
if ! curl -sf http://localhost:3000/api/healthz > /dev/null; then
    echo "[FAIL] Forgejo API is not responding"
    exit 1
fi

echo "[OK] Forgejo is healthy"
exit 0

28
services/forgejo/service.yaml
Normal file
@@ -0,0 +1,28 @@
service:
  name: forgejo
  owner_node: saturn
  exposure: private
  dependencies: []
  ports:
    - container: 3000
      host: 3000
      protocol: tcp
    - container: 22
      host: 222
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:3000/api/healthz
    interval: 1m
    timeout: 10s
    retries: 5
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/forgejo/data
  runtime:
    directories:
      - /opt/homelab/data/forgejo/data
    env_vars:
      - USER_UID
      - USER_GID

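The `healthcheck` block declares `interval` and `retries`, but the `healthcheck.sh` scripts run a single probe and exit. A verify stage that honours those fields could wrap the script in a retry loop along these lines. This is a sketch: `retry` is a hypothetical helper, and it interprets `retries` as the maximum number of attempts, which is an assumption about the schema rather than something the change states.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, sleeping DELAY
# seconds between attempts. Succeeds as soon as the command succeeds.
retry() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "[FAIL] '$*' failed after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

With the values from this file, a call would look like `retry 5 60 bash services/forgejo/healthcheck.sh`.
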
9
services/mosquitto/README.md
Normal file
@@ -0,0 +1,9 @@
# Mosquitto MQTT Broker

Eclipse Mosquitto is an open source (EPL/EDL licensed) message broker that implements the MQTT protocol versions 5.0, 3.1.1 and 3.1.

## Usage
Deployed on the `piha` node.

Port 1883 for standard MQTT.
Port 9001 for WebSockets.

12
services/mosquitto/docker-compose.yml
Normal file
@@ -0,0 +1,12 @@
services:
  mosquitto:
    image: eclipse-mosquitto:latest
    container_name: mosquitto
    restart: unless-stopped
    ports:
      - '1883:1883'
      - '9001:9001'
    volumes:
      - /opt/homelab/data/mosquitto/config:/mosquitto/config
      - /opt/homelab/data/mosquitto/data:/mosquitto/data
      - /opt/homelab/data/mosquitto/log:/mosquitto/log

2
services/mosquitto/env.example
Normal file
@@ -0,0 +1,2 @@
# No specific environment variables required by default.
# Mosquitto is mainly configured via /opt/homelab/data/mosquitto/config/mosquitto.conf

17
services/mosquitto/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Mosquitto

# Check if the container is running
if ! docker ps --filter "name=mosquitto" --filter "status=running" | grep -q "mosquitto"; then
    echo "[FAIL] Mosquitto container is not running"
    exit 1
fi

# Basic port check for 1883
if ! (echo > /dev/tcp/localhost/1883) >/dev/null 2>&1; then
    echo "[FAIL] Mosquitto port 1883 is not reachable"
    exit 1
fi

echo "[OK] Mosquitto is healthy"
exit 0

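The port probe relies on bash's `/dev/tcp` pseudo-device, which is why the script needs `#!/bin/bash` rather than `sh`. The probe has no timeout of its own, so a connection attempt that is silently dropped could stall the check. A bounded variant might look like the sketch below; `port_open` is a hypothetical name, and it assumes GNU coreutils `timeout` is installed, as it is on stock Debian and Ubuntu.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to HOST:PORT can be opened
# within WAIT seconds (default 3). Uses bash's /dev/tcp in a child shell so the
# descriptor is closed again when the child exits.
port_open() {
    local host="$1" port="$2" wait="${3:-3}"
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}
```

In the healthcheck this would read `if ! port_open localhost 1883; then ...`. Note that it only proves the port accepts connections, not that the broker answers MQTT.
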
29
services/mosquitto/service.yaml
Normal file
@@ -0,0 +1,29 @@
service:
  name: mosquitto
  owner_node: piha
  exposure: private
  dependencies: []
  ports:
    - container: 1883
      host: 1883
      protocol: tcp
    - container: 9001
      host: 9001
      protocol: tcp
  healthcheck:
    type: container
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
  runtime:
    directories:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
    env_vars: []

13
services/npm/README.md
Normal file
@@ -0,0 +1,13 @@
# Nginx Proxy Manager (NPM)

Expose your services easily and securely with Nginx Proxy Manager.

## Features
- Secure HTTPS via Let's Encrypt
- Easy to use Web UI
- Advanced configuration for power users

## Usage
Deployed on the `vps` node for public ingress.

Web UI is available on port 81.

2
services/npm/env.example
Normal file
@@ -0,0 +1,2 @@
# No environment variables required for standard NPM deployment.
# Local overrides can be placed in /opt/homelab/config/npm/.env

17
services/npm/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Nginx Proxy Manager

# Check if the container is running
if ! docker ps --filter "name=npm" --filter "status=running" | grep -q "npm"; then
    echo "[FAIL] NPM container is not running"
    exit 1
fi

# Check Web UI responsiveness (port 81)
if ! curl -sf http://localhost:81 > /dev/null; then
    echo "[FAIL] NPM Web UI is not responding"
    exit 1
fi

echo "[OK] NPM is healthy"
exit 0

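The container check used in these scripts is loose in two ways: `docker ps --filter "name=npm"` matches any container whose name contains `npm`, and `grep -q "npm"` matches that text anywhere in the output line, including the image column. For a short name like `npm` this could produce a false positive. A stricter check can ask Docker about one exact container, as in this sketch (`container_running` is a hypothetical helper name):

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed only if a container with exactly this name
# exists and is currently running.
container_running() {
    local name="$1" state
    # docker inspect fails if no container has this exact name or ID.
    state=$(docker inspect --format '{{.State.Running}}' "$name" 2>/dev/null) || return 1
    [ "$state" = "true" ]
}
```

The first check in the script would then become `if ! container_running npm; then ...`, and the same helper works unchanged for the other services.
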
31
services/npm/service.yaml
Normal file
@@ -0,0 +1,31 @@
service:
  name: npm
  owner_node: vps
  exposure: public
  dependencies: []
  ports:
    - container: 80
      host: 80
      protocol: tcp
    - container: 81
      host: 81
      protocol: tcp
    - container: 443
      host: 443
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:81
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
  runtime:
    directories:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
    env_vars: []

13
services/ollama/README.md
Normal file
@@ -0,0 +1,13 @@
# Ollama

Get up and running with large language models locally.

## Usage
Deployed on the `solaria` node for GPU acceleration.

API is available on port 11434.

Example check:
```bash
curl http://localhost:11434/api/tags
```

16
services/ollama/docker-compose.yml
Normal file
@@ -0,0 +1,16 @@
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - '11434:11434'
    volumes:
      - /opt/homelab/data/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

2
services/ollama/env.example
Normal file
@@ -0,0 +1,2 @@
# No specific environment variables required by default.
# CUDA_VISIBLE_DEVICES=0

17
services/ollama/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Ollama

# Check if the container is running
if ! docker ps --filter "name=ollama" --filter "status=running" | grep -q "ollama"; then
    echo "[FAIL] Ollama container is not running"
    exit 1
fi

# Check API responsiveness
if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
    echo "[FAIL] Ollama API is not responding"
    exit 1
fi

echo "[OK] Ollama is healthy"
exit 0

23
services/ollama/service.yaml
Normal file
@@ -0,0 +1,23 @@
service:
  name: ollama
  owner_node: solaria
  exposure: private
  dependencies: []
  ports:
    - container: 11434
      host: 11434
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:11434/api/tags
    interval: 1m
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/ollama
  runtime:
    directories:
      - /opt/homelab/data/ollama
    env_vars: []

10
services/zigbee2mqtt/README.md
Normal file
@@ -0,0 +1,10 @@
# Zigbee2MQTT

Zigbee to MQTT bridge, get rid of your proprietary Zigbee bridges.

## Usage
Deployed on the `piha` node.

Requires a Zigbee adapter (e.g., Sonoff ZBDongle-E) mapped to `/dev/ttyACM0`.

Frontend is available on port 8080.

14
services/zigbee2mqtt/docker-compose.yml
Normal file
@@ -0,0 +1,14 @@
services:
  zigbee2mqtt:
    container_name: zigbee2mqtt
    image: koenkk/zigbee2mqtt:latest
    restart: unless-stopped
    volumes:
      - /opt/homelab/data/zigbee2mqtt/data:/app/data
      - /run/udev:/run/udev:ro
    ports:
      - 8080:8080
    devices:
      - /dev/ttyACM0:/dev/ttyACM0
    environment:
      - TZ=Europe/Stockholm

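Because the compose file maps `/dev/ttyACM0` into the container, the adapter has to be present on the host before the service starts. A pre-flight check in the `prepare` stage could catch an unplugged or renamed adapter early. The following is a sketch only; `serial_device_ready` is a hypothetical name and the default path simply mirrors the compose file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify that the serial adapter path exists as a
# character device, and warn if the current user cannot read and write it.
serial_device_ready() {
    local dev="${1:-/dev/ttyACM0}"
    if [ ! -c "$dev" ]; then
        echo "[FAIL] $dev is not a character device (adapter unplugged or renamed?)" >&2
        return 1
    fi
    if [ ! -r "$dev" ] || [ ! -w "$dev" ]; then
        echo "[WARN] $dev exists but is not readable and writable by $(id -un)" >&2
    fi
    echo "[OK] $dev present"
}
```

USB serial device numbering can change between reboots, so a stable path under `/dev/serial/by-id/` may be a more robust mapping target than `/dev/ttyACM0`; the check works with either path.
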
3
services/zigbee2mqtt/env.example
Normal file
@@ -0,0 +1,3 @@
TZ=Europe/Stockholm
# MQTT credentials if applicable
# Z2M_MQTT_SERVER=mqtt://mosquitto:1883

17
services/zigbee2mqtt/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Zigbee2MQTT

# Check if the container is running
if ! docker ps --filter "name=zigbee2mqtt" --filter "status=running" | grep -q "zigbee2mqtt"; then
    echo "[FAIL] Zigbee2MQTT container is not running"
    exit 1
fi

# Check frontend responsiveness
if ! curl -sf http://localhost:8080 > /dev/null; then
    echo "[FAIL] Zigbee2MQTT frontend is not responding"
    exit 1
fi

echo "[OK] Zigbee2MQTT is healthy"
exit 0

25
services/zigbee2mqtt/service.yaml
Normal file
@@ -0,0 +1,25 @@
service:
  name: zigbee2mqtt
  owner_node: piha
  exposure: private
  dependencies:
    - mosquitto
  ports:
    - container: 8080
      host: 8080
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:8080
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/zigbee2mqtt/data
  runtime:
    directories:
      - /opt/homelab/data/zigbee2mqtt/data
    env_vars:
      - TZ