Merge pull request 'Add node capability model' (#3) from capability-model into master

Reviewed-on: #3
oskar 2026-05-11 20:56:46 +02:00
commit 0fa4df4ee1
42 changed files with 1054 additions and 19 deletions

docs/capabilities.md Normal file

@ -0,0 +1,85 @@
# Node Capability Model
This document defines the capability model for the homelab infrastructure. The goal is to provide a declarative way to describe what each node can do, its constraints, and its suitability for various workloads.
## Overview
Capabilities are defined per host in `hosts/<hostname>/capabilities.yaml`. This metadata allows infrastructure tooling and future AI agents to reason about workload placement, recovery, and compatibility without hardcoding logic into the orchestration system.
## Schema Definition
The `capabilities.yaml` file follows this structure:
```yaml
capabilities:
  hardware:
    cpu:
      arch: <string> # e.g., x86_64, arm64
      cores: <int>
      threads: <int>
    memory:
      total_gb: <int>
    acceleration:
      type: <string> # e.g., none, cuda, tpu, vaapi
      model: <string> # e.g., "NVIDIA RTX 3060", "Coral Edge TPU"
  virtualization:
    supported: <boolean>
    type: <string> # e.g., kvm, docker-only
  storage:
    persistence: <string> # ephemeral, persistent, redundant
    type: <string> # ssd, hdd, nvme, sd-card
    capacity_gb: <int>
  networking:
    reachability: <string> # public, tailscale-only, lan-only
    ingress_suitability: <boolean>
    bandwidth: <string> # e.g., "1Gbps", "100Mbps", "LTE"
  runtime:
    container_engine: <string> # docker, podman, containerd
    os: <string> # debian, ubuntu, alpine, nixos
  operational:
    power_constraint: <string> # low-power, mains, battery-backed
    connectivity: <string> # stable, intermittent
    availability_target: <string> # high, medium, best-effort
  deployment:
    suitability: [<string>] # list of workload types (e.g., ai, database, edge, web)
    restricted: <boolean> # if true, only specific workloads are allowed
```
## Placement Reasoning Examples
### AI Workloads
A service requiring `cuda` acceleration will be matched against nodes where `capabilities.hardware.acceleration.type == "cuda"`.
* **Target:** `solaria`
### Public Ingress
A service requiring public exposure will look for `capabilities.networking.ingress_suitability == true`.
* **Target:** `vps`
### Low-Power Staging
Staging workloads that must not draw significant power, or that tolerate intermittent connectivity, can be placed on nodes where `capabilities.operational.power_constraint == "low-power"`.
* **Target:** `chelsty`
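The three placement rules above reduce to equality filters over single capability fields. Below is a minimal Bash 4+ sketch, assuming the relevant values have already been flattened into an associative array (`CAP`, `NODES` and `nodes_where` are hypothetical names; real tooling would parse each `capabilities.yaml` with a YAML parser). The values are transcribed from the host capability files in this change: `acceleration` stands for `hardware.acceleration.type`, `ingress` for `networking.ingress_suitability`, and `power` for `operational.power_constraint`.

```bash
#!/usr/bin/env bash
# Placement sketch (Bash 4+). CAP is a hypothetical, hand-flattened view of the
# host capabilities.yaml files, keyed "<node>.<field>".
declare -A CAP=(
  [chelsty.acceleration]=none  [chelsty.ingress]=false [chelsty.power]=low-power
  [piha.acceleration]=none     [piha.ingress]=false    [piha.power]=mains
  [saturn.acceleration]=none   [saturn.ingress]=false  [saturn.power]=mains
  [solaria.acceleration]=cuda  [solaria.ingress]=false [solaria.power]=mains
  [vps.acceleration]=none      [vps.ingress]=true      [vps.power]=mains
)
NODES=(chelsty piha saturn solaria vps)

# nodes_where FIELD VALUE: print every node whose FIELD equals VALUE.
nodes_where() {
  local field=$1 want=$2 node
  for node in "${NODES[@]}"; do
    if [[ "${CAP[$node.$field]:-}" == "$want" ]]; then
      echo "$node"
    fi
  done
}

nodes_where acceleration cuda   # AI workloads      -> solaria
nodes_where ingress true        # public ingress    -> vps
nodes_where power low-power     # low-power staging -> chelsty
```

Keeping the filter generic (field plus expected value) means new placement rules only need new `CAP` entries, not new functions.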
## Recovery Reasoning Examples
### Failover Strategy
If `saturn` (the primary orchestrator) fails:
1. Identify nodes with `roles: [control]` or `roles: [infra]`.
2. Check `capabilities.operational.availability_target == "high"`.
3. Propose migration of critical infra services to `piha`.
### Storage-Bound Services
If a node with `persistence: persistent` fails, the agent must check whether other nodes offer `persistence: persistent` storage with a compatible `storage.type` before attempting recovery, and must warn about potential data loss if the service would be moved to an `ephemeral` node.
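The storage check can be sketched as a small predicate. This sketch assumes, purely for illustration, that a "compatible `storage.type`" means an identical type and that `redundant` storage satisfies a `persistent` requirement; `storage_move_check` is a hypothetical helper, not part of the repository.

```bash
#!/usr/bin/env bash
# storage_move_check SRC_TYPE DST_PERSISTENCE DST_TYPE
# Returns 0 if a persistent service may move to the target node, 1 otherwise.
# Assumption: "compatible" means the same storage.type.
storage_move_check() {
  local src_type=$1 dst_persistence=$2 dst_type=$3
  case $dst_persistence in
    persistent|redundant) ;;
    ephemeral)
      echo "WARN: target storage is ephemeral; data loss is likely" >&2
      return 1 ;;
    *)
      echo "WARN: unknown persistence class '$dst_persistence'" >&2
      return 1 ;;
  esac
  if [[ $dst_type != "$src_type" ]]; then
    echo "WARN: storage type differs ($src_type -> $dst_type); review before moving" >&2
    return 1
  fi
  echo "OK: target storage is persistent and compatible"
}

# Example: a service on nvme considered for a persistent sd-card node.
storage_move_check nvme persistent sd-card || echo "move needs operator review"
```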
## Future Usage by AI Agents
Future autonomous agents will use this metadata to:
1. **Evaluate Suitability:** Match service requirements (from `service.yaml`) against node capabilities.
2. **Generate Plans:** Create step-by-step deployment or migration plans based on hardware compatibility.
3. **Validate Topology:** Ensure that a proposed multi-node setup doesn't violate networking or operational constraints (e.g., don't put a DB on an intermittent node).
4. **Propose Failover:** Automatically suggest the best alternative node during an outage.


@ -10,22 +10,44 @@ This document describes the GitOps-lite deployment process for the homelab.
 4. **Tailscale Mesh**: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
 5. **Host Autonomy**: Services that must operate during WAN or Git outages keep their runtime dependencies on the execution node or local LAN.
-## Deployment Process
-### 1. Preparation (on SATURN)
-- Modify or create service definitions in `services/`.
-- Assign services to hosts by creating/updating `hosts/<hostname>/services.txt` (or similar mapping).
-- Commit and push changes to the Forgejo instance.
-### 2. Deployment (on Execution Node)
-Execution nodes run a deployment script (e.g., via cron or manual trigger) that:
-1. Performs a `git pull` from the source of truth.
-2. Identifies services assigned to this host.
-3. Symlinks or copies `services/<service>/docker-compose.yml` to `/opt/homelab/services/`.
-4. Runs `docker compose up -d --remove-orphans`.
+## Staged Deployment Framework
+The homelab uses a staged deployment framework located at `scripts/deploy/deploy.sh`. This script is designed to be resumable, stage-aware, and observable.
+### Deployment Stages
+1. **prepare**: Pulls the latest changes from Git, validates inventory, and prepares the local environment.
+2. **deploy**: Executes `docker compose` commands for all assigned services.
+3. **verify**: Checks the health and connectivity of deployed services.
+4. **diagnose**: Performs deep checks and resource analysis if something goes wrong.
+5. **rollback**: Reverts to a previous known-good state.
+6. **resume**: Automatically continues from the last successful stage.
+### State Tracking and Logging
+- **State**: Local node state is tracked in `/opt/homelab/state/deploy/current_stage`.
+- **Logs**: Detailed execution logs are stored in `/opt/homelab/logs/deploy/deploy_<timestamp>.log`.
+### Operational Semantics
+Deployment is **hybrid**:
+- **SATURN** acts as the orchestrator and source of truth.
+- **Nodes** execute the deployment locally using the `deploy.sh` script.
+- Human-in-the-loop is required for triggering and confirming deployments.
+### Recovery Workflow
+If a deployment fails:
+1. Run `deploy.sh diagnose` to identify the issue.
+2. Use the `recover-node` AI prompt to analyze logs and get recommendations.
+3. Either fix the issue and run `deploy.sh resume`, or use `deploy.sh rollback`.
+## Onboarding New Nodes
+Refer to `inventory/templates/how_to_add_new_node.yaml` for a detailed guide on adding new hardware to the mesh. The general flow is:
+1. Define node in `hosts/` and `inventory/topology.yaml` on SATURN.
+2. Bootstrap the node (Docker, Tailscale, Git).
+3. Run the staged deployment framework starting with `prepare`.
 ## Host-Local Overrides

docs/lifecycle.md Normal file

@ -0,0 +1,51 @@
# Service Lifecycle and Recovery
This document defines the lifecycle of a service in the homelab and the procedures for operational recovery.
## Service Lifecycle
1. **Onboarding**:
   - Create the `services/<service>/` directory.
   - Define `docker-compose.yml`, `service.yaml`, `README.md`, `env.example`, and `healthcheck.sh`.
   - Register the service in `inventory/topology.yaml` or the relevant host configs.
2. **Provisioning**:
   - Ensure `/opt/homelab/data/<service>` exists.
   - Ensure `/opt/homelab/config/<service>` exists and contains the required secrets/configs.
   - Set up environment variables from `env.example` in `/opt/homelab/config/<service>/.env`.
3. **Deployment**:
   - `docker compose pull`
   - `docker compose up -d`
4. **Verification**:
   - Run `healthcheck.sh`.
   - Verify ports are reachable according to `service.yaml`.
5. **Maintenance**:
   - Periodic updates via `docker compose pull` followed by `docker compose up -d` (pulling alone does not recreate running containers).
   - Log monitoring via `docker compose logs -f`.
6. **Decommissioning**:
   - `docker compose down`.
   - Archive `/opt/homelab/data/<service>` if necessary.
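The directory and `.env` parts of the Provisioning step can be scripted roughly as follows. This is a sketch: `provision_service` and the `HOMELAB_ROOT` override are hypothetical, the repository path defaults to `~/homelab-codex-ws` as in the bootstrap templates, and an existing `.env` is never overwritten.

```bash
#!/usr/bin/env bash
# provision_service SERVICE [REPO]: create data/ and config/ directories for
# SERVICE and seed config/<service>/.env from env.example if it is missing.
# HOMELAB_ROOT is a hypothetical override (defaults to /opt/homelab).
provision_service() {
  local service=$1
  local repo=${2:-$HOME/homelab-codex-ws}
  local root=${HOMELAB_ROOT:-/opt/homelab}
  local example="$repo/services/$service/env.example"
  local env_file="$root/config/$service/.env"

  mkdir -p "$root/data/$service" "$root/config/$service" || return 1

  if [[ -f $env_file ]]; then
    echo "keeping existing $env_file"
  elif [[ -f $example ]]; then
    # Secrets will live here, so restrict permissions to the owner.
    cp "$example" "$env_file" && chmod 600 "$env_file" || return 1
    echo "seeded $env_file from env.example; fill in real values"
  else
    echo "no env.example for $service" >&2
    return 1
  fi
}
```

Typical use on the owner node is `provision_service forgejo`, after which the placeholder values in the seeded `.env` are replaced by hand.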
## Operational Recovery
### 1. Container Failure
If a service is unhealthy:
- Check `docker compose logs`.
- Restart: `docker compose restart`.
- Recreate: `docker compose up -d --force-recreate`.
### 2. Node Failure
If a host node fails:
- Services whose `owner_node` matches the failed node must either be recovered on a backup node or wait until the node is restored.
- Persistent data must be restored from backups to `/opt/homelab/data/<service>`.
### 3. Dependency Recovery
If a dependency fails:
- Services depending on it might report unhealthy status.
- Recover the dependency first.
- Re-verify dependent services.
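Recovering dependencies before their dependents is a topological sort. A minimal sketch using the standard `tsort` utility, assuming the dependency pairs are supplied by hand (a real tool would read them from each `service.yaml`); `recovery_order` is a hypothetical name.

```bash
#!/usr/bin/env bash
# recovery_order: read "<dependency> <dependent>" pairs on stdin and print the
# services in an order where every dependency comes before its dependents.
# A service without dependencies is written as "<service> <service>", which
# tsort treats as a node with no ordering constraint.
recovery_order() {
  if ! tsort; then
    # GNU tsort reports a loop on stderr and exits non-zero.
    echo "dependency cycle detected; resolve it before recovering" >&2
    return 1
  fi
}

# Pairs transcribed by hand from the dependencies fields in this change's
# service.yaml files (only zigbee2mqtt declares one).
recovery_order <<'EOF'
mosquitto zigbee2mqtt
forgejo forgejo
npm npm
ollama ollama
EOF
```

Services with no ordering constraint between them may appear in any order; only the dependency edges are guaranteed.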
## Persistent Data Conventions
- **Data**: `/opt/homelab/data/<service>` - Primary persistent state.
- **Config**: `/opt/homelab/config/<service>` - Local overrides and secrets.
- **Backups**: Standard backup routines should target `/opt/homelab/data/`.
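Archiving a service's data directory and restoring it onto a replacement node can be sketched with `tar`. Assumptions: the service is stopped first (for example with `docker compose down`) so the files are consistent, the archive destination is chosen by the operator, and `HOMELAB_ROOT` is a hypothetical override. Files written by containers under other UIDs may require running these helpers with `sudo`.

```bash
#!/usr/bin/env bash
# archive_service_data SERVICE DEST_DIR: pack data/<service> into a timestamped
# tarball in DEST_DIR and print its path.
archive_service_data() {
  local service=$1 dest=$2 root=${HOMELAB_ROOT:-/opt/homelab} archive
  archive="$dest/${service}_$(date +%Y%m%d_%H%M%S).tar.gz"
  mkdir -p "$dest" || return 1
  tar -C "$root/data" -czf "$archive" "$service" || return 1
  echo "$archive"
}

# restore_service_data SERVICE ARCHIVE: unpack ARCHIVE into data/, refusing to
# overwrite an existing data/<service> directory.
restore_service_data() {
  local service=$1 archive=$2 root=${HOMELAB_ROOT:-/opt/homelab}
  if [[ -e $root/data/$service ]]; then
    echo "refusing to overwrite existing $root/data/$service" >&2
    return 1
  fi
  mkdir -p "$root/data" && tar -C "$root/data" -xzf "$archive"
}
```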

docs/service-model.md Normal file

@ -0,0 +1,75 @@
# Service Model and Healthchecks
This document defines the normalized service model for the homelab.
## Service Layout
Each service must reside in its own directory under `services/`:
```text
services/<service>/
├── docker-compose.yml # Docker Compose definition
├── service.yaml # Service metadata and orchestration contract
├── README.md # Service documentation
├── env.example # Template for required environment variables
└── healthcheck.sh # Standardized healthcheck script
```
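A quick way to enforce this layout is a lint over each service directory. A minimal sketch; `check_service_layout` is a hypothetical helper and only checks that the five files exist, not what they contain.

```bash
#!/usr/bin/env bash
# check_service_layout DIR: print every file from the standard service layout
# that is missing in DIR; returns non-zero if anything is missing.
check_service_layout() {
  local dir=$1 f missing=0
  for f in docker-compose.yml service.yaml README.md env.example healthcheck.sh; do
    if [[ ! -f $dir/$f ]]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Example: lint every service in a checkout.
# for d in services/*/; do check_service_layout "${d%/}"; done
```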
## Service Metadata (`service.yaml`)
The `service.yaml` file provides a machine-readable contract for deployment and orchestration.
### Schema
```yaml
service:
  name: <string> # Canonical service name (kebab-case)
  owner_node: <string> # Preferred host node
  exposure: <class> # public, private, or local-only
  dependencies: [<service>] # List of required services
  ports:
    - container: <int>
      host: <int>
      protocol: <tcp|udp>
  healthcheck:
    type: <string> # local-only, container, http, mqtt
    endpoint: <string> # URL or topic if applicable
    interval: <duration>
    timeout: <duration>
    retries: <int>
  restart_policy: <string> # unless-stopped, always, etc.
  persistence:
    paths:
      - /opt/homelab/data/<service>/...
  runtime:
    directories: [<string>] # Required host directories to be created
    env_vars: [<string>] # List of required environment variables (keys only)
```
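The `runtime.env_vars` list lends itself to a pre-deployment check against the host's `.env`. A sketch with a hypothetical `missing_env_vars` helper; it only looks for a `KEY=` line (optionally prefixed by `export`) and does not validate values.

```bash
#!/usr/bin/env bash
# missing_env_vars ENV_FILE KEY...: print each KEY (e.g. from runtime.env_vars
# in service.yaml) that has no "KEY=" line in ENV_FILE; non-zero if any is missing.
missing_env_vars() {
  local env_file=$1 key status=0
  shift
  for key in "$@"; do
    # Commented-out lines such as "# KEY=..." do not count.
    if ! grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=" "$env_file" 2>/dev/null; then
      echo "$key"
      status=1
    fi
  done
  return "$status"
}

# Example: missing_env_vars /opt/homelab/config/forgejo/.env USER_UID USER_GID
```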
## Healthcheck Semantics
The `healthcheck.sh` script should exit with status `0` when healthy and `1` when unhealthy. It should support the modes below, chosen according to the `healthcheck` settings in `service.yaml`.
### 1. Local-only
Checks if the container is running and the process is alive within the host.
### 2. Container-level
Uses `docker inspect` or `docker exec` to check internal container health.
### 3. HTTP
Performs a `curl` against a specific endpoint (e.g., `/health` or `/`).
### 4. MQTT
Verifies that a specific topic is being updated or responds to a ping.
### 5. Dependency-aware
The healthcheck script may optionally check if its dependencies are healthy before reporting its own status.
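One way to honor the `interval` and `retries` fields when running `healthcheck.sh` is a retry wrapper. This sketch assumes `interval` means the wait between failed attempts and handles only `s`/`m`/`h` durations; `retry` and `duration_to_seconds` are hypothetical helpers, and `timeout` is left to the check itself (for example `curl --max-time`).

```bash
#!/usr/bin/env bash
# duration_to_seconds DURATION: convert service.yaml durations such as
# "30s", "1m" or "2h" to whole seconds; anything else is rejected.
duration_to_seconds() {
  if [[ $1 =~ ^([0-9]+)([smh])$ ]]; then
    local n=$((10#${BASH_REMATCH[1]}))
    case ${BASH_REMATCH[2]} in
      s) echo "$n" ;;
      m) echo $((n * 60)) ;;
      h) echo $((n * 3600)) ;;
    esac
  else
    echo "unsupported duration: $1" >&2
    return 1
  fi
}

# retry RETRIES INTERVAL_SECONDS CMD...: run CMD up to RETRIES times,
# sleeping INTERVAL_SECONDS between failed attempts.
retry() {
  local retries=$1 interval=$2 attempt
  shift 2
  for ((attempt = 1; attempt <= retries; attempt++)); do
    if "$@"; then
      return 0
    fi
    if ((attempt < retries)); then
      sleep "$interval"
    fi
  done
  return 1
}

# Example for forgejo (interval: 1m, retries: 5):
# retry 5 "$(duration_to_seconds 1m)" ./services/forgejo/healthcheck.sh
```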
## Runtime Authority
`/opt/homelab/config/<service>` is the source of truth for:
- Secrets (not in Git)
- Host-local overrides
- Mutable configuration
Services should mount files from this directory as needed.


@ -19,11 +19,14 @@ This document defines the standards and conventions for the homelab GitOps-lite
 /
 ├── docs/ # Infrastructure documentation
 ├── hosts/ # Host-specific configurations
-│   ├── saturn/
-│   ├── solaria/
-│   ├── piha/
-│   └── vps/
-├── services/ # Reusable service definitions (Docker Compose)
+├── inventory/ # Topology and templates
+├── services/ # Normalized service definitions
+│   └── <service>/
+│       ├── docker-compose.yml
+│       ├── service.yaml
+│       ├── README.md
+│       ├── env.example
+│       └── healthcheck.sh
 ├── scripts/ # Management and deployment scripts
 └── README.md
``` ```
@ -37,18 +40,28 @@ Runtime state must live outside the repository to keep it immutable and clean.
 ├── services/ # Active docker-compose files (deployed from git)
 ├── data/ # Persistent volume data (backed up)
 ├── config/ # Host-local overrides and secrets (not in git)
+│   └── <service>/
+│       ├── .env # Merged environment variables
+│       └── overrides/ # Local configuration overrides
 └── logs/ # Service logs
``` ```
+## Service Standards
+1. **Normalization**: Every service MUST follow the `services/<service>/` layout.
+2. **Metadata**: Every service MUST have a `service.yaml` defining its operational contract.
+3. **Healthchecks**: Every service MUST have a `healthcheck.sh` for verification.
+4. **Secrets**: NEVER commit secrets to Git. Use `env.example` as a template and populate `/opt/homelab/config/<service>/.env` on the host.
 ## Docker Compose Standards
 1. **File Naming**: Use `docker-compose.yml`.
-2. **Container Naming**: `service-name`.
-3. **Restarts**: Always use `restart: unless-stopped`.
+2. **Container Naming**: Match the service name.
+3. **Restarts**: Always use `restart: unless-stopped` unless specified otherwise in `service.yaml`.
 4. **Networking**:
    - Use `tailscale` internal mesh for inter-host communication.
    - Expose ports only when necessary.
-5. **Volumes**: Use named volumes or absolute paths to `/opt/homelab/data/service-name`.
+5. **Volumes**: Use absolute paths to `/opt/homelab/data/<service>`.
 ## Environment Variables


@ -0,0 +1,40 @@
capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 4
      threads: 4
    memory:
      total_gb: 16
    acceleration:
      type: none
  virtualization:
    supported: true
    type: kvm
  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 250
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: LTE
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: low-power
    connectivity: intermittent
    availability_target: best-effort
  deployment:
    suitability:
      - staging
      - homeassistant
      - edge
    restricted: false


@ -0,0 +1,39 @@
capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 4
      threads: 4
    memory:
      total_gb: 4
    acceleration:
      type: none
  virtualization:
    supported: false
    type: docker-only
  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 32
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium
  deployment:
    suitability:
      - infra
      - monitoring
    restricted: false


@ -0,0 +1,40 @@
capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 8
      threads: 8
    memory:
      total_gb: 8
    acceleration:
      type: none
  virtualization:
    supported: false
    type: docker-only
  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 64
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high
  deployment:
    suitability:
      - control
      - development
      - infra
    restricted: false


@ -0,0 +1,41 @@
capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 12
      threads: 24
    memory:
      total_gb: 64
    acceleration:
      type: cuda
      model: "NVIDIA RTX 4070"
  virtualization:
    supported: true
    type: kvm
  storage:
    persistence: redundant
    type: nvme
    capacity_gb: 2000
  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: ubuntu
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium
  deployment:
    suitability:
      - ai
      - compute
      - database
    restricted: false


@ -0,0 +1,40 @@
capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 2
      threads: 2
    memory:
      total_gb: 4
    acceleration:
      type: none
  virtualization:
    supported: false
    type: docker-only
  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 80
  networking:
    reachability: public
    ingress_suitability: true
    bandwidth: 1Gbps
  runtime:
    container_engine: docker
    os: debian
  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high
  deployment:
    suitability:
      - edge
      - ingress
      - web
    restricted: true


@ -0,0 +1,29 @@
---
title: How to Add a New Node to the Homelab
description: This guide outlines the process for onboarding a new execution node into the GitOps-lite environment.
phases:
  - phase: 1. Preparation (on SATURN)
    steps:
      - "Define Node Inventory: Create hosts/<hostname>/ directory"
      - "Add host.yaml with hardware metadata"
      - "Add networking.yaml with IP and Tailscale info"
      - "Add capabilities.yaml with node capability description"
      - "Add services.txt listing assigned services"
      - "Update inventory/topology.yaml"
      - "Commit and push changes to Forgejo"
  - phase: 2. Bootstrapping (on the New Node)
    steps:
      - "Install OS (Debian/Ubuntu recommended)"
      - "Configure SSH and user access"
      - "Install Docker, Docker Compose, Tailscale, Git"
      - "Join the tailnet"
      - "Clone repository: git clone <forgejo-url>/homelab-codex.git ~/homelab-codex-ws"
      - "Set up runtime: sudo mkdir -p /opt/homelab/{services,data,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"
  - phase: 3. Initial Deployment
    steps:
      - "Run prepare: ~/homelab-codex-ws/scripts/deploy/deploy.sh prepare"
      - "Run deploy: ~/homelab-codex-ws/scripts/deploy/deploy.sh deploy"
      - "Run verify: ~/homelab-codex-ws/scripts/deploy/deploy.sh verify"


@ -0,0 +1,29 @@
---
bootstrap_checklist:
  pre_flight:
    - task: "Hardware connected and powered"
      done: false
    - task: "Base OS installed (Debian/Ubuntu)"
      done: false
    - task: "Network connectivity established"
      done: false
    - task: "SSH access configured"
      done: false
  onboarding:
    - task: "Tailscale installed and authenticated"
      done: false
    - task: "Docker and Compose V2 installed"
      done: false
    - task: "Git installed"
      done: false
    - task: "Repository cloned to ~/homelab-codex-ws"
      done: false
    - task: "/opt/homelab structure created"
      done: false
  initial_run:
    - task: "deploy.sh prepare successful"
      done: false
    - task: "deploy.sh deploy successful"
      done: false
    - task: "deploy.sh verify successful"
      done: false
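Progress through a checklist like this one can be tallied with `grep`, relying on its one-`task`/one-`done` layout. A sketch; `checklist_progress` is a hypothetical helper and does not parse YAML.

```bash
#!/usr/bin/env bash
# checklist_progress FILE: report how many "task:" entries are marked done: true.
checklist_progress() {
  local file=$1 total done_count
  # grep -c prints 0 (and exits 1) when nothing matches, so ignore its status.
  total=$(grep -c 'task:' "$file" || true)
  done_count=$(grep -Ec 'done:[[:space:]]*true' "$file" || true)
  echo "${done_count}/${total} tasks done"
}

# Example (hypothetical path): checklist_progress path/to/checklist.yaml
```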


@ -0,0 +1,18 @@
---
discovery_commands:
  cpu:
    - "lscpu"
    - "cat /proc/cpuinfo"
  memory:
    - "free -h"
  storage:
    - "lsblk"
    - "df -h"
  network:
    - "ip addr"
    - "tailscale status"
  gpu:
    - "nvidia-smi"
    - "lspci | grep -i vga"
  usb:
    - "lsusb"
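Several `capabilities.hardware` values can be derived from these commands automatically. A Linux-only sketch: `discover_hardware` is hypothetical and prints a fragment to paste under `capabilities:`; `aarch64` is renamed to `arm64` to match the schema's examples, cores come from `lscpu --parse` when available (falling back to the thread count), and `total_gb` is `MemTotal` rounded to the nearest GiB (MemTotal excludes kernel-reserved memory, so it reads slightly below the installed amount).

```bash
#!/usr/bin/env bash
# discover_hardware: print a capabilities.hardware fragment for this Linux host.
discover_hardware() {
  local arch threads cores=0 mem_kb
  arch=$(uname -m)
  if [[ $arch == aarch64 ]]; then
    arch=arm64   # schema examples use arm64
  fi

  threads=$(getconf _NPROCESSORS_ONLN)   # logical CPUs

  # Physical cores = distinct (core, socket) pairs; skip comment lines and rows
  # with an empty core id, which some virtual machines report.
  if command -v lscpu >/dev/null 2>&1; then
    cores=$(( $(lscpu --parse=CORE,SOCKET 2>/dev/null | grep -v -e '^#' -e '^,' | sort -u | wc -l) ))
  fi
  (( cores > 0 )) || cores=$threads

  mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)

  printf 'hardware:\n  cpu:\n    arch: %s\n    cores: %s\n    threads: %s\n  memory:\n    total_gb: %s\n' \
    "$arch" "$cores" "$threads" "$(( (mem_kb + 524288) / 1048576 ))"
}

discover_hardware
```

Fields such as `acceleration`, `storage` and `operational` still need a human decision, since `nvidia-smi`, `lsblk` and the node's power and network situation do not map mechanically onto the schema's categories.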


@ -0,0 +1,13 @@
---
node_preparation:
  actions:
    - name: update_system
      command: "sudo apt update && sudo apt upgrade -y"
    - name: install_dependencies
      # tailscale is not in the default Debian/Ubuntu repositories; add Tailscale's apt repository first.
      command: "sudo apt install -y curl git docker.io docker-compose-v2 tailscale"
    - name: configure_docker_permissions
      command: "sudo usermod -aG docker $USER"
    - name: create_runtime_directories
      # Brace expansion requires bash; run this command under bash, not sh.
      command: "sudo mkdir -p /opt/homelab/{services,data,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"
    - name: initialize_repo
      command: "git clone <repo_url> ~/homelab-codex-ws"


@ -0,0 +1,13 @@
### System Prompt Addendum: Create Node
**Context**: You are assisting in adding a new node to the homelab.
**Task**: Generate the necessary inventory files for a new node.
**Requirements**:
1. Ask for: hostname, IP address, Tailscale IP, hardware specs (CPU/RAM/Storage), and intended role/services.
2. Generate `hosts/<hostname>/host.yaml` and `hosts/<hostname>/networking.yaml`.
3. Provide a snippet for `inventory/topology.yaml`.
4. Recommend services based on hardware (e.g., if GPU is present, suggest inference services).
**Output Format**: YAML blocks for each file.
**Restriction**: Do NOT execute any shell commands. Only provide the configuration.


@ -0,0 +1,16 @@
### System Prompt Addendum: Deploy Node
**Context**: Orchestrating a deployment across one or more nodes.
**Task**: Generate the deployment plan and verification checklist.
**Requirements**:
1. Identify which nodes need updates based on git changes.
2. Recommend the sequence of stages (e.g., `prepare` on all, then `deploy` on edge nodes first).
3. Generate a human-readable checklist for the operator.
4. Define verification criteria for the `verify` stage.
**Output Format**:
- Deployment Plan (sequence of commands).
- Verification Checklist.
**Restriction**: Do NOT mutate infrastructure autonomously.


@ -0,0 +1,17 @@
### System Prompt Addendum: Recover Node
**Context**: A homelab node is unresponsive or has suffered data loss.
**Task**: Analyze logs and state to recommend recovery steps.
**Requirements**:
1. Request the content of `/opt/homelab/logs/deploy/` (latest log) and `/opt/homelab/state/deploy/current_stage`.
2. Analyze the last failed stage.
3. Recommend specific `deploy.sh` commands (e.g., `rollback` or `resume`).
4. Provide manual recovery steps if automated stages fail.
**Output Format**:
- Analysis of the failure.
- Recommended action.
- Documentation of the recovery process.
**Restriction**: Do NOT auto-execute deployment.
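The two inputs this prompt requests can be collected on the node with a helper like the following. It is a read-only sketch that relies on the `deploy_<timestamp>.log` naming used by `deploy.sh`; `collect_recovery_context` and the `HOMELAB_ROOT` override are hypothetical.

```bash
#!/usr/bin/env bash
# collect_recovery_context [LINES]: print the recorded deploy stage and the
# tail of the newest deploy log, the two inputs the recover-node prompt asks for.
# HOMELAB_ROOT is a hypothetical override; it defaults to /opt/homelab.
collect_recovery_context() {
  local root=${HOMELAB_ROOT:-/opt/homelab} lines=${1:-50}
  local f latest=""

  echo "current_stage: $(cat "$root/state/deploy/current_stage" 2>/dev/null || echo none)"

  # deploy_<YYYYmmdd_HHMMSS>.log names sort chronologically, and glob results
  # are sorted, so the last existing match is the newest log.
  for f in "$root/logs/deploy"/deploy_*.log; do
    [[ -e $f ]] && latest=$f
  done
  if [[ -z $latest ]]; then
    echo "no deploy logs found in $root/logs/deploy" >&2
    return 1
  fi

  echo "latest_log: $latest"
  tail -n "$lines" "$latest"
}
```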

scripts/deploy/deploy.sh Executable file

@ -0,0 +1,110 @@
#!/usr/bin/env bash
# deploy.sh - Staged deployment framework for homelab nodes.
# Usage: ./deploy.sh [stage]
set -e

# --- Configuration ---
RUNTIME_PATH="/opt/homelab"
STATE_DIR="${RUNTIME_PATH}/state/deploy"
LOG_DIR="${RUNTIME_PATH}/logs/deploy"
REPO_PATH="${HOME}/homelab-codex-ws"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
LOG_FILE="${LOG_DIR}/deploy_${TIMESTAMP}.log"

# --- Initialization ---
mkdir -p "$STATE_DIR" "$LOG_DIR"
# Redirection for logging
exec > >(tee -a "$LOG_FILE") 2>&1

log() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
}

# Records the last stage that completed successfully. Stages call this only
# at their end, so current_stage never names a stage that failed part-way.
set_state() {
    echo "$1" > "${STATE_DIR}/current_stage"
    log "State set to: $1"
}

get_state() {
    if [ -f "${STATE_DIR}/current_stage" ]; then
        cat "${STATE_DIR}/current_stage"
    else
        echo "none"
    fi
}

# --- Stages ---
stage_prepare() {
    log "Stage: PREPARE"
    # Skeleton: Pull latest changes, check dependencies, validate inventory
    log "Checking repository at $REPO_PATH..."
    # git -C instead of "cd ... && git pull": set -e ignores a failing cd inside an && list
    git -C "$REPO_PATH" pull
    log "Preparation complete."
    set_state "prepare"
}

stage_deploy() {
    log "Stage: DEPLOY"
    # Skeleton: Iterate through services and run docker compose
    log "Deploying services defined for $(hostname)..."
    # Implementation detail: loop through services/ and run compose
    log "Deployment complete."
    set_state "deploy"
}

stage_verify() {
    log "Stage: VERIFY"
    # Skeleton: Check container status, healthchecks, connectivity
    log "Verifying service health..."
    docker ps
    log "Verification complete."
    set_state "verify"
}

stage_diagnose() {
    log "Stage: DIAGNOSE"
    # Skeleton: Check logs, resource usage, networking
    log "Running diagnostics..."
    docker stats --no-stream
    log "Diagnostics complete."
}

stage_rollback() {
    log "Stage: ROLLBACK"
    # Skeleton: Revert to previous git commit or previous state
    log "Rolling back changes..."
    log "Rollback complete."
}

stage_resume() {
    log "Stage: RESUME"
    CURRENT=$(get_state)
    log "Resuming from state: $CURRENT"
    # Run every remaining stage after the last completed one.
    case "$CURRENT" in
        "prepare") stage_deploy; stage_verify ;;
        "deploy") stage_verify ;;
        "verify") log "Last deployment was verified. Nothing to resume." ;;
        *) log "Unknown state or nothing to resume. Starting from prepare..."; stage_prepare; stage_deploy; stage_verify ;;
    esac
}

# --- Main ---
COMMAND=${1:-resume}
log "--- Homelab Deployment Started (Command: $COMMAND) ---"
case "$COMMAND" in
    prepare) stage_prepare ;;
    deploy) stage_deploy ;;
    verify) stage_verify ;;
    diagnose) stage_diagnose ;;
    rollback) stage_rollback ;;
    resume) stage_resume ;;
    *) echo "Usage: $0 {prepare|deploy|verify|diagnose|rollback|resume}"; exit 1 ;;
esac
log "--- Homelab Deployment Finished ---"


@ -0,0 +1,9 @@
# Forgejo
Forgejo is a self-hosted, lightweight software forge that is easy to install and low-maintenance.
## Usage
Deployed on the `saturn` node as the git source of truth.
Web UI is available on port 3000.
Git over SSH is available on host port 222 (mapped to container port 22).


@ -0,0 +1,15 @@
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:latest
    container_name: forgejo
    restart: unless-stopped
    environment:
      - USER_UID=1000
      - USER_GID=1000
    volumes:
      - /opt/homelab/data/forgejo/data:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - '3000:3000'
      - '222:22'


@ -0,0 +1,3 @@
USER_UID=1000
USER_GID=1000
# FORGEJO__database__DB_TYPE=sqlite3


@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Forgejo

# Check if the container is running (exact name match; the name filter alone matches substrings)
if ! docker ps --filter "name=forgejo" --filter "status=running" --format '{{.Names}}' | grep -qx "forgejo"; then
    echo "[FAIL] Forgejo container is not running"
    exit 1
fi

# Check API health endpoint, bounded by service.yaml's 10s timeout
if ! curl -sf --max-time 10 http://localhost:3000/api/healthz > /dev/null; then
    echo "[FAIL] Forgejo API is not responding"
    exit 1
fi

echo "[OK] Forgejo is healthy"
exit 0


@ -0,0 +1,28 @@
service:
  name: forgejo
  owner_node: saturn
  exposure: private
  dependencies: []
  ports:
    - container: 3000
      host: 3000
      protocol: tcp
    - container: 22
      host: 222
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:3000/api/healthz
    interval: 1m
    timeout: 10s
    retries: 5
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/forgejo/data
  runtime:
    directories:
      - /opt/homelab/data/forgejo/data
    env_vars:
      - USER_UID
      - USER_GID


@ -0,0 +1,9 @@
# Mosquitto MQTT Broker
Eclipse Mosquitto is an open source (EPL/EDL licensed) message broker that implements the MQTT protocol versions 5.0, 3.1.1 and 3.1.
## Usage
Deployed on the `piha` node.
Port 1883 for standard MQTT.
Port 9001 for WebSockets.


@ -0,0 +1,12 @@
services:
  mosquitto:
    image: eclipse-mosquitto:latest
    container_name: mosquitto
    restart: unless-stopped
    ports:
      - '1883:1883'
      - '9001:9001'
    volumes:
      - /opt/homelab/data/mosquitto/config:/mosquitto/config
      - /opt/homelab/data/mosquitto/data:/mosquitto/data
      - /opt/homelab/data/mosquitto/log:/mosquitto/log


@ -0,0 +1,2 @@
# No specific environment variables required by default.
# Mosquitto is mainly configured via /opt/homelab/data/mosquitto/config/mosquitto.conf


@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Mosquitto

# Check if the container is running (exact name match)
if ! docker ps --filter "name=mosquitto" --filter "status=running" --format '{{.Names}}' | grep -qx "mosquitto"; then
    echo "[FAIL] Mosquitto container is not running"
    exit 1
fi

# Basic TCP connect check for port 1883 (bash /dev/tcp), bounded by a timeout
if ! timeout 10 bash -c 'echo > /dev/tcp/localhost/1883' >/dev/null 2>&1; then
    echo "[FAIL] Mosquitto port 1883 is not reachable"
    exit 1
fi

echo "[OK] Mosquitto is healthy"
exit 0


@ -0,0 +1,29 @@
service:
  name: mosquitto
  owner_node: piha
  exposure: private
  dependencies: []
  ports:
    - container: 1883
      host: 1883
      protocol: tcp
    - container: 9001
      host: 9001
      protocol: tcp
  healthcheck:
    type: container
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
  runtime:
    directories:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
    env_vars: []

services/npm/README.md Normal file

@ -0,0 +1,13 @@
# Nginx Proxy Manager (NPM)
Expose your services easily and securely with Nginx Proxy Manager.
## Features
- Secure HTTPS via Let's Encrypt
- Easy to use Web UI
- Advanced configuration for power users
## Usage
Deployed on the `vps` node for public ingress.
Web UI is available on port 81.

services/npm/env.example Normal file

@ -0,0 +1,2 @@
# No environment variables required for standard NPM deployment.
# Local overrides can be placed in /opt/homelab/config/npm/.env


@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Nginx Proxy Manager

# Check if the container is running
if ! docker ps --filter "name=npm" --filter "status=running" | grep -q "npm"; then
    echo "[FAIL] NPM container is not running"
    exit 1
fi

# Check Web UI responsiveness (port 81), bounded by service.yaml's 10s timeout
if ! curl -sf --max-time 10 http://localhost:81 > /dev/null; then
    echo "[FAIL] NPM Web UI is not responding"
    exit 1
fi

echo "[OK] NPM is healthy"
exit 0

services/npm/service.yaml Normal file

@ -0,0 +1,31 @@
service:
  name: npm
  owner_node: vps
  exposure: public
  dependencies: []
  ports:
    - container: 80
      host: 80
      protocol: tcp
    - container: 81
      host: 81
      protocol: tcp
    - container: 443
      host: 443
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:81
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
  runtime:
    directories:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
    env_vars: []

services/ollama/README.md Normal file

@ -0,0 +1,13 @@
# Ollama
Get up and running with large language models locally.
## Usage
Deployed on the `solaria` node for GPU acceleration.
API is available on port 11434.
Example check:
```bash
curl http://localhost:11434/api/tags
```


@ -0,0 +1,16 @@
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - '11434:11434'
    volumes:
      - /opt/homelab/data/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]


@ -0,0 +1,2 @@
# No specific environment variables required by default.
# CUDA_VISIBLE_DEVICES=0


@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Ollama

# Check if the container is running (exact name match)
if ! docker ps --filter "name=ollama" --filter "status=running" --format '{{.Names}}' | grep -qx "ollama"; then
    echo "[FAIL] Ollama container is not running"
    exit 1
fi

# Check API responsiveness, bounded by service.yaml's 10s timeout
if ! curl -sf --max-time 10 http://localhost:11434/api/tags > /dev/null; then
    echo "[FAIL] Ollama API is not responding"
    exit 1
fi

echo "[OK] Ollama is healthy"
exit 0


@ -0,0 +1,23 @@
service:
  name: ollama
  owner_node: solaria
  exposure: private
  dependencies: []
  ports:
    - container: 11434
      host: 11434
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:11434/api/tags
    interval: 1m
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/ollama
  runtime:
    directories:
      - /opt/homelab/data/ollama
    env_vars: []


@ -0,0 +1,10 @@
# Zigbee2MQTT
Zigbee-to-MQTT bridge that lets you get rid of proprietary Zigbee bridges.
## Usage
Deployed on the `piha` node.
Requires a Zigbee adapter (e.g., Sonoff ZBDongle-E) mapped to `/dev/ttyACM0`.
Frontend is available on port 8080.


@ -0,0 +1,14 @@
services:
  zigbee2mqtt:
    container_name: zigbee2mqtt
    image: koenkk/zigbee2mqtt:latest
    restart: unless-stopped
    volumes:
      - /opt/homelab/data/zigbee2mqtt/data:/app/data
      - /run/udev:/run/udev:ro
    ports:
      - '8080:8080'
    devices:
      - /dev/ttyACM0:/dev/ttyACM0
    environment:
      - TZ=Europe/Stockholm


@ -0,0 +1,3 @@
TZ=Europe/Stockholm
# MQTT credentials if applicable
# ZIGBEE2MQTT_CONFIG_MQTT_SERVER=mqtt://mosquitto:1883


@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Zigbee2MQTT

# Check if the container is running (exact name match)
if ! docker ps --filter "name=zigbee2mqtt" --filter "status=running" --format '{{.Names}}' | grep -qx "zigbee2mqtt"; then
    echo "[FAIL] Zigbee2MQTT container is not running"
    exit 1
fi

# Check frontend responsiveness, bounded by service.yaml's 10s timeout
if ! curl -sf --max-time 10 http://localhost:8080 > /dev/null; then
    echo "[FAIL] Zigbee2MQTT frontend is not responding"
    exit 1
fi

echo "[OK] Zigbee2MQTT is healthy"
exit 0


@ -0,0 +1,25 @@
service:
  name: zigbee2mqtt
  owner_node: piha
  exposure: private
  dependencies:
    - mosquitto
  ports:
    - container: 8080
      host: 8080
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:8080
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/zigbee2mqtt/data
  runtime:
    directories:
      - /opt/homelab/data/zigbee2mqtt/data
    env_vars:
      - TZ