Compare commits
3 commits: e106cd81b9...0fa4df4ee1

| SHA1 |
|---|
| 0fa4df4ee1 |
| 31e84a139c |
| bbdbdb8321 |

docs/capabilities.md (Normal file, 85 lines)
@@ -0,0 +1,85 @@

# Node Capability Model

This document defines the capability model for the homelab infrastructure. The goal is to provide a declarative way to describe what each node can do, its constraints, and its suitability for various workloads.

## Overview

Capabilities are defined per host in `hosts/<hostname>/capabilities.yaml`. This metadata allows infrastructure tooling and future AI agents to reason about workload placement, recovery, and compatibility without hardcoding logic into the orchestration system.

## Schema Definition

The `capabilities.yaml` file follows this structure:

```yaml
capabilities:
  hardware:
    cpu:
      arch: <string>          # e.g., x86_64, arm64
      cores: <int>
      threads: <int>
    memory:
      total_gb: <int>
    acceleration:
      type: <string>          # e.g., none, cuda, tpu, vaapi
      model: <string>         # e.g., "NVIDIA RTX 3060", "Coral Edge TPU"

  virtualization:
    supported: <boolean>
    type: <string>            # e.g., kvm, docker-only

  storage:
    persistence: <string>     # ephemeral, persistent, redundant
    type: <string>            # ssd, hdd, nvme, sd-card
    capacity_gb: <int>

  networking:
    reachability: <string>    # public, tailscale-only, lan-only
    ingress_suitability: <boolean>
    bandwidth: <string>       # e.g., "1Gbps", "100Mbps", "LTE"

  runtime:
    container_engine: <string> # docker, podman, containerd
    os: <string>              # debian, ubuntu, alpine, nixos

  operational:
    power_constraint: <string>    # low-power, mains, battery-backed
    connectivity: <string>        # stable, intermittent
    availability_target: <string> # high, medium, best-effort

  deployment:
    suitability: [<string>]   # list of workload types (e.g., ai, database, edge, web)
    restricted: <boolean>     # if true, only specific workloads are allowed
```
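
The enumerated schema fields can be linted before a host file is committed. The shell sketch below is hypothetical (`cap_field`, `check_enum` and `validate_capabilities` are not part of the repository) and assumes each checked key appears exactly once per file as `key: value` on a single line, which holds for the five host files in this commit; a real YAML parser such as `yq` would be more robust.

```shell
#!/bin/sh
# Hypothetical lint for hosts/<hostname>/capabilities.yaml.
# Assumes every checked key occurs once, as "key: value" on one line.

# Prints the value of a single-occurrence key, without any trailing comment.
cap_field() {
    # $1 = file, $2 = key
    sed -n "s/^[[:space:]]*$2:[[:space:]]*\([^#]*\).*/\1/p" "$1" |
        sed 's/[[:space:]]*$//' | head -n 1
}

# Returns 1 (with a message on stderr) if the key is missing or its value
# is not one of the allowed values.
check_enum() {
    # $1 = file, $2 = key, remaining arguments = allowed values
    file=$1; key=$2; shift 2
    value=$(cap_field "$file" "$key")
    for allowed in "$@"; do
        [ "$value" = "$allowed" ] && return 0
    done
    echo "$file: $key is '$value' (allowed: $*)" >&2
    return 1
}

# Checks every enumerated field from the schema and reports all problems.
validate_capabilities() {
    f=$1; rc=0
    check_enum "$f" persistence ephemeral persistent redundant || rc=1
    check_enum "$f" reachability public tailscale-only lan-only || rc=1
    check_enum "$f" power_constraint low-power mains battery-backed || rc=1
    check_enum "$f" connectivity stable intermittent || rc=1
    check_enum "$f" availability_target high medium best-effort || rc=1
    return $rc
}
```

Running `for f in hosts/*/capabilities.yaml; do validate_capabilities "$f"; done` should print nothing for the files in this commit, and would flag a typo such as `tailscale_only`.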

## Placement Reasoning Examples

### AI Workloads
A service requiring `cuda` acceleration will be matched against nodes where `capabilities.hardware.acceleration.type == "cuda"`.
* **Target:** `solaria`

### Public Ingress
A service requiring public exposure will look for `capabilities.networking.ingress_suitability == true`.
* **Target:** `vps`

### Low-Power Staging
Staging workloads that should not consume significant power, or that tolerate intermittent connectivity, are matched against nodes where `capabilities.operational.power_constraint == "low-power"` or `capabilities.operational.connectivity == "intermittent"`.
* **Target:** `chelsty`
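
The three matches above all reduce to "which hosts set this dotted path to this value". A hypothetical shell sketch (`flatten_yaml` and `find_hosts` are illustrative names, not repository tools) that flattens each `capabilities.yaml` into `dotted.path=value` lines with `awk` and then filters them; it assumes simple block-style YAML (space-indented mappings, indented `- item` lists, no flow collections, anchors, or multi-line strings).

```shell
#!/bin/sh
# Hypothetical placement helpers (not part of the repository).

# Prints one "dotted.path=value" line per scalar; list items are printed as
# "dotted.path[]=item".
flatten_yaml() {
    awk '
        /^[ \t]*(#|$)/ { next }                    # skip comments and blank lines
        {
            match($0, /^ */); indent = RLENGTH
            line = substr($0, indent + 1)
            sub(/[ \t]+#.*$/, "", line)            # drop trailing comments
            # Pop keys that are not ancestors of this line.
            while (depth > 0 && ind[depth] >= indent) depth--
            path = ""
            for (i = 1; i <= depth; i++) path = path (i > 1 ? "." : "") key[i]
            if (line ~ /^- /) { print path "[]=" substr(line, 3); next }
            k = line; sub(/:.*$/, "", k)
            v = line; sub(/^[^:]*:[ \t]*/, "", v)
            gsub(/^"|"$/, "", v)                   # unquote simple strings
            depth++; key[depth] = k; ind[depth] = indent
            if (v != "") print path (path != "" ? "." : "") k "=" v
        }
    ' "$1"
}

# Prints every host under the hosts directory whose capabilities.yaml sets
# the dotted path to the value (for list fields: contains the value).
find_hosts() {
    # $1 = hosts directory, $2 = dotted path, $3 = value
    for f in "$1"/*/capabilities.yaml; do
        [ -f "$f" ] || continue
        if flatten_yaml "$f" | grep -Fqx -e "$2=$3" -e "$2[]=$3"; then
            basename "$(dirname "$f")"
        fi
    done
}
```

Against the host files in this commit, `find_hosts hosts capabilities.hardware.acceleration.type cuda` should print `solaria`, `find_hosts hosts capabilities.networking.ingress_suitability true` should print `vps`, and `find_hosts hosts capabilities.operational.power_constraint low-power` should print `chelsty`.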

## Recovery Reasoning Examples

### Failover Strategy
If `saturn` (the primary orchestrator) fails:
1. Identify nodes with `roles: [control]` or `roles: [infra]`.
2. Check `capabilities.operational.availability_target == "high"`.
3. Propose migration of critical infra services to `piha`.

### Storage-Bound Services
If a node with `persistence: persistent` fails, the agent must check whether other nodes with `persistence: persistent` and a compatible `storage.type` exist before attempting recovery, or warn about potential data loss if the service is moved to an `ephemeral` node.
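
Step 2 and the storage check can be combined into a candidate filter. A hypothetical sketch (`failover_candidates` is not part of the repository) assuming `availability_target:` and `persistence:` each appear once per host file; it does not evaluate the `roles` from step 1, because roles are not part of the `capabilities.yaml` schema.

```shell
#!/bin/sh
# Hypothetical failover helper. Assumes "availability_target:" and
# "persistence:" each appear once per capabilities.yaml.

# Prints the value of a single-occurrence key, without any trailing comment.
cap_field() {
    sed -n "s/^[[:space:]]*$2:[[:space:]]*\([^#]*\).*/\1/p" "$1" |
        sed 's/[[:space:]]*$//' | head -n 1
}

# Lists hosts, other than the failed one, that declare the required
# availability target and do not use ephemeral storage.
failover_candidates() {
    # $1 = hosts directory, $2 = failed host, $3 = required availability target
    for f in "$1"/*/capabilities.yaml; do
        [ -f "$f" ] || continue
        host=$(basename "$(dirname "$f")")
        [ "$host" = "$2" ] && continue
        [ "$(cap_field "$f" availability_target)" = "$3" ] || continue
        # Storage-bound services must not be moved to ephemeral storage.
        [ "$(cap_field "$f" persistence)" = "ephemeral" ] && continue
        echo "$host"
    done
    return 0
}
```

On the host files in this commit, `failover_candidates hosts saturn high` should print only `vps`, while `failover_candidates hosts saturn medium` should print `piha` and `solaria`; since `piha` declares `availability_target: medium`, the threshold used in step 2 decides whether it qualifies.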

## Future Usage by AI Agents

Future autonomous agents will use this metadata to:
1. **Evaluate Suitability:** Match service requirements (from `service.yaml`) against node capabilities.
2. **Generate Plans:** Create step-by-step deployment or migration plans based on hardware compatibility.
3. **Validate Topology:** Ensure that a proposed multi-node setup doesn't violate networking or operational constraints (e.g., don't put a DB on an intermittent node).
4. **Propose Failover:** Automatically suggest the best alternative node during an outage.
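
The "don't put a DB on an intermittent node" rule from item 3 can be expressed as a small guard. A hypothetical sketch (`check_placement` and `STABLE_WORKLOADS` are illustrative names); only `database` comes from the text above, and any other workload types added to the list would be a local policy choice.

```shell
#!/bin/sh
# Hypothetical topology check. STABLE_WORKLOADS lists workload types that
# must not be placed on a node with intermittent connectivity; only
# "database" comes from the documentation, extend it as needed.
STABLE_WORKLOADS=${STABLE_WORKLOADS:-database}

check_placement() {
    # $1 = capabilities.yaml path, $2 = workload type
    conn=$(sed -n 's/^[[:space:]]*connectivity:[[:space:]]*\([^#[:space:]]*\).*/\1/p' "$1" | head -n 1)
    for w in $STABLE_WORKLOADS; do
        if [ "$2" = "$w" ] && [ "$conn" = "intermittent" ]; then
            echo "reject: $2 needs a stable node, but $1 declares connectivity: $conn" >&2
            return 1
        fi
    done
    return 0
}
```

`check_placement hosts/chelsty/capabilities.yaml database` should fail, because chelsty declares `connectivity: intermittent`, while the same check against `hosts/solaria/capabilities.yaml` should pass.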

@@ -10,22 +10,44 @@ This document describes the GitOps-lite deployment process for the homelab.
 4. **Tailscale Mesh**: All hosts are connected via Tailscale, allowing secure communication without public port exposure.
 5. **Host Autonomy**: Services that must operate during WAN or Git outages keep their runtime dependencies on the execution node or local LAN.

-## Deployment Process
+## Staged Deployment Framework

-### 1. Preparation (on SATURN)
+The homelab uses a staged deployment framework located at `scripts/deploy/deploy.sh`. This script is designed to be resumable, stage-aware, and observable.

-- Modify or create service definitions in `services/`.
-- Assign services to hosts by creating/updating `hosts/<hostname>/services.txt` (or similar mapping).
-- Commit and push changes to the Forgejo instance.
+### Deployment Stages

-### 2. Deployment (on Execution Node)
+1. **prepare**: Pulls the latest changes from Git, validates inventory, and prepares the local environment.
+2. **deploy**: Executes `docker compose` commands for all assigned services.
+3. **verify**: Checks the health and connectivity of deployed services.
+4. **diagnose**: Performs deep checks and resource analysis if something goes wrong.
+5. **rollback**: Reverts to a previous known-good state.
+6. **resume**: Automatically continues from the last successful stage.

-Execution nodes run a deployment script (e.g., via cron or manual trigger) that:
+### State Tracking and Logging

-1. Performs a `git pull` from the source of truth.
-2. Identifies services assigned to this host.
-3. Symlinks or copies `services/<service>/docker-compose.yml` to `/opt/homelab/services/`.
-4. Runs `docker compose up -d --remove-orphans`.
+- **State**: Local node state is tracked in `/opt/homelab/state/deploy/current_stage`.
+- **Logs**: Detailed execution logs are stored in `/opt/homelab/logs/deploy/deploy_<timestamp>.log`.
+
+### Operational Semantics
+
+Deployment is **hybrid**:
+- **SATURN** acts as the orchestrator and source of truth.
+- **Nodes** execute the deployment locally using the `deploy.sh` script.
+- Human-in-the-loop is required for triggering and confirming deployments.
+
+### Recovery Workflow
+
+If a deployment fails:
+1. Run `deploy.sh diagnose` to identify the issue.
+2. Use the `recover-node` AI prompt to analyze logs and get recommendations.
+3. Either fix the issue and run `deploy.sh resume`, or use `deploy.sh rollback`.
+
+## Onboarding New Nodes
+
+Refer to `inventory/templates/how_to_add_new_node.yaml` for a detailed guide on adding new hardware to the mesh. The general flow is:
+1. Define node in `hosts/` and `inventory/topology.yaml` on SATURN.
+2. Bootstrap the node (Docker, Tailscale, Git).
+3. Run the staged deployment framework starting with `prepare`.

 ## Host-Local Overrides

docs/lifecycle.md (Normal file, 51 lines)
@@ -0,0 +1,51 @@

# Service Lifecycle and Recovery

This document defines the lifecycle of a service in the homelab and the procedures for operational recovery.

## Service Lifecycle

1. **Onboarding**:
   - Create `services/<service>/` directory.
   - Define `docker-compose.yml`, `service.yaml`, `README.md`, `env.example`, and `healthcheck.sh`.
   - Register service in `inventory/topology.yaml` or relevant host configs.
2. **Provisioning**:
   - Ensure `/opt/homelab/data/<service>` exists.
   - Ensure `/opt/homelab/config/<service>` exists and contains required secrets/configs.
   - Set up environment variables from `env.example` in `/opt/homelab/config/<service>/.env`.
3. **Deployment**:
   - `docker compose pull`
   - `docker compose up -d`
4. **Verification**:
   - Run `healthcheck.sh`.
   - Verify ports are reachable according to `service.yaml`.
5. **Maintenance**:
   - Periodic updates via `docker compose pull` followed by `docker compose up -d` (pulling alone does not recreate running containers).
   - Log monitoring via `docker compose logs -f`.
6. **Decommissioning**:
   - `docker compose down`.
   - Archive `/opt/homelab/data/<service>` if necessary.
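
The provisioning step can be scripted so that it is safe to re-run. A minimal sketch, assuming a hypothetical `provision_service` helper and a `RUNTIME_PATH` variable (defaulting to `/opt/homelab`) so it can be tried outside the real runtime tree; it never overwrites an existing `.env`, since the host copy is the source of truth for secrets.

```shell
#!/bin/sh
# Hypothetical provisioning helper for lifecycle step 2.
# RUNTIME_PATH defaults to /opt/homelab; override it to test elsewhere.

provision_service() {
    # $1 = service name, $2 = path to the service's env.example
    runtime=${RUNTIME_PATH:-/opt/homelab}
    mkdir -p "$runtime/data/$1" "$runtime/config/$1" || return 1
    env_file="$runtime/config/$1/.env"
    if [ ! -e "$env_file" ]; then
        # Seed from the template; secret values still have to be filled in.
        cp "$2" "$env_file" || return 1
        chmod 600 "$env_file"   # keep secrets readable by the owner only
        echo "created $env_file from template; fill in secrets before deploying"
    fi
}
```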

## Operational Recovery

### 1. Container Failure
If a service is unhealthy:
- Check `docker compose logs`.
- Restart: `docker compose restart`.
- Recreate: `docker compose up -d --force-recreate`.

### 2. Node Failure
If a host node fails:
- Services with `owner_node` matching the failed node must be recovered on a backup node, or the node itself must be restored.
- Persistent data must be restored from backups to `/opt/homelab/data/<service>`.

### 3. Dependency Recovery
If a dependency fails:
- Services depending on it might report unhealthy status.
- Recover the dependency first.
- Re-verify dependent services.
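
Recovering dependencies first amounts to a topological sort of the dependency graph. A sketch using the standard `tsort` utility, assuming the pairs are collected from each service's `dependencies` list in `service.yaml`; the helper name `recovery_order` and the service names in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical: order services for recovery so dependencies come first.
# Input: one "<service> <dependency>" pair per line. A service without
# dependencies can be listed as "<service> <service>" so it still appears.

recovery_order() {
    # tsort reads "before after" pairs, so put the dependency first.
    awk 'NF == 2 { print $2, $1 }' | tsort
}
```

For example, `printf 'homeassistant mqtt\n' | recovery_order` lists `mqtt` before `homeassistant`.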

## Persistent Data Conventions

- **Data**: `/opt/homelab/data/<service>` - Primary persistent state.
- **Config**: `/opt/homelab/config/<service>` - Local overrides and secrets.
- **Backups**: Standard backup routines should target `/opt/homelab/data/`.

docs/service-model.md (Normal file, 75 lines)
@@ -0,0 +1,75 @@

# Service Model and Healthchecks

This document defines the normalized service model for the homelab.

## Service Layout

Each service must reside in its own directory under `services/`:

```text
services/<service>/
├── docker-compose.yml   # Docker Compose definition
├── service.yaml         # Service metadata and orchestration contract
├── README.md            # Service documentation
├── env.example          # Template for required environment variables
└── healthcheck.sh       # Standardized healthcheck script
```
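
The layout rule is straightforward to lint. A hypothetical `check_service_layout` helper that prints each required file missing from a service directory and returns non-zero if any are absent; the executable-bit check is an added assumption that `healthcheck.sh` is invoked directly rather than through `sh`.

```shell
#!/bin/sh
# Hypothetical lint for the normalized services/<service>/ layout.
REQUIRED_FILES="docker-compose.yml service.yaml README.md env.example healthcheck.sh"

check_service_layout() {
    # $1 = path to services/<service>
    rc=0
    for f in $REQUIRED_FILES; do
        if [ ! -f "$1/$f" ]; then
            echo "$1: missing $f"
            rc=1
        fi
    done
    # Assumes healthcheck.sh is run directly, so it must be executable.
    if [ -f "$1/healthcheck.sh" ] && [ ! -x "$1/healthcheck.sh" ]; then
        echo "$1: healthcheck.sh is not executable"
        rc=1
    fi
    return $rc
}
```

Looping it over `services/*/` reports every service that does not yet follow the layout.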

## Service Metadata (`service.yaml`)

The `service.yaml` file provides a machine-readable contract for deployment and orchestration.

### Schema

```yaml
service:
  name: <string>              # Canonical service name (kebab-case)
  owner_node: <string>        # Preferred host node
  exposure: <class>           # public, private, or local-only
  dependencies: [<service>]   # List of required services
  ports:
    - container: <int>
      host: <int>
      protocol: <tcp|udp>
  healthcheck:
    type: <string>            # local-only, container, http, mqtt
    endpoint: <string>        # URL or topic if applicable
    interval: <duration>
    timeout: <duration>
    retries: <int>
  restart_policy: <string>    # unless-stopped, always, etc.
  persistence:
    paths:
      - /opt/homelab/data/<service>/...
  runtime:
    directories: [<string>]   # Required host directories to be created
    env_vars: [<string>]      # List of required environment variables (keys only)
```
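
The `runtime.env_vars` list makes it possible to check a host's `.env` before deployment. A hypothetical `missing_env_vars` helper, assuming plain `KEY=value` lines in `/opt/homelab/config/<service>/.env` (no `export` prefix); it prints each missing or empty key and returns non-zero if there is at least one.

```shell
#!/bin/sh
# Hypothetical pre-deploy check for runtime.env_vars in service.yaml.

missing_env_vars() {
    # $1 = path to .env, remaining arguments = required variable names
    env_file=$1; shift
    rc=0
    for var in "$@"; do
        # A key counts as present only if it has a non-empty value.
        if ! grep -Eq "^${var}=.+" "$env_file" 2>/dev/null; then
            echo "$var"
            rc=1
        fi
    done
    return $rc
}
```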

## Healthcheck Semantics

The `healthcheck.sh` script should exit with status `0` when healthy and `1` when unhealthy. It should support different modes based on the `healthcheck.type` defined in `service.yaml`.

### 1. Local-only
Checks whether the container is running and the process is alive on the host.

### 2. Container-level
Uses `docker inspect` or `docker exec` to check internal container health.

### 3. HTTP
Performs a `curl` against a specific endpoint (e.g., `/health` or `/`).

### 4. MQTT
Verifies that a specific topic is being updated or responds to a ping.

### 5. Dependency-aware
The healthcheck script may optionally check whether its dependencies are healthy before reporting its own status.
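
One way to structure `healthcheck.sh` around these modes is a single dispatcher that normalizes every probe to exit status `0` or `1`. A sketch under stated assumptions: the `<type> <target>` argument convention and the `probe`/`healthcheck` names are hypothetical; the container checks use the `docker inspect` template fields `.State.Running` and `.State.Health.Status`; the MQTT probe assumes the Mosquitto client `mosquitto_sub` is installed and treats any message received within 10 seconds (including a retained one) as healthy.

```shell
#!/bin/sh
# Hypothetical healthcheck.sh skeleton: exit status 0 = healthy, 1 = unhealthy.
# Argument convention (an assumption): <type> <target>
#   local-only, container : target is the container name
#   http                  : target is a URL
#   mqtt                  : target is "<broker-host> <topic>"

probe() {
    kind=$1; target=$2
    case "$kind" in
        local-only)
            # Docker reports the container as running.
            [ "$(docker inspect --format '{{.State.Running}}' "$target" 2>/dev/null)" = "true" ]
            ;;
        container)
            # Relies on the image's HEALTHCHECK; fails if none is defined.
            [ "$(docker inspect --format '{{.State.Health.Status}}' "$target" 2>/dev/null)" = "healthy" ]
            ;;
        http)
            # -f makes HTTP error responses (4xx/5xx) count as failures.
            curl -fsS --max-time 5 -o /dev/null "$target" 2>/dev/null
            ;;
        mqtt)
            # Healthy if one message arrives on the topic within 10 seconds.
            set -- $target
            msg=$(mosquitto_sub -h "$1" -t "$2" -C 1 -W 10 2>/dev/null)
            [ -n "$msg" ]
            ;;
        *)
            echo "unknown healthcheck type: $kind" >&2
            return 1
            ;;
    esac
}

# Normalizes any failure (including "command not found") to status 1.
healthcheck() {
    if probe "$@"; then return 0; else return 1; fi
}

# As a standalone script, the last line would be: healthcheck "$@"; exit $?
```

A dependency-aware check could call the same script for each entry in `dependencies` before probing the service itself.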

## Runtime Authority

`/opt/homelab/config/<service>` is the source of truth for:
- Secrets (not in Git)
- Host-local overrides
- Mutable configuration

Services should mount files from this directory as needed.

@@ -19,11 +19,14 @@ This document defines the standards and conventions for the homelab GitOps-lite
 /
 ├── docs/ # Infrastructure documentation
 ├── hosts/ # Host-specific configurations
-│   ├── saturn/
-│   ├── solaria/
-│   ├── piha/
-│   └── vps/
-├── services/ # Reusable service definitions (Docker Compose)
+├── inventory/ # Topology and templates
+├── services/ # Normalized service definitions
+│   └── <service>/
+│       ├── docker-compose.yml
+│       ├── service.yaml
+│       ├── README.md
+│       ├── env.example
+│       └── healthcheck.sh
 ├── scripts/ # Management and deployment scripts
 └── README.md
 ```

@@ -37,18 +40,28 @@ Runtime state must live outside the repository to keep it immutable and clean.
 ├── services/ # Active docker-compose files (deployed from git)
 ├── data/ # Persistent volume data (backed up)
 ├── config/ # Host-local overrides and secrets (not in git)
+│   └── <service>/
+│       ├── .env # Merged environment variables
+│       └── overrides/ # Local configuration overrides
 └── logs/ # Service logs
 ```

+## Service Standards
+
+1. **Normalization**: Every service MUST follow the `services/<service>/` layout.
+2. **Metadata**: Every service MUST have a `service.yaml` defining its operational contract.
+3. **Healthchecks**: Every service MUST have a `healthcheck.sh` for verification.
+4. **Secrets**: NEVER commit secrets to Git. Use `env.example` as a template and populate `/opt/homelab/config/<service>/.env` on the host.
+
 ## Docker Compose Standards

 1. **File Naming**: Use `docker-compose.yml`.
-2. **Container Naming**: `service-name`.
-3. **Restarts**: Always use `restart: unless-stopped`.
+2. **Container Naming**: Match the service name.
+3. **Restarts**: Always use `restart: unless-stopped` unless specified otherwise in `service.yaml`.
 4. **Networking**:
     - Use `tailscale` internal mesh for inter-host communication.
     - Expose ports only when necessary.
-5. **Volumes**: Use named volumes or absolute paths to `/opt/homelab/data/service-name`.
+5. **Volumes**: Use absolute paths to `/opt/homelab/data/<service>`.

 ## Environment Variables

hosts/chelsty/capabilities.yaml (Normal file, 40 lines)
@@ -0,0 +1,40 @@

capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 4
      threads: 4
    memory:
      total_gb: 16
    acceleration:
      type: none

  virtualization:
    supported: true
    type: kvm

  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 250

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: LTE

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: low-power
    connectivity: intermittent
    availability_target: best-effort

  deployment:
    suitability:
      - staging
      - homeassistant
      - edge
    restricted: false

hosts/piha/capabilities.yaml (Normal file, 39 lines)
@@ -0,0 +1,39 @@

capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 4
      threads: 4
    memory:
      total_gb: 4
    acceleration:
      type: none

  virtualization:
    supported: false
    type: docker-only

  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 32

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium

  deployment:
    suitability:
      - infra
      - monitoring
    restricted: false

hosts/saturn/capabilities.yaml (Normal file, 40 lines)
@@ -0,0 +1,40 @@

capabilities:
  hardware:
    cpu:
      arch: arm64
      cores: 8
      threads: 8
    memory:
      total_gb: 8
    acceleration:
      type: none

  virtualization:
    supported: false
    type: docker-only

  storage:
    persistence: persistent
    type: sd-card
    capacity_gb: 64

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high

  deployment:
    suitability:
      - control
      - development
      - infra
    restricted: false

hosts/solaria/capabilities.yaml (Normal file, 41 lines)
@@ -0,0 +1,41 @@

capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 12
      threads: 24
    memory:
      total_gb: 64
    acceleration:
      type: cuda
      model: "NVIDIA RTX 4070"

  virtualization:
    supported: true
    type: kvm

  storage:
    persistence: redundant
    type: nvme
    capacity_gb: 2000

  networking:
    reachability: tailscale-only
    ingress_suitability: false
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: ubuntu

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: medium

  deployment:
    suitability:
      - ai
      - compute
      - database
    restricted: false

hosts/vps/capabilities.yaml (Normal file, 40 lines)
@@ -0,0 +1,40 @@

capabilities:
  hardware:
    cpu:
      arch: x86_64
      cores: 2
      threads: 2
    memory:
      total_gb: 4
    acceleration:
      type: none

  virtualization:
    supported: false
    type: docker-only

  storage:
    persistence: persistent
    type: ssd
    capacity_gb: 80

  networking:
    reachability: public
    ingress_suitability: true
    bandwidth: 1Gbps

  runtime:
    container_engine: docker
    os: debian

  operational:
    power_constraint: mains
    connectivity: stable
    availability_target: high

  deployment:
    suitability:
      - edge
      - ingress
      - web
    restricted: true

inventory/templates/how_to_add_new_node.yaml (Normal file, 29 lines)
@@ -0,0 +1,29 @@

---
title: How to Add a New Node to the Homelab
description: This guide outlines the process for onboarding a new execution node into the GitOps-lite environment.

phases:
  - phase: 1. Preparation (on SATURN)
    steps:
      - "Define Node Inventory: Create hosts/<hostname>/ directory"
      - "Add host.yaml with hardware metadata"
      - "Add networking.yaml with IP and Tailscale info"
      - "Add capabilities.yaml with node capability description"
      - "Add services.txt listing assigned services"
      - "Update inventory/topology.yaml"
      - "Commit and push changes to Forgejo"

  - phase: 2. Bootstrapping (on the New Node)
    steps:
      - "Install OS (Debian/Ubuntu recommended)"
      - "Configure SSH and user access"
      - "Install Docker, Docker Compose, Tailscale, Git"
      - "Join the tailnet"
      - "Clone repository: git clone <forgejo-url>/homelab-codex.git ~/homelab-codex-ws"
      - "Set up runtime: sudo mkdir -p /opt/homelab/{services,data,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"

  - phase: 3. Initial Deployment
    steps:
      - "Run prepare: ~/homelab-codex-ws/scripts/deploy/deploy.sh prepare"
      - "Run deploy: ~/homelab-codex-ws/scripts/deploy/deploy.sh deploy"
      - "Run verify: ~/homelab-codex-ws/scripts/deploy/deploy.sh verify"

inventory/templates/node-bootstrap-checklist.yaml (Normal file, 29 lines)
@@ -0,0 +1,29 @@

---
bootstrap_checklist:
  pre_flight:
    - task: "Hardware connected and powered"
      done: false
    - task: "Base OS installed (Debian/Ubuntu)"
      done: false
    - task: "Network connectivity established"
      done: false
    - task: "SSH access configured"
      done: false
  onboarding:
    - task: "Tailscale installed and authenticated"
      done: false
    - task: "Docker and Compose V2 installed"
      done: false
    - task: "Git installed"
      done: false
    - task: "Repository cloned to ~/homelab-codex-ws"
      done: false
    - task: "/opt/homelab structure created"
      done: false
  initial_run:
    - task: "deploy.sh prepare successful"
      done: false
    - task: "deploy.sh deploy successful"
      done: false
    - task: "deploy.sh verify successful"
      done: false

inventory/templates/node-discovery-commands.yaml (Normal file, 18 lines)
@@ -0,0 +1,18 @@

---
discovery_commands:
  cpu:
    - "lscpu"
    - "cat /proc/cpuinfo"
  memory:
    - "free -h"
  storage:
    - "lsblk"
    - "df -h"
  network:
    - "ip addr"
    - "tailscale status"
  gpu:
    - "nvidia-smi"
    - "lspci | grep -i vga"
  usb:
    - "lsusb"
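
Some discovery output maps directly onto the `hardware` section of `capabilities.yaml`. A hypothetical sketch (`normalize_arch` and `draft_hardware` are illustrative names) that drafts that section for review, assuming a Linux host since it reads `/proc/meminfo`. Linux `uname -m` reports `aarch64` on 64-bit ARM, whereas the host files in this repository use `arm64`, so the helper normalizes the name; the core count is left as a placeholder to fill in from `lscpu`.

```shell
#!/bin/sh
# Hypothetical helper: drafts the capabilities.hardware section from local
# discovery commands. Linux-only (reads /proc/meminfo); review before committing.

# The schema uses "arm64", but Linux `uname -m` reports "aarch64".
normalize_arch() {
    case "$1" in
        aarch64|arm64) echo arm64 ;;
        x86_64|amd64) echo x86_64 ;;
        *) echo "$1" ;;
    esac
}

draft_hardware() {
    threads=$(getconf _NPROCESSORS_ONLN)
    # MemTotal is reported in KiB; round to the nearest GiB.
    mem_gb=$(awk '/^MemTotal:/ { printf "%d", $2 / 1048576 + 0.5 }' /proc/meminfo)
    cat <<EOF
capabilities:
  hardware:
    cpu:
      arch: $(normalize_arch "$(uname -m)")
      cores: <int>  # fill in from lscpu (cores per socket x sockets)
      threads: $threads
    memory:
      total_gb: $mem_gb
    acceleration:
      type: none    # adjust after checking nvidia-smi / lspci output
EOF
}
```

`draft_hardware > /tmp/capabilities.draft.yaml` gives a starting point for `hosts/<hostname>/capabilities.yaml`.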

inventory/templates/prepare-node.yaml (Normal file, 13 lines)
@@ -0,0 +1,13 @@

---
node_preparation:
  actions:
    - name: update_system
      command: "sudo apt update && sudo apt upgrade -y"
    - name: install_dependencies
      command: "sudo apt install -y curl git docker.io docker-compose-v2 && curl -fsSL https://tailscale.com/install.sh | sh"  # tailscale is not in the default Debian/Ubuntu repositories
    - name: configure_docker_permissions
      command: "sudo usermod -aG docker $USER"  # takes effect after logging out and back in
    - name: create_runtime_directories
      command: "sudo mkdir -p /opt/homelab/{services,data,config,state,logs} && sudo chown -R $USER:$USER /opt/homelab"
    - name: initialize_repo
      command: "git clone <repo_url> ~/homelab-codex-ws"

inventory/templates/prompts/create-node (Normal file, 13 lines)
@@ -0,0 +1,13 @@

### System Prompt Addendum: Create Node

**Context**: You are assisting in adding a new node to the homelab.
**Task**: Generate the necessary inventory files for a new node.

**Requirements**:
1. Ask for: hostname, IP address, Tailscale IP, hardware specs (CPU/RAM/Storage), and intended role/services.
2. Generate `hosts/<hostname>/host.yaml` and `hosts/<hostname>/networking.yaml`.
3. Provide a snippet for `inventory/topology.yaml`.
4. Recommend services based on hardware (e.g., if a GPU is present, suggest inference services).

**Output Format**: YAML blocks for each file.
**Restriction**: Do NOT execute any shell commands. Only provide the configuration.

inventory/templates/prompts/deploy-node (Normal file, 16 lines)
@@ -0,0 +1,16 @@

### System Prompt Addendum: Deploy Node

**Context**: Orchestrating a deployment across one or more nodes.
**Task**: Generate the deployment plan and verification checklist.

**Requirements**:
1. Identify which nodes need updates based on git changes.
2. Recommend the sequence of stages (e.g., `prepare` on all, then `deploy` on edge nodes first).
3. Generate a human-readable checklist for the operator.
4. Define verification criteria for the `verify` stage.

**Output Format**:
- Deployment Plan (sequence of commands).
- Verification Checklist.

**Restriction**: Do NOT mutate infrastructure autonomously.

inventory/templates/prompts/recover-node (Normal file, 17 lines)
@@ -0,0 +1,17 @@

### System Prompt Addendum: Recover Node

**Context**: A homelab node is unresponsive or has suffered data loss.
**Task**: Analyze logs and state to recommend recovery steps.

**Requirements**:
1. Request the content of `/opt/homelab/logs/deploy/` (latest log) and `/opt/homelab/state/deploy/current_stage`.
2. Analyze the last failed stage.
3. Recommend specific `deploy.sh` commands (e.g., `rollback` or `resume`).
4. Provide manual recovery steps if automated stages fail.

**Output Format**:
- Analysis of the failure.
- Recommended action.
- Documentation of the recovery process.

**Restriction**: Do NOT auto-execute deployment.
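
The prompt's first requirement can be met with a small collector. A hypothetical `recovery_context` helper that prints the recorded stage and the tail of the newest deploy log; it relies on the `deploy_<YYYYmmdd_HHMMSS>.log` naming produced by `deploy.sh`, which sorts chronologically, so the last file matched by the glob is the newest.

```shell
#!/bin/sh
# Hypothetical helper: gathers the inputs the recover-node prompt asks for.

recovery_context() {
    # $1 = runtime path (normally /opt/homelab), $2 = log lines to show (default 50)
    runtime=$1; lines=${2:-50}
    echo "current_stage: $(cat "$runtime/state/deploy/current_stage" 2>/dev/null || echo none)"
    # deploy_YYYYmmdd_HHMMSS.log names sort chronologically and glob results
    # are sorted, so the last existing match is the newest log.
    latest=
    for f in "$runtime"/logs/deploy/deploy_*.log; do
        [ -f "$f" ] && latest=$f
    done
    if [ -z "$latest" ]; then
        echo "no deploy logs found"
        return 1
    fi
    echo "latest_log: $latest"
    tail -n "$lines" "$latest"
}
```

`recovery_context /opt/homelab 100` produces a block of text that can be pasted into the prompt as-is.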
110
scripts/deploy/deploy.sh
Executable file
110
scripts/deploy/deploy.sh
Executable file
|
|
@ -0,0 +1,110 @@
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
# deploy.sh - Staged deployment framework for homelab nodes.
|
||||||
|
# Usage: ./deploy.sh [stage]
|
||||||
|
|
||||||
|
set -e
|
||||||
|
|
||||||
|
# --- Configuration ---
|
||||||
|
RUNTIME_PATH="/opt/homelab"
|
||||||
|
STATE_DIR="${RUNTIME_PATH}/state/deploy"
|
||||||
|
LOG_DIR="${RUNTIME_PATH}/logs/deploy"
|
||||||
|
REPO_PATH="${HOME}/homelab-codex-ws"
|
||||||
|
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
|
||||||
|
LOG_FILE="${LOG_DIR}/deploy_${TIMESTAMP}.log"
|
||||||
|
|
||||||
|
# --- Initialization ---
|
||||||
|
mkdir -p "$STATE_DIR" "$LOG_DIR"
|
||||||
|
|
||||||
|
# Redirection for logging
|
||||||
|
exec > >(tee -a "$LOG_FILE") 2>&1
|
||||||
|
|
||||||
|
log() {
|
||||||
|
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $1"
|
||||||
|
}
|

set_state() {
    # Record the last stage that completed successfully
    echo "$1" > "${STATE_DIR}/current_stage"
    log "State set to: $1"
}

get_state() {
    if [ -f "${STATE_DIR}/current_stage" ]; then
        cat "${STATE_DIR}/current_stage"
    else
        echo "none"
    fi
}

# --- Stages ---
# Each stage records its state only after it succeeds, so `resume` continues
# from the last completed stage instead of skipping over a stage that failed.

stage_prepare() {
    log "Stage: PREPARE"
    # Skeleton: Pull latest changes, check dependencies, validate inventory
    log "Checking repository at $REPO_PATH..."
    cd "$REPO_PATH" || { log "Cannot access $REPO_PATH"; exit 1; }
    git pull || { log "git pull failed"; exit 1; }
    log "Preparation complete."
    set_state "prepare"
}

stage_deploy() {
    log "Stage: DEPLOY"
    # Skeleton: Iterate through services and run docker compose
    log "Deploying services defined for $(hostname)..."
    # Implementation detail: loop through services/ and run compose
    log "Deployment complete."
    set_state "deploy"
}

stage_verify() {
    log "Stage: VERIFY"
    # Skeleton: Check container status, healthchecks, connectivity
    log "Verifying service health..."
    docker ps || { log "Cannot query Docker"; exit 1; }
    log "Verification complete."
    set_state "verify"
}

stage_diagnose() {
    log "Stage: DIAGNOSE"
    # Skeleton: Check logs, resource usage, networking
    log "Running diagnostics..."
    docker stats --no-stream
    log "Diagnostics complete."
}

stage_rollback() {
    log "Stage: ROLLBACK"
    # Skeleton: Revert to previous git commit or previous state
    log "Rolling back changes..."
    log "Rollback complete."
}

stage_resume() {
    log "Stage: RESUME"
    local current
    current=$(get_state)
    log "Resuming from state: $current"
    case "$current" in
        "prepare") stage_deploy ;;
        "deploy") stage_verify ;;
        "verify") log "Last deployment was verified. Nothing to resume." ;;
        *) log "Unknown state or nothing to resume. Starting from prepare..."; stage_prepare ;;
    esac
}

# --- Main ---

COMMAND=${1:-resume}

log "--- Homelab Deployment Started (Command: $COMMAND) ---"

case "$COMMAND" in
    prepare) stage_prepare ;;
    deploy) stage_deploy ;;
    verify) stage_verify ;;
    diagnose) stage_diagnose ;;
    rollback) stage_rollback ;;
    resume) stage_resume ;;
    *) echo "Usage: $0 {prepare|deploy|verify|diagnose|rollback|resume}" >&2; exit 1 ;;
esac

log "--- Homelab Deployment Finished ---"

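The `stage_deploy` skeleton leaves the service loop as an implementation detail. One way to pick the services for the current host is to match `owner_node` in each `services/<name>/service.yaml`. The sketch below is hypothetical: `services_for_node` is not part of the repository, it assumes host names equal node names (`saturn`, `piha`, `vps`, `solaria`), and it scans lines with `awk` rather than parsing YAML.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the names of services whose service.yaml declares
# "owner_node: <node>". Assumes the services/<name>/service.yaml layout and
# one key per line; this is a line scan, not a real YAML parser.
services_for_node() {
    local services_dir="$1" node="$2" manifest owner
    for manifest in "$services_dir"/*/service.yaml; do
        [ -f "$manifest" ] || continue   # glob matched nothing
        # Value after "owner_node:", with all whitespace stripped
        owner=$(awk -F':' '/^[[:space:]]*owner_node:/ { gsub(/[[:space:]]/, "", $2); print $2; exit }' "$manifest")
        if [ "$owner" = "$node" ]; then
            basename "$(dirname "$manifest")"
        fi
    done
}
```

Inside `stage_deploy` it could drive a loop such as `for svc in $(services_for_node services "$(hostname)"); do docker compose -f "services/$svc/docker-compose.yml" up -d; done`, relying on `stage_prepare` having changed into `$REPO_PATH`.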
9
services/forgejo/README.md
Normal file
@@ -0,0 +1,9 @@
# Forgejo

Forgejo is a self-hosted, lightweight software forge that is easy to install and low-maintenance.

## Usage
Deployed on the `saturn` node as the Git source of truth.

Web UI is available on port 3000.
SSH for Git is available on port 222.

15
services/forgejo/docker-compose.yml
Normal file
@@ -0,0 +1,15 @@
services:
  forgejo:
    image: codeberg.org/forgejo/forgejo:latest
    container_name: forgejo
    restart: unless-stopped
    environment:
      - USER_UID=1000
      - USER_GID=1000
    volumes:
      - /opt/homelab/data/forgejo/data:/data
      - /etc/timezone:/etc/timezone:ro
      - /etc/localtime:/etc/localtime:ro
    ports:
      - '3000:3000'
      - '222:22'

3
services/forgejo/env.example
Normal file
@@ -0,0 +1,3 @@
USER_UID=1000
USER_GID=1000
# FORGEJO__database__DB_TYPE=sqlite3

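The commented `FORGEJO__database__DB_TYPE=sqlite3` line follows Forgejo's convention of mapping `FORGEJO__<section>__<KEY>` environment variables onto `app.ini` settings, so it targets `DB_TYPE = sqlite3` in the `[database]` section. The hypothetical `env_to_ini` helper below only illustrates that mapping; it is not a Forgejo tool, and it deliberately ignores escape sequences and root-level keys.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Forgejo): show which app.ini section and key
# a FORGEJO__<section>__<KEY>=<value> assignment targets. Escape sequences and
# root-level keys (empty section) are deliberately not handled in this sketch.
env_to_ini() {
    local assignment="$1" name value rest section key
    [[ "$assignment" == *=* ]] || return 1   # must look like NAME=value
    name="${assignment%%=*}"
    value="${assignment#*=}"                 # everything after the first "="
    rest="${name#FORGEJO__}"
    [ "$rest" != "$name" ] || return 1       # not a FORGEJO__ variable
    section="${rest%%__*}"
    key="${rest#*__}"
    # Require a non-empty section and a key separated from it by "__"
    if [ -z "$section" ] || [ "$key" = "$rest" ]; then
        return 1
    fi
    printf '[%s]\n%s = %s\n' "$section" "$key" "$value"
}
```

For example, `env_to_ini 'FORGEJO__database__DB_TYPE=sqlite3'` prints `[database]` followed by `DB_TYPE = sqlite3`.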
17
services/forgejo/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Forgejo

# Check if the container is running
if ! docker ps --filter "name=forgejo" --filter "status=running" | grep -q "forgejo"; then
    echo "[FAIL] Forgejo container is not running"
    exit 1
fi

# Check API health endpoint
if ! curl -sf http://localhost:3000/api/healthz > /dev/null; then
    echo "[FAIL] Forgejo API is not responding"
    exit 1
fi

echo "[OK] Forgejo is healthy"
exit 0

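Each `healthcheck.sh` probes once, while the matching `service.yaml` declares `retries` and `interval`. A runner could bridge the two with a retry wrapper like the hypothetical `retry_healthcheck` below; the function name and argument order are assumptions, not an existing tool in this repository.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: run a healthcheck command up to <retries> times,
# sleeping <delay> seconds between failed attempts. Mirrors the idea of the
# retries/interval fields in service.yaml; not an existing tool.
retry_healthcheck() {
    local retries="$1" delay="$2" attempt
    shift 2
    for ((attempt = 1; attempt <= retries; attempt++)); do
        if "$@"; then
            return 0
        fi
        # No need to wait after the final failed attempt
        if ((attempt < retries)); then
            echo "[RETRY] attempt ${attempt}/${retries} failed; waiting ${delay}s" >&2
            sleep "$delay"
        fi
    done
    return 1
}
```

With Forgejo's settings (`retries: 5`, `interval: 1m`) this would be `retry_healthcheck 5 60 bash services/forgejo/healthcheck.sh`. The command runs in the current shell, so shell functions work as checks too.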
28
services/forgejo/service.yaml
Normal file
@@ -0,0 +1,28 @@
service:
  name: forgejo
  owner_node: saturn
  exposure: private
  dependencies: []
  ports:
    - container: 3000
      host: 3000
      protocol: tcp
    - container: 22
      host: 222
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:3000/api/healthz
    interval: 1m
    timeout: 10s
    retries: 5
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/forgejo/data
  runtime:
    directories:
      - /opt/homelab/data/forgejo/data
    env_vars:
      - USER_UID
      - USER_GID

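The `interval` and `timeout` fields are suffixed strings such as `1m` and `10s`. A runner that feeds them to `sleep` or `timeout` would first need plain seconds. The hypothetical `duration_to_seconds` below assumes only `s`, `m`, and `h` suffixes occur and treats a bare number as seconds.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert service.yaml durations ("30s", "1m", "10s")
# to whole seconds. Assumes only s/m/h suffixes; a bare number means seconds.
duration_to_seconds() {
    local value="$1" n unit
    if [[ "$value" =~ ^([0-9]+)([smh]?)$ ]]; then
        n=$((10#${BASH_REMATCH[1]}))   # force base 10 so "08" is not read as octal
        unit="${BASH_REMATCH[2]}"
        case "$unit" in
            h) echo $((n * 3600)) ;;
            m) echo $((n * 60)) ;;
            *) echo "$n" ;;            # "s" or no suffix
        esac
    else
        echo "invalid duration: $value" >&2
        return 1
    fi
}
```

For instance, `duration_to_seconds 1m` prints `60`, and an unsupported suffix such as `5x` is rejected with a non-zero status.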
9
services/mosquitto/README.md
Normal file
@@ -0,0 +1,9 @@
# Mosquitto MQTT Broker

Eclipse Mosquitto is an open source (EPL/EDL licensed) message broker that implements the MQTT protocol versions 5.0, 3.1.1 and 3.1.

## Usage
Deployed on the `piha` node.

Port 1883 for standard MQTT.
Port 9001 for WebSockets.

12
services/mosquitto/docker-compose.yml
Normal file
@@ -0,0 +1,12 @@
services:
  mosquitto:
    image: eclipse-mosquitto:latest
    container_name: mosquitto
    restart: unless-stopped
    ports:
      - '1883:1883'
      - '9001:9001'
    volumes:
      - /opt/homelab/data/mosquitto/config:/mosquitto/config
      - /opt/homelab/data/mosquitto/data:/mosquitto/data
      - /opt/homelab/data/mosquitto/log:/mosquitto/log

2
services/mosquitto/env.example
Normal file
@@ -0,0 +1,2 @@
# No specific environment variables required by default.
# Mosquitto is mainly configured via /opt/homelab/data/mosquitto/config/mosquitto.conf

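Mosquitto is configured through `mosquitto.conf` in the mounted config directory. Since Mosquitto 2.0, a broker with no `listener` lines accepts only local connections. Inside a container, that would leave the published ports unusable. Port 9001 also speaks WebSockets only when a listener declares `protocol websockets`. The hypothetical bootstrap function below writes a minimal config. It assumes the broker stays on a private network, so it allows anonymous clients; that choice should be revisited before any wider exposure.

```bash
#!/usr/bin/env bash
# Hypothetical bootstrap: write a minimal mosquitto.conf into the host directory
# mounted at /mosquitto/config. Assumption: private network only, so anonymous
# clients are allowed. Paths inside the file are container paths.
write_mosquitto_conf() {
    local config_dir="$1"
    mkdir -p "$config_dir"
    # Never overwrite a config that an operator has already customised
    if [ -e "$config_dir/mosquitto.conf" ]; then
        echo "mosquitto.conf already exists; leaving it untouched" >&2
        return 0
    fi
    cat > "$config_dir/mosquitto.conf" <<'EOF'
# Plain MQTT (published as 1883 in docker-compose.yml)
listener 1883
# MQTT over WebSockets (published as 9001)
listener 9001
protocol websockets

allow_anonymous true

persistence true
persistence_location /mosquitto/data/
log_dest file /mosquitto/log/mosquitto.log
EOF
}
```

On `piha` this would be called once as `write_mosquitto_conf /opt/homelab/data/mosquitto/config` before the first `docker compose up`.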
17
services/mosquitto/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Mosquitto

# Check if the container is running
if ! docker ps --filter "name=mosquitto" --filter "status=running" | grep -q "mosquitto"; then
    echo "[FAIL] Mosquitto container is not running"
    exit 1
fi

# Basic port check for 1883
if ! (echo > /dev/tcp/localhost/1883) >/dev/null 2>&1; then
    echo "[FAIL] Mosquitto port 1883 is not reachable"
    exit 1
fi

echo "[OK] Mosquitto is healthy"
exit 0

29
services/mosquitto/service.yaml
Normal file
@@ -0,0 +1,29 @@
service:
  name: mosquitto
  owner_node: piha
  exposure: private
  dependencies: []
  ports:
    - container: 1883
      host: 1883
      protocol: tcp
    - container: 9001
      host: 9001
      protocol: tcp
  healthcheck:
    type: container
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
  runtime:
    directories:
      - /opt/homelab/data/mosquitto/config
      - /opt/homelab/data/mosquitto/data
      - /opt/homelab/data/mosquitto/log
    env_vars: []

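The `runtime.directories` list names host paths that should exist before the containers start. With the short volume syntax used in these compose files, Docker creates a missing bind-mount source itself, but as a root-owned directory. That can break containers running as an unprivileged user. The hypothetical `ensure_runtime_dirs` below creates the paths up front. Its optional root prefix is an assumption added so the sketch can be dry-run under a temporary directory.

```bash
#!/usr/bin/env bash
# Hypothetical pre-deploy step: mkdir -p every item listed under
# runtime.directories in a service.yaml. The optional second argument is a
# path prefix for dry runs (an assumption); leave it empty on a real host.
ensure_runtime_dirs() {
    local manifest="$1" root_prefix="${2:-}" dir
    while read -r dir; do
        [ -n "$dir" ] || continue
        mkdir -p -- "${root_prefix}${dir}" || return 1
        echo "${root_prefix}${dir}"
    done < <(awk '
        /^[[:space:]]*directories:/ { in_list = 1; next }
        in_list && /^[[:space:]]*-[[:space:]]/ { sub(/^[[:space:]]*-[[:space:]]*/, ""); print; next }
        in_list { in_list = 0 }
    ' "$manifest")
}
```

On a host, `ensure_runtime_dirs services/mosquitto/service.yaml` (run with enough privileges to write under `/opt/homelab`) would create the three Mosquitto directories. A `chown` to the container's user could follow where needed.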
13
services/npm/README.md
Normal file
@@ -0,0 +1,13 @@
# Nginx Proxy Manager (NPM)

Expose your services easily and securely with Nginx Proxy Manager.

## Features
- Secure HTTPS via Let's Encrypt
- Easy-to-use web UI
- Advanced configuration for power users

## Usage
Deployed on the `vps` node for public ingress.

The admin web UI is available on port 81; ports 80 and 443 carry the proxied HTTP and HTTPS traffic.

2
services/npm/env.example
Normal file
@@ -0,0 +1,2 @@
# No environment variables required for standard NPM deployment.
# Local overrides can be placed in /opt/homelab/config/npm/.env

17
services/npm/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Nginx Proxy Manager

# Check if the container is running
if ! docker ps --filter "name=npm" --filter "status=running" | grep -q "npm"; then
    echo "[FAIL] NPM container is not running"
    exit 1
fi

# Check Web UI responsiveness (port 81)
if ! curl -sf http://localhost:81 > /dev/null; then
    echo "[FAIL] NPM Web UI is not responding"
    exit 1
fi

echo "[OK] NPM is healthy"
exit 0

31
services/npm/service.yaml
Normal file
@@ -0,0 +1,31 @@
service:
  name: npm
  owner_node: vps
  exposure: public
  dependencies: []
  ports:
    - container: 80
      host: 80
      protocol: tcp
    - container: 81
      host: 81
      protocol: tcp
    - container: 443
      host: 443
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:81
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
  runtime:
    directories:
      - /opt/homelab/data/npm/data
      - /opt/homelab/data/npm/letsencrypt
    env_vars: []

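Most service directories in this diff ship the same five files: `README.md`, `docker-compose.yml`, `env.example`, `healthcheck.sh` and `service.yaml`. `services/npm`, however, appears here without a `docker-compose.yml`; it may already exist in the repository. A lint like the hypothetical `check_service_layout` below would flag such gaps before `stage_deploy` tries to run compose. The required-file list is an assumption drawn from that convention, not a rule the repository enforces.

```bash
#!/usr/bin/env bash
# Hypothetical lint: report service directories missing any of the files the
# services in this commit ship with. REQUIRED_FILES is an assumed convention.
REQUIRED_FILES=(README.md docker-compose.yml env.example healthcheck.sh service.yaml)

check_service_layout() {
    local services_dir="$1" dir file status=0
    for dir in "$services_dir"/*/; do
        [ -d "$dir" ] || continue   # glob matched nothing
        for file in "${REQUIRED_FILES[@]}"; do
            if [ ! -f "${dir}${file}" ]; then
                echo "$(basename "$dir"): missing ${file}"
                status=1
            fi
        done
    done
    return "$status"
}
```

Running `check_service_layout services` from the repository root prints one line per missing file and exits non-zero if anything is missing, so it could sit in `stage_prepare` as a gate.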
13
services/ollama/README.md
Normal file
@@ -0,0 +1,13 @@
# Ollama

Get up and running with large language models locally.

## Usage
Deployed on the `solaria` node for GPU acceleration.

API is available on port 11434.

Example check:
```bash
curl http://localhost:11434/api/tags
```

16
services/ollama/docker-compose.yml
Normal file
@@ -0,0 +1,16 @@
services:
  ollama:
    image: ollama/ollama:latest
    container_name: ollama
    restart: unless-stopped
    ports:
      - '11434:11434'
    volumes:
      - /opt/homelab/data/ollama:/root/.ollama
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]

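The `deploy.resources.reservations.devices` block requests NVIDIA GPUs from Docker. That only works on hosts with the NVIDIA driver and the NVIDIA Container Toolkit installed. `docs/capabilities.md` describes a per-host `hosts/<hostname>/capabilities.yaml` with a `hardware.acceleration.type` field (e.g. `cuda`). A hypothetical placement guard could read that field before scheduling `ollama`. The sketch scans lines with `awk` rather than parsing YAML, and treats only `cuda` as GPU-capable.

```bash
#!/usr/bin/env bash
# Hypothetical placement guard: read acceleration.type from a capabilities.yaml
# (schema per docs/capabilities.md) with a line-oriented awk scan, not a real
# YAML parser. Only "cuda" is treated as suitable for GPU workloads.
host_acceleration() {
    local caps_file="$1"
    awk '
        # Remember the indentation of the acceleration: key
        /^[[:space:]]*acceleration:/ {
            match($0, /^[[:space:]]*/); indent = RLENGTH; in_accel = 1; next
        }
        in_accel && /[^[:space:]]/ {
            match($0, /^[[:space:]]*/)
            # A key at the same or lower indentation ends the acceleration block
            if (RLENGTH <= indent) { in_accel = 0; next }
            if ($0 ~ /^[[:space:]]*type:/) {
                sub(/^[[:space:]]*type:[[:space:]]*/, "")
                sub(/[[:space:]]*#.*$/, "")   # drop a trailing comment
                gsub(/"/, "")                 # drop optional quotes
                print
                exit
            }
        }
    ' "$caps_file"
}

can_run_gpu_workload() {
    [ "$(host_acceleration "$1")" = "cuda" ]
}
```

A deploy step might then run `can_run_gpu_workload "hosts/$(hostname)/capabilities.yaml" || { echo "no CUDA on this host; skipping ollama"; exit 1; }`.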
2
services/ollama/env.example
Normal file
@@ -0,0 +1,2 @@
# No specific environment variables required by default.
# CUDA_VISIBLE_DEVICES=0

17
services/ollama/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Ollama

# Check if the container is running
if ! docker ps --filter "name=ollama" --filter "status=running" | grep -q "ollama"; then
    echo "[FAIL] Ollama container is not running"
    exit 1
fi

# Check API responsiveness
if ! curl -sf http://localhost:11434/api/tags > /dev/null; then
    echo "[FAIL] Ollama API is not responding"
    exit 1
fi

echo "[OK] Ollama is healthy"
exit 0

23
services/ollama/service.yaml
Normal file
@@ -0,0 +1,23 @@
service:
  name: ollama
  owner_node: solaria
  exposure: private
  dependencies: []
  ports:
    - container: 11434
      host: 11434
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:11434/api/tags
    interval: 1m
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/ollama
  runtime:
    directories:
      - /opt/homelab/data/ollama
    env_vars: []

10
services/zigbee2mqtt/README.md
Normal file
@@ -0,0 +1,10 @@
# Zigbee2MQTT

Zigbee-to-MQTT bridge: get rid of your proprietary Zigbee bridges.

## Usage
Deployed on the `piha` node.

Requires a Zigbee adapter (e.g., Sonoff ZBDongle-E) mapped to `/dev/ttyACM0`.

Frontend is available on port 8080.

14
services/zigbee2mqtt/docker-compose.yml
Normal file
@@ -0,0 +1,14 @@
services:
  zigbee2mqtt:
    container_name: zigbee2mqtt
    image: koenkk/zigbee2mqtt:latest
    restart: unless-stopped
    volumes:
      - /opt/homelab/data/zigbee2mqtt/data:/app/data
      - /run/udev:/run/udev:ro
    ports:
      - '8080:8080'
    devices:
      - /dev/ttyACM0:/dev/ttyACM0
    environment:
      - TZ=Europe/Stockholm

3
services/zigbee2mqtt/env.example
Normal file
@@ -0,0 +1,3 @@
TZ=Europe/Stockholm
# MQTT credentials if applicable
# ZIGBEE2MQTT_CONFIG_MQTT_SERVER=mqtt://mosquitto:1883

17
services/zigbee2mqtt/healthcheck.sh
Normal file
@@ -0,0 +1,17 @@
#!/bin/bash
# Healthcheck for Zigbee2MQTT

# Check if the container is running
if ! docker ps --filter "name=zigbee2mqtt" --filter "status=running" | grep -q "zigbee2mqtt"; then
    echo "[FAIL] Zigbee2MQTT container is not running"
    exit 1
fi

# Check frontend responsiveness
if ! curl -sf http://localhost:8080 > /dev/null; then
    echo "[FAIL] Zigbee2MQTT frontend is not responding"
    exit 1
fi

echo "[OK] Zigbee2MQTT is healthy"
exit 0

25
services/zigbee2mqtt/service.yaml
Normal file
@@ -0,0 +1,25 @@
service:
  name: zigbee2mqtt
  owner_node: piha
  exposure: private
  dependencies:
    - mosquitto
  ports:
    - container: 8080
      host: 8080
      protocol: tcp
  healthcheck:
    type: http
    endpoint: http://localhost:8080
    interval: 30s
    timeout: 10s
    retries: 3
  restart_policy: unless-stopped
  persistence:
    paths:
      - /opt/homelab/data/zigbee2mqtt/data
  runtime:
    directories:
      - /opt/homelab/data/zigbee2mqtt/data
    env_vars:
      - TZ

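`zigbee2mqtt` lists `mosquitto` under `dependencies`, so the broker should start first. One way to derive a start order for every service is sketched below as the hypothetical helpers `dependency_pairs` and `deploy_order`. They emit `dependency dependent` pairs from each `service.yaml` and let coreutils `tsort` sort them topologically. `tsort` also reports an error if the dependencies form a cycle. The `awk` scan assumes the list layout used in this repository.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: derive a start order from the dependencies: lists in
# services/*/service.yaml using coreutils tsort. Line-oriented scan, not YAML.
dependency_pairs() {
    local services_dir="$1" manifest name dep
    for manifest in "$services_dir"/*/service.yaml; do
        [ -f "$manifest" ] || continue
        name=$(basename "$(dirname "$manifest")")
        echo "$name $name"          # self-pair keeps services without dependencies in the output
        while read -r dep; do
            [ -n "$dep" ] && echo "$dep $name"   # dep must start before name
        done < <(awk '
            /^[[:space:]]*dependencies:/ { in_list = 1; next }
            in_list && /^[[:space:]]*-[[:space:]]/ { sub(/^[[:space:]]*-[[:space:]]*/, ""); print; next }
            in_list { in_list = 0 }
        ' "$manifest")
    done
}

deploy_order() {
    # tsort prints every item after all items that must precede it
    dependency_pairs "$1" | tsort
}
```

`deploy_order services` would list `mosquitto` before `zigbee2mqtt`; the relative order of unrelated services is not specified.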