# Service Lifecycle and Recovery
This document defines the lifecycle of a service in the homelab and the procedures for operational recovery.
## Service Lifecycle
1. **Onboarding**:
   - Create the `services/<service>/` directory.
   - Define `docker-compose.yml`, `service.yaml`, `README.md`, `env.example`, and `healthcheck.sh`.
   - Register the service in `inventory/topology.yaml` or the relevant host configs.
2. **Provisioning**:
   - Ensure `/opt/homelab/data/<service>` exists.
   - Ensure `/opt/homelab/config/<service>` exists and contains the required secrets and configs.
   - Copy `env.example` to `/opt/homelab/config/<service>/.env` and fill in the real values.
3. **Deployment**:
   - Run `scripts/deploy/deploy.sh prepare`.
   - Run `scripts/deploy/deploy.sh deploy`.
4. **Verification**:
   - Run `scripts/deploy/deploy.sh verify`.
   - The verify stage runs the healthchecks automatically.
5. **Maintenance**:
   - Update periodically with `docker compose pull`, then `docker compose up -d` to recreate the containers on the new images.
   - Monitor logs with `docker compose logs -f`.
6. **Decommissioning**:
   - Stop and remove the containers with `docker compose down`.
   - Archive `/opt/homelab/data/<service>` if necessary.
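
The provisioning step above can be sketched as a small shell function. This is a minimal sketch, not one of the repository's scripts: `provision_service` and the `HOMELAB_ROOT` override are hypothetical names introduced for illustration, and the function assumes it is given the path to the service's `services/<service>/` directory.

```shell
#!/usr/bin/env bash
# Illustrative sketch; provision_service and HOMELAB_ROOT are hypothetical
# names, not defined by the repository.
set -eu

# Root of the persistent tree; overridable so the sketch can be tried safely.
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# provision_service <service> <service-source-dir>
provision_service() {
  local service="$1"
  local src_dir="$2"
  local data_dir="$HOMELAB_ROOT/data/$service"
  local config_dir="$HOMELAB_ROOT/config/$service"

  # Create the data and config directories if they are missing.
  mkdir -p "$data_dir" "$config_dir"

  # Seed .env from env.example only when it does not exist yet,
  # so secrets that were already filled in are never overwritten.
  if [ ! -e "$config_dir/.env" ]; then
    cp "$src_dir/env.example" "$config_dir/.env"
    chmod 600 "$config_dir/.env"
  fi
}
```

Calling `provision_service myservice services/myservice` (with `myservice` as a placeholder) would create both directories and a private `.env` ready to be edited. Refusing to overwrite an existing `.env` keeps the function safe to re-run.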
## Operational Recovery
### 1. Container Failure
If a service is unhealthy, escalate in this order:
- Inspect the logs: `docker compose logs`.
- Restart the containers: `docker compose restart`.
- If the restart does not help, recreate them: `docker compose up -d --force-recreate`.
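
The escalation above could be scripted roughly as follows. This is a sketch under assumptions: `recover_service` and `SETTLE_SECONDS` are hypothetical names, and the health command is passed in by the caller on the assumption that it exits non-zero while the service is unhealthy (the service's `healthcheck.sh` may or may not behave that way).

```shell
#!/usr/bin/env bash
# Illustrative sketch; recover_service and SETTLE_SECONDS are hypothetical names.
set -eu

# Seconds to wait after each action before checking health (assumed default).
SETTLE_SECONDS="${SETTLE_SECONDS:-10}"

# recover_service <compose-dir> <health-command> [args...]
# <health-command> is any command that exits 0 when the service is healthy.
recover_service() {
  local dir="$1"
  shift

  # 1. Show recent logs for diagnosis; a failure here should not stop recovery.
  (cd "$dir" && docker compose logs --tail 100) || true

  # 2. Restart the existing containers and re-check.
  (cd "$dir" && docker compose restart)
  sleep "$SETTLE_SECONDS"
  if "$@"; then
    return 0
  fi

  # 3. Recreate the containers and re-check; the final status is the result.
  (cd "$dir" && docker compose up -d --force-recreate)
  sleep "$SETTLE_SECONDS"
  "$@"
}
```

Taking the health command as an argument keeps the function independent of how any particular service defines "healthy".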
### 2. Node Failure
If a host node fails:
- Either restore the node, or recover every service whose `owner_node` matches the failed node on a backup node.
- Restore persistent data from backups to `/opt/homelab/data/<service>`.
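
As a rough illustration of these two steps, the sketch below lists the services owned by a failed node and restores one service's data. Several details are assumptions rather than facts about this repository: that `service.yaml` contains a top-level, unquoted `owner_node: <node>` line, that backups are gzip-compressed tar archives whose contents are relative to the service's data directory, and the names `services_on_node`, `restore_service_data`, and `HOMELAB_ROOT`.

```shell
#!/usr/bin/env bash
# Illustrative sketch; function names, HOMELAB_ROOT, the owner_node line
# format, and the tar.gz backup format are all assumptions.
set -eu

HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# services_on_node <services-dir> <node>
# Prints the name of every service whose service.yaml has "owner_node: <node>".
services_on_node() {
  local services_dir="$1"
  local node="$2"
  local f
  for f in "$services_dir"/*/service.yaml; do
    [ -f "$f" ] || continue
    if awk -v n="$node" \
        '$1 == "owner_node:" && $2 == n { found = 1 } END { exit !found }' "$f"; then
      basename "$(dirname "$f")"
    fi
  done
}

# restore_service_data <service> <archive.tar.gz>
# Unpacks the archive into the service's data directory.
restore_service_data() {
  local service="$1"
  local archive="$2"
  local data_dir="$HOMELAB_ROOT/data/$service"
  mkdir -p "$data_dir"
  tar -xzf "$archive" -C "$data_dir"
}
```

On the backup node, the output of `services_on_node services <failed-node>` gives the list of services to restore and redeploy.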
### 3. Dependency Recovery
If a dependency fails:
- Services depending on it might report unhealthy status.
- Recover the dependency first.
- Re-verify dependent services.
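
The ordering above (dependency first, dependents afterwards) can be sketched like this. `reverify_after_dependency` and `VERIFY_CMD` are hypothetical: `VERIFY_CMD` stands for any command that takes a service name and exits non-zero when that service is unhealthy, since how `scripts/deploy/deploy.sh verify` selects a service is not specified here.

```shell
#!/usr/bin/env bash
# Illustrative sketch; reverify_after_dependency and VERIFY_CMD are
# hypothetical names, not part of the repository's scripts.
set -eu

# reverify_after_dependency <dependency> <dependent>...
# Returns 2 if the dependency is still unhealthy, 1 if any dependent is,
# and 0 when everything verifies.
reverify_after_dependency() {
  local dependency="$1"
  shift
  local failed=""
  local svc

  # The dependency must be healthy before dependents are worth checking.
  if ! "$VERIFY_CMD" "$dependency"; then
    echo "dependency $dependency is still unhealthy" >&2
    return 2
  fi

  # Check every dependent and collect the ones that still fail.
  for svc in "$@"; do
    if ! "$VERIFY_CMD" "$svc"; then
      failed="$failed $svc"
    fi
  done

  if [ -n "$failed" ]; then
    echo "still unhealthy:$failed" >&2
    return 1
  fi
}
```

Checking all dependents instead of stopping at the first failure gives a complete list of what still needs attention.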
## Persistent Data Conventions
- **Data**: `/opt/homelab/data/<service>` - Primary persistent state.
- **Config**: `/opt/homelab/config/<service>` - Local overrides and secrets.
- **Backups**: Standard backup routines should target `/opt/homelab/data/`.
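
A backup routine that follows these conventions might look like the sketch below. The document does not prescribe a backup tool or archive format, so the timestamped `tar.gz` layout, `backup_service_data`, and `HOMELAB_ROOT` are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch; backup_service_data, HOMELAB_ROOT, and the tar.gz
# archive naming are assumptions, not the repository's backup routine.
set -eu

HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"

# backup_service_data <service> <destination-dir>
# Archives the service's data directory and prints the archive path.
backup_service_data() {
  local service="$1"
  local dest_dir="$2"
  local data_dir="$HOMELAB_ROOT/data/$service"
  local stamp
  stamp="$(date -u +%Y%m%dT%H%M%SZ)"
  local archive="$dest_dir/$service-$stamp.tar.gz"

  mkdir -p "$dest_dir"
  # Store paths relative to the data directory so the archive can be
  # unpacked directly into /opt/homelab/data/<service> later.
  tar -czf "$archive" -C "$data_dir" .
  echo "$archive"
}
```

The destination directory should live outside the data directory being archived. For services that write continuously, such as databases, stopping the service before archiving is likely needed for a consistent copy.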