Service Lifecycle and Recovery

This document defines the lifecycle of a service in the homelab and the procedures for operational recovery.

Service Lifecycle

  1. Onboarding:
    • Create services/<service>/ directory.
    • Define docker-compose.yml, service.yaml, README.md, env.example, and healthcheck.sh.
    • Register service in inventory/topology.yaml or relevant host configs.
  2. Provisioning:
    • Ensure /opt/homelab/data/<service> exists.
    • Ensure /opt/homelab/config/<service> exists and contains required secrets/configs.
    • Copy env.example to /opt/homelab/config/<service>/.env and fill in the required values.
  3. Deployment:
    • docker compose pull
    • docker compose up -d
  4. Verification:
    • Run healthcheck.sh.
    • Verify ports are reachable according to service.yaml.
  5. Maintenance:
    • Periodic updates via docker compose pull, then docker compose up -d so containers are recreated from the new images.
    • Log monitoring via docker compose logs -f.
  6. Decommissioning:
    • docker compose down.
    • Archive /opt/homelab/data/<service> if necessary.
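
The provisioning, deployment, and verification steps above can be sketched as a pair of shell functions. This is a minimal sketch, not the repository's actual tooling: `HOMELAB_ROOT`, `REPO_ROOT`, `provision_service`, and `deploy_service` are hypothetical names, and it assumes `healthcheck.sh` runs under `sh` and exits non-zero on failure.

```shell
#!/bin/sh
# Sketch only: HOMELAB_ROOT and REPO_ROOT are assumed variables,
# not something this document defines.
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"
REPO_ROOT="${REPO_ROOT:-.}"

# Provisioning: create the data and config directories and seed .env once.
# The body runs in a subshell so its variables do not leak to the caller.
provision_service() (
    service="$1"
    data_dir="$HOMELAB_ROOT/data/$service"
    config_dir="$HOMELAB_ROOT/config/$service"
    mkdir -p "$data_dir" "$config_dir" || exit 1
    # Never overwrite an existing .env; it may already hold real secrets.
    if [ ! -e "$config_dir/.env" ]; then
        cp "$REPO_ROOT/services/$service/env.example" "$config_dir/.env" || exit 1
        chmod 600 "$config_dir/.env"
    fi
)

# Deployment and verification: pull, start, then run the health check.
deploy_service() (
    service="$1"
    cd "$REPO_ROOT/services/$service" || exit 1
    docker compose pull || exit 1
    docker compose up -d || exit 1
    # Assumption: healthcheck.sh is sh-compatible and exits non-zero on failure.
    sh ./healthcheck.sh
)
```

With the functions sourced from the repository root, `provision_service myservice && deploy_service myservice` would cover steps 2 through 4 for a hypothetical service named `myservice`. Seeding `.env` only when it is missing keeps the script safe to re-run.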

Operational Recovery

1. Container Failure

If a service is unhealthy:

  • Check docker compose logs.
  • Restart: docker compose restart.
  • Recreate: docker compose up -d --force-recreate.
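
The escalation above (logs, then restart, then recreate) might be automated along these lines. This is a sketch under assumptions: `recover_container`, `is_healthy`, and `SETTLE_SECONDS` are hypothetical names, the functions are run from the service's `services/<service>/` directory, and `healthcheck.sh` is assumed to exit non-zero while the service is unhealthy.

```shell
#!/bin/sh
# Sketch only: is_healthy is a hypothetical wrapper around healthcheck.sh.
is_healthy() {
    sh ./healthcheck.sh >/dev/null 2>&1
}

# Try the least disruptive fix first and escalate only if it does not help.
recover_container() {
    # Step 1: print recent logs so the failure can be diagnosed later.
    docker compose logs --tail 100

    # Step 2: plain restart, then give the service time to settle.
    docker compose restart
    sleep "${SETTLE_SECONDS:-10}"
    if is_healthy; then
        echo "recovered by restart"
        return 0
    fi

    # Step 3: recreate the containers from scratch.
    docker compose up -d --force-recreate
    sleep "${SETTLE_SECONDS:-10}"
    if is_healthy; then
        echo "recovered by recreate"
        return 0
    fi

    echo "still unhealthy after recreate" >&2
    return 1
}
```

A non-zero return from `recover_container` means the container-level steps are exhausted and the cause is likely elsewhere, for example a failed dependency or a node problem.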

2. Node Failure

If a host node fails:

  • Either restore the failed node, or recover every service whose owner_node matches it on a backup node.
  • Persistent data must be restored from backups to /opt/homelab/data/<service>.
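
To see which services a node failure affects, the `owner_node` field can be matched against the failed node's name. The sketch below assumes `owner_node` appears as a top-level `owner_node: <name>` line in each `services/<service>/service.yaml`. This document does not specify that schema, so treat the parsing as illustrative; `services_on_node` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: assumes a top-level "owner_node: <name>" line in
# services/<service>/service.yaml. The real schema may differ.
# Usage: services_on_node <node> [repo_root]
services_on_node() (
    node="$1"
    repo_root="${2:-.}"
    for f in "$repo_root"/services/*/service.yaml; do
        [ -f "$f" ] || continue
        # Take the first owner_node value, dropping quotes and trailing comments.
        owner="$(sed -n 's/^owner_node:[[:space:]]*\([^[:space:]#]*\).*/\1/p' "$f" \
            | head -n 1 | tr -d "\"'")"
        if [ "$owner" = "$node" ]; then
            # The service name is the name of the directory holding service.yaml.
            basename "$(dirname "$f")"
        fi
    done
)
```

Running `services_on_node <failed-node>` from the repository root would print one service name per line, which gives the list of services to redeploy or restore.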

3. Dependency Recovery

If a dependency fails:

  • Services that depend on it may report an unhealthy status until it is back.
  • Recover the dependency first.
  • Re-verify dependent services.
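
When several services are involved, the "dependency first" rule amounts to a topological ordering, which the standard `tsort` utility can compute. The sketch below is illustrative only: the "dependency dependent" pair list is a hypothetical input format (this document does not say where dependencies are declared), and `recovery_order` and `verify_in_order` are hypothetical helpers that assume `healthcheck.sh` runs under `sh`.

```shell
#!/bin/sh
# Sketch only: reads "<dependency> <dependent>" pairs on stdin and prints
# service names so that every dependency comes before its dependents.
recovery_order() {
    # Strip "#" comments, then let tsort do the ordering.
    sed 's/#.*//' | tsort
}

# Read service names on stdin (in recovery order) and run each health check,
# stopping at the first failure so it can be fixed before moving on.
# Usage: recovery_order < pairs.txt | verify_in_order [repo_root]
verify_in_order() (
    repo_root="${1:-.}"
    while read -r service; do
        # Redirect stdin so the health check cannot swallow the service list.
        if (cd "$repo_root/services/$service" && sh ./healthcheck.sh) </dev/null; then
            echo "ok: $service"
        else
            echo "unhealthy: $service" >&2
            exit 1
        fi
    done
)
```

For example, feeding the pairs `db app` and `app proxy` (hypothetical service names) to `recovery_order` lists `db` before `app` and `app` before `proxy`.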

Persistent Data Conventions

  • Data: /opt/homelab/data/<service> - Primary persistent state.
  • Config: /opt/homelab/config/<service> - Local overrides and secrets.
  • Backups: Standard backup routines should target /opt/homelab/data/.
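
A backup routine that follows these conventions could be as small as the sketch below. It is only one possible approach: `BACKUP_DIR`, the `tar.gz` archive naming, and the `backup_service` and `restore_service` functions are assumptions, not part of the documented setup.

```shell
#!/bin/sh
# Sketch only: BACKUP_DIR and the archive naming scheme are assumptions.
HOMELAB_ROOT="${HOMELAB_ROOT:-/opt/homelab}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/homelab}"

# Archive one service's data directory and print the archive path.
backup_service() (
    service="$1"
    stamp="$(date +%Y%m%dT%H%M%S)"
    archive="$BACKUP_DIR/$service-$stamp.tar.gz"
    mkdir -p "$BACKUP_DIR" || exit 1
    # -C keeps the paths inside the archive relative to the data root.
    tar -czf "$archive" -C "$HOMELAB_ROOT/data" "$service" || exit 1
    echo "$archive"
)

# Unpack an archive made by backup_service back under the data root.
restore_service() (
    archive="$1"
    mkdir -p "$HOMELAB_ROOT/data" || exit 1
    tar -xzf "$archive" -C "$HOMELAB_ROOT/data"
)
```

Because the archive stores paths relative to the data root, `restore_service` recreates `/opt/homelab/data/<service>` on any node. Stopping the service with `docker compose down` before calling `backup_service` is likely the safer choice for services that write to their data directory continuously.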