diff --git a/README.md b/README.md
index b32bc90..8007670 100644
--- a/README.md
+++ b/README.md
@@ -29,10 +29,13 @@ The homelab consists of several nodes connected via a Tailscale internal mesh.
 ## Documentation Index
 
 - [Infrastructure Standards](docs/standards.md)
+- [Agent Operating Procedures](docs/agents.md) (For AI/Non-Human Agents)
 - [Deployment Conventions](docs/deployment.md)
 - [Hardware](docs/hardware.md)
 - [Networking](docs/networking.md)
 - [Services](docs/services.md)
+- [Node Capabilities](docs/capabilities.md)
+- [Action Model](services/agent-system/action-model.md)
 
 ---
 *Note: This repository documents the state of the homelab. Runtime state lives outside the repository in `/opt/homelab`.*
diff --git a/docs/agents.md b/docs/agents.md
new file mode 100644
index 0000000..8861c31
--- /dev/null
+++ b/docs/agents.md
@@ -0,0 +1,49 @@
+# Agent Operating Procedures
+
+This document defines the operating procedures, constraints, and interaction protocols for non-human agents (AI agents, autonomous scripts) within the Homelab Codex ecosystem.
+
+## 1. Core Principles for Agents
+
+1. **Read-Only by Default**: Agents should assume read-only access to the `/opt/homelab` runtime unless explicitly executing an approved action.
+2. **Git as Authority**: The repository on **SATURN** is the source of truth. Agents must not modify the runtime state on nodes directly without corresponding (or pending) Git state, unless it's an emergency mitigation.
+3. **Human-in-the-Loop (HIL)**: All destructive or structural changes (restarts, deployments, config changes) must follow the [Action Approval Model](../services/agent-system/action-model.md).
+4. **Idempotency**: All scripts and actions proposed or executed by agents MUST be idempotent.
+5. **Context-Awareness**: Agents MUST read the `README.md` and `docs/agents.md` at the start of every session to align with current infrastructure standards.
+
+## 2. Agent Roles
+
+| Role | Responsibility | Scope |
+|------|----------------|-------|
+| **Observer** | Monitors health, logs, and events. | Read-only access to `/opt/homelab/events` and `logs`. |
+| **Stability Agent** | Local node watchdog, event emitter. | Local node runtime, `service.yaml` healthchecks. |
+| **Orchestrator** | High-level planning, workload placement. | Repository-wide, multi-node topology. |
+| **Materializer** | Translates high-level intent into Docker/System state. | Execution of `approved` actions. |
+
+## 3. Discovery Protocol
+
+Agents must use the following entry points to understand the system:
+
+1. **Topology**: `inventory/topology.yaml` for node list and roles.
+2. **Capabilities**: `hosts//capabilities.yaml` to understand hardware/software constraints.
+3. **Service Contract**: `services//service.yaml` to understand how to check health and manage a service.
+4. **Operational State**: `/opt/homelab/state/` on local nodes for real-time status.
+
+## 4. Interaction with Humans
+
+Agents communicate with the operator via the `agent-system/telegram-bot`.
+
+- **Alerting**: Agents emit events to the event system. Critical events are forwarded to Telegram.
+- **Proposals**: When an agent identifies a need for change (e.g., "Service X is failing, suggest restart"), it creates a `pending` action in `/opt/homelab/actions/pending/`.
+- **Approval**: Agents must wait for the action status to transition to `approved` before execution.
+
+## 5. Decision Logic (Reasoning)
+
+When making decisions, agents MUST prioritize:
+1. **Safety**: Do not violate power constraints (see `capabilities.yaml`).
+2. **Stability**: Prefer keeping services on their `owner_node` unless it's down.
+3. **Connectivity**: On intermittent nodes (CHELSTY), avoid actions requiring heavy WAN traffic during low-signal periods.
+
+## 6. Access Control for Agents
+
+- **Filesystem**: Agents should run as the `homelab` user or equivalent with restricted sudo access to `docker compose`.
+- **Secrets**: Agents MUST NOT attempt to read `.env` files unless specifically tasked with credential rotation. They should treat secrets as opaque handles.
diff --git a/docs/capabilities.md b/docs/capabilities.md
index 46fb283..3c56b4b 100644
--- a/docs/capabilities.md
+++ b/docs/capabilities.md
@@ -83,3 +83,10 @@ Future autonomous agents will use this metadata to:
 2. **Generate Plans:** Create step-by-step deployment or migration plans based on hardware compatibility.
 3. **Validate Topology:** Ensure that a proposed multi-node setup doesn't violate networking or operational constraints (e.g., don't put a DB on an intermittent node).
 4. **Propose Failover:** Automatically suggest the best alternative node during an outage.
+
+## Agent Reasoning Logic
+
+When an agent parses `capabilities.yaml`, it should apply these heuristics:
+- **Intermittent Connectivity**: If `operational.connectivity == "intermittent"`, do not schedule high-bandwidth syncs or critical cloud-dependent services.
+- **Power Constraints**: If `operational.power_constraint == "low-power"`, avoid heavy LLM inference or continuous high-CPU tasks.
+- **Availability Target**: If `availability_target == "high"`, this node is a candidate for hosting control-plane failovers.
diff --git a/docs/standards.md b/docs/standards.md
index 5cd8304..69e9318 100644
--- a/docs/standards.md
+++ b/docs/standards.md
@@ -49,9 +49,10 @@ Runtime state must live outside the repository to keep it immutable and clean.
 ## Service Standards
 
 1. **Normalization**: Every service MUST follow the `services//` layout.
-2. **Metadata**: Every service MUST have a `service.yaml` defining its operational contract.
-3. **Healthchecks**: Every service MUST have a `healthcheck.sh` for verification.
-4. **Secrets**: NEVER commit secrets to Git. Use `env.example` as a template and populate `/opt/homelab/config//.env` on the host.
+2. **Metadata**: Every service MUST have a `service.yaml` defining its operational contract. This is the primary source of truth for AI agents.
+3. **Healthchecks**: Every service MUST have a `healthcheck.sh` for verification. Agents use this to emit stability events.
+4. **Actionability**: Any automated recovery action proposed by an agent must be backed by a `service.yaml` definition.
+5. **Secrets**: NEVER commit secrets to Git. Use `env.example` as a template and populate `/opt/homelab/config//.env` on the host. Agents must treat these as "black box" configurations.
 
 ## Docker Compose Standards
 
diff --git a/hosts/vps/services.yaml b/hosts/vps/services.yaml
index 7115a92..f9d391d 100644
--- a/hosts/vps/services.yaml
+++ b/hosts/vps/services.yaml
@@ -13,3 +13,22 @@ services:
       config_path: /opt/homelab/config/stability-agent
       data_path: /opt/homelab/state
       logs_path: /opt/homelab/events
+
+  control-plane:
+    role: management-and-orchestration
+    deployment_model: docker-compose
+    exposure: tailscale-internal
+    offline_required: false
+    depends_on:
+      local:
+        - stability-agent
+      external:
+        - piha:agent-system-redis
+    ports:
+      - name: http
+        container_port: 18180
+        protocol: tcp
+    runtime:
+      config_path: /opt/homelab/config/control-plane
+      data_path: /opt/homelab/data/control-plane
+      logs_path: /opt/homelab/logs/control-plane
diff --git a/services/agent-system/action-model.md b/services/agent-system/action-model.md
index 0930b34..29c2253 100644
--- a/services/agent-system/action-model.md
+++ b/services/agent-system/action-model.md
@@ -3,13 +3,20 @@
 Actions are JSON files stored in `/opt/homelab/actions/{status}/{action_id}.json`.
 
 #### Statuses
-- `pending`: Waiting for operator approval.
+- `pending`: Waiting for operator approval. AI agents create actions in this state.
 - `approved`: Approved by operator, ready for execution.
 - `rejected`: Rejected by operator, will not be executed.
-- `running`: Currently being executed by an agent.
+- `running`: Currently being executed by an agent (e.g. `materializer`).
 - `completed`: Successfully executed.
 - `failed`: Execution failed.
 
+#### Human-in-the-Loop (HIL) Protocol
+1. **Request**: Agent identifies a required change and writes a JSON to `actions/pending/`.
+2. **Notification**: System notifies the human operator.
+3. **Audit**: Human reviews `details.reason` and `details.diff`.
+4. **Authorization**: Human moves file to `approved/`.
+5. **Execution**: Agent monitors `approved/` and executes the task.
+
 #### Schema
 ```json
 {