homelab-codex-ws/docs/agents.md

# Agent Operating Procedures

This document defines the operating procedures, constraints, and interaction protocols for non-human agents (AI agents, autonomous scripts) within the Homelab Codex ecosystem.

## 1. Core Principles for Agents

1.  **Read-Only by Default**: Agents should assume read-only access to the `/opt/homelab` runtime unless explicitly executing an approved action.
2.  **Git as Authority**: The repository on **SATURN** is the source of truth. Agents must not modify runtime state on nodes directly without a corresponding (committed or pending) Git change, except as an emergency mitigation.
3.  **Human-in-the-Loop (HIL)**: All destructive or structural changes (restarts, deployments, config changes) must follow the [Action Approval Model](../services/agent-system/action-model.md).
4.  **Idempotency**: All scripts and actions proposed or executed by agents MUST be idempotent.
5.  **Context-Awareness**: Agents MUST read the `README.md` and `docs/agents.md` at the start of every session to align with current infrastructure standards.
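As an illustration of the idempotency principle, here is a minimal Python sketch of an "ensure" style helper: it writes a file only when the content differs, so running it twice leaves the system in the same state as running it once. The helper name `ensure_file_content` is hypothetical and not part of the Homelab Codex codebase.

```python
import os
import tempfile
from pathlib import Path


def ensure_file_content(path: Path, content: str) -> bool:
    """Make `path` contain exactly `content`.

    Returns True if a write happened, False if the file was already
    in the desired state (the idempotent no-op case).
    """
    if path.exists() and path.read_text(encoding="utf-8") == content:
        return False  # already converged, nothing to do

    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory, then replace the
    # target in one step so readers never observe a half-written file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write(content)
    os.replace(tmp_name, path)
    return True
```

The boolean return value lets a caller report "changed" versus "unchanged" without re-reading the file.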

## 2. Agent Roles

| Role | Responsibility | Scope |
|------|----------------|-------|
| **Observer** | Monitors health, logs, and events. | Read-only access to `/opt/homelab/events` and `logs`. |
| **Stability Agent** | Local node watchdog, event emitter. | Local node runtime, `service.yaml` healthchecks. |
| **Orchestrator** | High-level planning, workload placement. | Repository-wide, multi-node topology. |
| **Materializer** | Translates high-level intent into Docker/System state. | Execution of `approved` actions. |

## 3. Discovery Protocol

Agents must use the following entry points to understand the system:

1.  **Topology**: `inventory/topology.yaml` for node list and roles.
2.  **Capabilities**: `hosts/<node>/capabilities.yaml` to understand hardware/software constraints.
3.  **Service Contract**: `services/<service>/service.yaml` to understand how to check health and manage a service.
4.  **Operational State**: `/opt/homelab/state/` on local nodes for real-time status.
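The entry points above can be resolved mechanically. The following Python sketch builds the four paths for a given node and service and reports which ones are missing. The function names, the `repo_root` parameter, and the example node and service names are assumptions for illustration; only the path layout comes from this document.

```python
from pathlib import Path

# Runtime state directory on the local node, as listed above.
RUNTIME_STATE_DIR = Path("/opt/homelab/state")


def discovery_paths(repo_root: Path, node: str, service: str) -> dict[str, Path]:
    """Return the discovery entry points for one node and one service."""
    return {
        "topology": repo_root / "inventory" / "topology.yaml",
        "capabilities": repo_root / "hosts" / node / "capabilities.yaml",
        "service_contract": repo_root / "services" / service / "service.yaml",
        "operational_state": RUNTIME_STATE_DIR,
    }


def missing_entry_points(paths: dict[str, Path]) -> list[str]:
    """Names of entry points that do not exist on this machine."""
    return sorted(name for name, path in paths.items() if not path.exists())


if __name__ == "__main__":
    # "saturn" and "example-service" are placeholder names.
    paths = discovery_paths(Path("."), node="saturn", service="example-service")
    for name in missing_entry_points(paths):
        print(f"missing entry point: {name} -> {paths[name]}")
```

An agent could run such a check at session start and stop early if a required entry point is absent, rather than planning on incomplete information.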

## 4. Interaction with Humans

Agents communicate with the operator via the `agent-system/telegram-bot`.

- **Alerting**: Agents emit events to the event system. Critical events are forwarded to Telegram.
- **Proposals**: When an agent identifies a need for change (e.g., "Service X is failing, suggest restart"), it creates a `pending` action in `/opt/homelab/actions/pending/`.
- **Approval**: Agents must wait for the action status to transition to `approved` before execution.
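The propose-then-wait flow can be sketched as follows. This is an illustration only: the JSON record, its field names, and the idea of reading the status through a callback are assumptions, and the authoritative format is whatever the Action Approval Model defines. Only the `pending` and `approved` statuses and the pending directory come from this document.

```python
import json
import time
from pathlib import Path
from typing import Callable


def propose_action(pending_dir: Path, action_id: str, kind: str,
                   target: str, reason: str) -> Path:
    """Create a pending action record; a no-op if it already exists."""
    pending_dir.mkdir(parents=True, exist_ok=True)
    path = pending_dir / f"{action_id}.json"
    if path.exists():
        return path  # proposing the same action twice changes nothing
    record = {
        "id": action_id,
        "kind": kind,        # hypothetical field, e.g. "restart"
        "target": target,    # hypothetical field, e.g. a service name
        "reason": reason,
        "status": "pending",
    }
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


def wait_for_approval(read_status: Callable[[], str],
                      timeout_s: float, poll_s: float = 5.0) -> bool:
    """Poll until the status is "approved" or the timeout expires.

    `read_status` hides how the status is stored, since that detail
    belongs to the Action Approval Model and is not assumed here.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if read_status() == "approved":
            return True
        if time.monotonic() >= deadline:
            return False  # never execute on timeout
        time.sleep(poll_s)
```

On a timeout the function returns `False`, so the caller's default is to do nothing, which matches the requirement that agents execute only after approval.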

## 5. Decision Logic (Reasoning)

When making decisions, agents MUST prioritize, in this order:

1.  **Safety**: Do not violate power constraints (see `capabilities.yaml`).
2.  **Stability**: Prefer keeping services on their `owner_node` unless that node is down.
3.  **Connectivity**: On intermittent nodes (CHELSTY), avoid actions requiring heavy WAN traffic during low-signal periods.
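One way to read these priorities is as a filter followed by a ranking: safety excludes nodes outright, then stability and connectivity order the remaining candidates. The Python sketch below takes that reading as an assumption. The `NodeView` fields (`power_headroom_w`, `intermittent`, `low_signal`) are hypothetical and do not reflect the actual `capabilities.yaml` schema.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeView:
    name: str
    up: bool
    power_headroom_w: float      # hypothetical: spare power budget in watts
    intermittent: bool = False   # e.g. a node with an unreliable WAN link
    low_signal: bool = False     # currently in a low-signal period


def choose_node(nodes: Sequence[NodeView], owner_node: str,
                required_power_w: float, heavy_wan: bool) -> Optional[str]:
    """Pick a node for a workload, or None if no safe placement exists."""
    # 1. Safety is a hard filter: the node must be up and within budget.
    safe = [n for n in nodes
            if n.up and n.power_headroom_w >= required_power_w]
    if not safe:
        return None

    def rank(n: NodeView) -> tuple[int, int]:
        # 2. Stability: the owner node sorts first.
        not_owner = 0 if n.name == owner_node else 1
        # 3. Connectivity: penalize heavy WAN work on a low-signal link.
        wan_penalty = 1 if (heavy_wan and n.intermittent and n.low_signal) else 0
        return (not_owner, wan_penalty)

    return min(safe, key=rank).name
```

Returning `None` when no node passes the safety filter leaves the decision with the operator instead of forcing an unsafe placement.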

## 6. Access Control for Agents

- **Filesystem**: Agents should run as the `homelab` user or equivalent with restricted sudo access to `docker compose`.
- **Secrets**: Agents MUST NOT attempt to read `.env` files unless specifically tasked with credential rotation. They should treat secrets as opaque handles.
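The secrets rule can be enforced with a small guard in front of any file read. The Python sketch below is an assumption-laden illustration: the task label `credential-rotation` and the filename patterns treated as `.env` files are hypothetical choices, not a defined policy.

```python
from pathlib import PurePosixPath

# Hypothetical label for the one task allowed to read secrets.
ROTATION_TASK = "credential-rotation"


def is_env_file(path: str) -> bool:
    """True for `.env` and common variants such as `.env.local` or `app.env`."""
    name = PurePosixPath(path).name
    return name == ".env" or name.startswith(".env.") or name.endswith(".env")


def may_read(path: str, task: str) -> bool:
    """Allow reads unless the path is an env file and the task is not rotation."""
    if is_env_file(path):
        return task == ROTATION_TASK
    return True
```

An agent would call `may_read` before opening a file and skip the read when it returns `False`, passing the path along as an opaque handle instead of its contents.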

Commit: "docs: complete the documentation with regard to AI agents" (Co-authored-by: Junie <junie@jetbrains.com>, 2026-05-20 12:06:23 +02:00)
