# Action Queue System

The Action Queue System provides a safe, filesystem-first lifecycle for operational actions in the homelab platform. It enables controlled execution with mandatory approval for high-risk operations.

## Action Lifecycle

Each action is in exactly one of six states at a time. Every state is represented by a directory under `/opt/homelab/actions/`:

1.  **Pending** (`pending/`): Actions proposed by the Supervisor or other agents.
2.  **Approved** (`approved/`): Actions that have been reviewed and approved for execution.
3.  **Running** (`running/`): Actions currently being processed by the Executor.
4.  **Completed** (`completed/`): Successfully executed actions.
5.  **Failed** (`failed/`): Actions that encountered errors during execution.
6.  **Rejected** (`rejected/`): Proposed actions that were explicitly denied.
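
The directory-per-state lifecycle above can be sketched as a small helper that moves an action file between state directories. This is a minimal illustration, not the platform's implementation: the `<action_id>.json` file naming, the `ALLOWED_TRANSITIONS` graph, and the function names are assumptions, since the document lists the states but not the permitted edges between them.

```python
import json
from pathlib import Path

# Directory names taken from the lifecycle list above.
STATES = ("pending", "approved", "running", "completed", "failed", "rejected")

# Assumed transition graph (hypothetical): the document names the states
# but does not spell out which moves are legal.
ALLOWED_TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
    "rejected": set(),
}


def init_queue(root: Path) -> None:
    """Create one directory per lifecycle state under the queue root."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def transition(root: Path, action_id: str, src: str, dst: str) -> Path:
    """Move an action file from the src state directory to the dst one."""
    if dst not in ALLOWED_TRANSITIONS.get(src, set()):
        raise ValueError(f"illegal transition: {src} -> {dst}")
    src_path = root / src / f"{action_id}.json"  # assumed file naming
    dst_path = root / dst / f"{action_id}.json"
    # Rewrite the embedded status first, so the file already carries the
    # new status when it appears in the destination directory.
    doc = json.loads(src_path.read_text(encoding="utf-8"))
    doc["status"] = dst
    src_path.write_text(json.dumps(doc, indent=2), encoding="utf-8")
    # Path.replace is an atomic rename when both paths share a filesystem.
    src_path.replace(dst_path)
    return dst_path
```

Keeping the move as a single rename means an action file is never visible in two state directories at once, which is the main appeal of a filesystem-first queue.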

## Action Schema

Actions are stored as JSON documents with the following structure:

```json
{
  "action_id": "uuid",
  "created_at": 1620000000.0,
  "proposed_by": "supervisor",
  "correlation_id": "uuid",
  "node": "node-name",
  "service": "service-name",
  "action_type": "redeploy_service",
  "risk_level": "guarded",
  "confidence": 0.9,
  "approval_required": true,
  "autonomous_eligible": false,
  "status": "pending",
  "payload": { ... },
  "rollback_reference": null
}
```
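
To illustrate how a producer or the Executor might check a document against this structure, here is a minimal validation sketch. The field names and types are read off the example above; `validate_action`, `new_action`, and the defaults chosen in `new_action` are hypothetical helpers, not part of the documented system.

```python
import time
import uuid

# Field names and example types taken from the JSON document above.
# "float" here means "any JSON number".
REQUIRED_FIELDS = {
    "action_id": str,
    "created_at": float,
    "proposed_by": str,
    "correlation_id": str,
    "node": str,
    "service": str,
    "action_type": str,
    "risk_level": str,
    "confidence": float,
    "approval_required": bool,
    "autonomous_eligible": bool,
    "status": str,
    "payload": dict,
}


def validate_action(doc: dict) -> list[str]:
    """Return a list of problems; an empty list means no problem was found."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in doc:
            problems.append(f"missing field: {name}")
            continue
        value = doc[name]
        if expected is float:
            # Accept ints as numbers, but reject bools (bool subclasses int).
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, expected)
        if not ok:
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    # rollback_reference must be present but may be null (None).
    if "rollback_reference" not in doc:
        problems.append("missing field: rollback_reference")
    return problems


def new_action(node, service, action_type, risk_level, confidence, payload):
    """Build a pending action with conservative defaults (assumed)."""
    return {
        "action_id": str(uuid.uuid4()),
        "created_at": time.time(),
        "proposed_by": "supervisor",
        "correlation_id": str(uuid.uuid4()),
        "node": node,
        "service": service,
        "action_type": action_type,
        "risk_level": risk_level,
        "confidence": confidence,
        "approval_required": True,
        "autonomous_eligible": False,
        "status": "pending",
        "payload": payload,
        "rollback_reference": None,
    }
```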

## Safety Model

Actions are categorized into safety classes, recorded in the `risk_level` field of the action document (for example, `guarded`):

-   **Safe**: Low-risk actions that may be eligible for autonomous execution in the future (e.g., `collect_diagnostics`, `rerun_healthcheck`).
-   **Guarded**: Actions that default to requiring approval but could be automated under strict conditions (e.g., `redeploy_service`, `rerun_deployment_stage`).
-   **Dangerous**: High-risk actions that ALWAYS require manual approval.

Currently, the platform operates in a **Recommendation-Only** mode where even `safe` actions require explicit approval.
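
The approval rule described above can be sketched as a single decision function. This is an illustration under assumptions: the `RiskLevel` enum, the `"dangerous"` string value, the `strict_conditions_met` flag, and the function name are hypothetical. Only the three classes, the example action types, and the Recommendation-Only behavior come from this document.

```python
from enum import Enum


class RiskLevel(str, Enum):
    SAFE = "safe"
    GUARDED = "guarded"
    DANGEROUS = "dangerous"  # assumed spelling of the third class


# Classification of the action types that the document gives as examples.
ACTION_RISK = {
    "collect_diagnostics": RiskLevel.SAFE,
    "rerun_healthcheck": RiskLevel.SAFE,
    "redeploy_service": RiskLevel.GUARDED,
    "rerun_deployment_stage": RiskLevel.GUARDED,
}


def approval_required(
    risk: RiskLevel,
    recommendation_only: bool = True,
    strict_conditions_met: bool = False,
) -> bool:
    """Decide whether an action needs explicit approval before it may run."""
    if recommendation_only:
        # Current mode: every action needs approval, even safe ones.
        return True
    if risk is RiskLevel.DANGEROUS:
        # Dangerous actions always need manual approval.
        return True
    if risk is RiskLevel.GUARDED:
        # Guarded actions default to approval unless strict conditions hold.
        return not strict_conditions_met
    return False
```

With the default `recommendation_only=True`, the function returns `True` for every class, which matches the current mode of operation.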

## Initial Action Types

-   `redeploy_service`: Restarts or redeploys a service container.
-   `rerun_healthcheck`: Triggers an immediate health check.
-   `rerun_deployment_stage`: Retries a specific stage of a failed deployment.
-   `collect_diagnostics`: Gathers logs and metrics for troubleshooting.

## Executor

The Executor (`scripts/executor/executor.py`) is responsible for processing approved actions. It features:

-   **Process Approved Only**: Only actions in the `approved/` directory are processed.
-   **Recommendation-Safe**: Simulation-based execution that logs intended mutations without side effects.
-   **Idempotency**: Designed to be safe to run multiple times.
-   **Resumable State**: After an interruption, it resumes any actions left in the `running/` directory.
-   **Append-Only History**: Maintains a `history.log` of all action transitions.
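
The properties listed above can be combined into one processing pass, sketched below. This is not the contents of `scripts/executor/executor.py`: the function names, the JSON-lines format, the location of `history.log` under the queue root, and the text produced by `simulate` are all assumptions made for illustration.

```python
import json
import time
from pathlib import Path


def log_transition(root: Path, action_id: str, src: str, dst: str) -> None:
    """Append one JSON line per transition; the file is only opened in append mode."""
    entry = {"ts": time.time(), "action_id": action_id, "from": src, "to": dst}
    with (root / "history.log").open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def move(root: Path, path: Path, src: str, dst: str) -> Path:
    """Rename the action file into the dst directory and record the transition."""
    target = root / dst / path.name
    path.replace(target)
    log_transition(root, path.stem, src, dst)
    return target


def simulate(action: dict) -> str:
    """Describe the intended mutation without performing it."""
    return f"would run {action['action_type']} on {action['service']}@{action['node']}"


def run_once(root: Path) -> list[str]:
    """Resume interrupted actions, then process newly approved ones."""
    # Resumable state: anything still in running/ was interrupted earlier.
    work = sorted((root / "running").glob("*.json"))
    # Approved only: pending/ is never read here.
    for path in sorted((root / "approved").glob("*.json")):
        work.append(move(root, path, "approved", "running"))
    results = []
    for path in work:
        try:
            action = json.loads(path.read_text(encoding="utf-8"))
            results.append(simulate(action))
            move(root, path, "running", "completed")
        except (ValueError, KeyError):
            # Malformed JSON or a missing field sends the action to failed/.
            move(root, path, "running", "failed")
    return results
```

Because finished actions leave `approved/` and `running/`, a second call to `run_once` finds nothing to do, which is one simple way to get the idempotent behavior described above.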

## Rollback Concepts

Every action document includes a `rollback_reference` field (`null` in the example above). In future iterations, this field will point to the previous stable state or to a reverse action that can be triggered if the current action fails or causes further instability.

## Future Autonomous Execution

The system is designed to transition to autonomous execution by:

1.  Identifying `safe` actions with high `confidence` scores.
2.  Matching them against a `policy-engine`.
3.  Automatically moving them from `pending/` to `approved/` when the policy's safety guardrails allow it.
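
The three steps above might look like the following sketch once implemented. Everything policy-related here is hypothetical: the document does not define the `policy-engine`, so the `POLICY` dictionary, the `0.95` threshold, and the use of `autonomous_eligible` as an extra gate are assumptions chosen only to make the flow concrete.

```python
import json
from pathlib import Path

# Hypothetical policy; the real policy-engine is not specified in this document.
POLICY = {"allowed_risk_levels": {"safe"}, "min_confidence": 0.95}


def is_auto_approvable(action: dict, policy: dict = POLICY) -> bool:
    """Steps 1 and 2: a safe, high-confidence action that the policy allows."""
    return (
        action.get("risk_level") in policy["allowed_risk_levels"]
        and action.get("confidence", 0.0) >= policy["min_confidence"]
        and action.get("autonomous_eligible") is True
    )


def auto_approve(root: Path, policy: dict = POLICY) -> list[str]:
    """Step 3: move qualifying actions from pending/ to approved/."""
    approved = []
    for path in sorted((root / "pending").glob("*.json")):
        action = json.loads(path.read_text(encoding="utf-8"))
        if is_auto_approvable(action, policy):
            path.replace(root / "approved" / path.name)
            approved.append(action["action_id"])
    return approved
```

Actions that do not qualify stay in `pending/` and continue to wait for manual review.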