agent-system/docs/action-queue-system.md


2026-05-12 18:01:37 +02:00
# Action Queue System
The Action Queue System provides a safe, filesystem-first lifecycle for operational actions in the homelab platform. It enables controlled execution with mandatory approval for high-risk operations.
## Action Lifecycle
Each action is a file whose current state is given by the directory it sits in. There are six state directories under `/opt/homelab/actions/`:
1. **Pending** (`pending/`): Actions proposed by the Supervisor or other agents.
2. **Approved** (`approved/`): Actions that have been reviewed and approved for execution.
3. **Running** (`running/`): Actions currently being processed by the Executor.
4. **Completed** (`completed/`): Successfully executed actions.
5. **Failed** (`failed/`): Actions that encountered errors during execution.
6. **Rejected** (`rejected/`): Proposed actions that were explicitly denied.
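
The lifecycle above can be sketched as a table of allowed moves between state directories. This is an illustration only, not the platform's code: the directory names come from this document, while the `ALLOWED` transition table, the `transition` helper, and the `<action_id>.json` file naming are assumptions.

```python
import os
import tempfile
from pathlib import Path

# State directory names, as listed in the lifecycle above.
STATES = ("pending", "approved", "running", "completed", "failed", "rejected")

# Assumed transition table: the document names the states but not the legal moves.
ALLOWED = {
    "pending": {"approved", "rejected"},
    "approved": {"running"},
    "running": {"completed", "failed"},
}


def transition(root: Path, action_id: str, src: str, dst: str) -> Path:
    """Move an action file from one state directory to another."""
    if dst not in ALLOWED.get(src, set()):
        raise ValueError(f"illegal transition: {src} -> {dst}")
    source = root / src / f"{action_id}.json"  # hypothetical file naming
    target = root / dst / f"{action_id}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(source, target)
    return target


if __name__ == "__main__":
    # Demo in a temporary directory rather than /opt/homelab/actions/.
    root = Path(tempfile.mkdtemp())
    (root / "pending").mkdir()
    (root / "pending" / "demo.json").write_text("{}")
    print(transition(root, "demo", "pending", "approved"))
```

Because a state change is a single rename, an action is never visible in two states at once, provided all state directories live on the same filesystem.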
## Action Schema
Actions are stored as JSON documents with the following structure:
```json
{
  "action_id": "uuid",
  "created_at": 1620000000.0,
  "proposed_by": "supervisor",
  "correlation_id": "uuid",
  "node": "node-name",
  "service": "service-name",
  "action_type": "redeploy_service",
  "risk_level": "guarded",
  "confidence": 0.9,
  "approval_required": true,
  "autonomous_eligible": false,
  "status": "pending",
  "payload": { ... },
  "rollback_reference": null
}
```
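
A minimal sketch of how such a document might be loaded and checked in Python. The `Action` dataclass and `load_action` helper are hypothetical. The field names come from the schema above, but the lowercase `risk_level` spellings, the 0 to 1 range for `confidence`, and the string type of `rollback_reference` are assumptions.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional

# Assumed spellings of the three safety classes.
RISK_LEVELS = {"safe", "guarded", "dangerous"}


@dataclass
class Action:
    action_id: str
    created_at: float
    proposed_by: str
    correlation_id: str
    node: str
    service: str
    action_type: str
    risk_level: str
    confidence: float
    approval_required: bool
    autonomous_eligible: bool
    status: str
    payload: dict[str, Any]
    rollback_reference: Optional[str] = None  # type is an assumption


def load_action(text: str) -> Action:
    """Parse a JSON document and check the fields shown in the schema."""
    data = json.loads(text)
    known = {f.name for f in fields(Action)}
    unknown = set(data) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    action = Action(**data)  # raises TypeError if a required field is missing
    if action.risk_level not in RISK_LEVELS:
        raise ValueError(f"unknown risk_level: {action.risk_level!r}")
    if not 0.0 <= action.confidence <= 1.0:  # assumed range
        raise ValueError("confidence must be between 0 and 1")
    return action
```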
## Safety Model
Each action falls into one of three safety classes:
- **Safe**: Low-risk actions that may be eligible for autonomous execution in the future (e.g., `collect_diagnostics`, `rerun_healthcheck`).
- **Guarded**: Actions that default to requiring approval but could be automated under strict conditions (e.g., `redeploy_service`, `rerun_deployment_stage`).
- **Dangerous**: High-risk actions that ALWAYS require manual approval.
Currently, the platform operates in a **Recommendation-Only** mode where even `safe` actions require explicit approval.
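
The classification and the current approval rule can be sketched as follows. The mapping uses the examples given in the class descriptions. The `RECOMMENDATION_ONLY` flag, the helper names, and the choice to treat unknown action types as dangerous are assumptions made for illustration.

```python
# Safety class per action type, taken from the examples in the class descriptions.
SAFETY_CLASS = {
    "collect_diagnostics": "safe",
    "rerun_healthcheck": "safe",
    "redeploy_service": "guarded",
    "rerun_deployment_stage": "guarded",
}

RECOMMENDATION_ONLY = True  # hypothetical flag for the current operating mode


def classify(action_type: str) -> str:
    """Return the safety class; unknown types default to dangerous (assumption)."""
    return SAFETY_CLASS.get(action_type, "dangerous")


def approval_required(action_type: str,
                      recommendation_only: bool = RECOMMENDATION_ONLY) -> bool:
    """In Recommendation-Only mode every action needs explicit approval."""
    if recommendation_only:
        return True
    # Outside that mode, only safe actions could skip approval in this sketch.
    return classify(action_type) != "safe"
```

Defaulting unknown types to the strictest class means a newly added action type cannot bypass approval by accident.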
## Initial Action Types
- `redeploy_service`: Restarts or redeploys a service container.
- `rerun_healthcheck`: Triggers an immediate health check.
- `rerun_deployment_stage`: Retries a specific stage of a failed deployment.
- `collect_diagnostics`: Gathers logs and metrics for troubleshooting.
## Executor
The Executor (`scripts/executor/executor.py`) processes approved actions. It has the following properties:
- **Process Approved Only**: It only picks up new actions from the `approved/` directory.
- **Recommendation-Safe**: Execution is simulated. The Executor logs the mutations it would make but performs none of them.
- **Idempotency**: Running it multiple times is safe.
- **Resumable State**: After an interruption, it resumes any actions left in `running/`.
- **Append-Only History**: Every action transition is appended to `history.log`.
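
One way these properties could fit together is sketched below. It is not the contents of `scripts/executor/executor.py`. The helper names (`run_once`, `move`, `simulate`, `log_transition`), the JSON-lines format and location of `history.log`, and the rule that unparseable actions go to `failed/` are all assumptions.

```python
import json
import time
from pathlib import Path


def log_transition(root: Path, action_id: str, src: str, dst: str) -> None:
    """Append one JSON line per transition; existing lines are never rewritten."""
    entry = {"ts": time.time(), "action_id": action_id, "from": src, "to": dst}
    with open(root / "history.log", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def move(root: Path, path: Path, dst: str) -> Path:
    """Rename the action file into another state directory and record the move."""
    target = root / dst / path.name
    target.parent.mkdir(parents=True, exist_ok=True)
    path.replace(target)
    log_transition(root, path.stem, path.parent.name, dst)
    return target


def simulate(action: dict) -> str:
    """Describe the intended mutation without performing it."""
    return f"would run {action['action_type']} on {action['node']}/{action['service']}"


def run_once(root: Path) -> list[str]:
    """Resume actions left in running/, then process everything in approved/."""
    results = []
    resumed = sorted((root / "running").glob("*.json"))
    fresh = sorted((root / "approved").glob("*.json"))
    for path in resumed + fresh:
        if path.parent.name == "approved":
            path = move(root, path, "running")
        try:
            action = json.loads(path.read_text(encoding="utf-8"))
            results.append(simulate(action))
            move(root, path, "completed")
        except (ValueError, KeyError):
            # Malformed JSON or missing fields: mark the action as failed.
            move(root, path, "failed")
    return results
```

A second call to `run_once` finds nothing left in `approved/` or `running/` and does nothing, which is what makes repeated runs safe in this sketch.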
## Rollback Concepts
Every action includes a `rollback_reference` field, which is currently `null`. In future iterations it will point to the previous stable state, or to a reverse action that can be triggered if the current action fails or causes further instability.
## Future Autonomous Execution
The system is designed to transition to autonomous execution in three steps:
1. Identify `safe` actions with high `confidence` scores.
2. Evaluate them against a policy engine (`policy-engine`).
3. Automatically move those that pass from `pending/` to `approved/`, within the allowed safety guardrails.
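
As a rough sketch of that decision, assuming the action has already been parsed into a dictionary: the `CONFIDENCE_THRESHOLD` value, the `policy_allows` stand-in for the policy engine, and the use of `autonomous_eligible` as an extra gate are hypothetical, since this document does not specify them.

```python
CONFIDENCE_THRESHOLD = 0.95  # hypothetical cut-off; the document gives no value


def policy_allows(action: dict) -> bool:
    """Stand-in for the policy engine: a fixed allowlist of action types."""
    return action.get("action_type") in {"collect_diagnostics", "rerun_healthcheck"}


def should_auto_approve(action: dict,
                        threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Return True only if every guardrail in this sketch is satisfied."""
    return (
        action.get("risk_level") == "safe"
        and action.get("autonomous_eligible") is True
        and action.get("confidence", 0.0) >= threshold
        and policy_allows(action)
    )
```

An action that passes would then be moved from `pending/` to `approved/`; anything else stays in `pending/` for manual review.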