# Action Queue System

The Action Queue System provides a safe, filesystem-first lifecycle for operational actions in the homelab platform. It enables controlled execution with mandatory approval for high-risk operations.

## Action Lifecycle

Actions move through the following states, each represented by a directory under `/opt/homelab/actions/`. An action's current state is the directory that holds its file:

1. **Pending** (`pending/`): Actions proposed by the Supervisor or other agents and awaiting review.
2. **Approved** (`approved/`): Actions that have been reviewed and approved for execution.
3. **Running** (`running/`): Actions currently being processed by the Executor.
4. **Completed** (`completed/`): Actions that executed successfully.
5. **Failed** (`failed/`): Actions that encountered errors during execution.
6. **Rejected** (`rejected/`): Proposed actions that were explicitly denied.

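Because state is encoded in the directory layout, each state change is a single file move. A minimal Python sketch follows, assuming each action is stored as `<action_id>.json` and that only the transitions implied by the order of the list above are legal; the file naming and the `ALLOWED_TRANSITIONS` table are illustrative assumptions, not the platform's documented rules.

```python
import os
from pathlib import Path

# Lifecycle states, one directory each (from the list above).
STATES = ("pending", "approved", "running", "completed", "failed", "rejected")

# Assumed legal transitions, inferred from the lifecycle order.
ALLOWED_TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"running"},
    "running": {"completed", "failed"},
}


def init_queue(base: Path) -> None:
    """Create one directory per lifecycle state under `base`."""
    for state in STATES:
        (base / state).mkdir(parents=True, exist_ok=True)


def transition(base: Path, action_id: str, src: str, dst: str) -> Path:
    """Move `<action_id>.json` (assumed naming) from `src/` to `dst/`."""
    if dst not in ALLOWED_TRANSITIONS.get(src, set()):
        raise ValueError(f"illegal transition: {src} -> {dst}")
    source = base / src / f"{action_id}.json"
    target = base / dst / f"{action_id}.json"
    # A same-filesystem rename is atomic: the file is visible in exactly
    # one state directory before and after the move.
    os.replace(source, target)
    return target
```

The atomicity of `os.replace` only holds when all state directories live on the same filesystem, which the single `/opt/homelab/actions/` root makes likely.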
## Action Schema

Actions are stored as JSON documents with the following structure:

```json
{
  "action_id": "uuid",
  "created_at": 1620000000.0,
  "proposed_by": "supervisor",
  "correlation_id": "uuid",
  "node": "node-name",
  "service": "service-name",
  "action_type": "redeploy_service",
  "risk_level": "guarded",
  "confidence": 0.9,
  "approval_required": true,
  "autonomous_eligible": false,
  "status": "pending",
  "payload": { ... },
  "rollback_reference": null
}
```

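To check incoming documents against this structure, something like the following could be used. It assumes the three safety classes are the only valid `risk_level` values, the six lifecycle directories are the only valid `status` values, and `confidence` lies between 0 and 1; these rules and the `Action` and `load_action` names are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical validation vocabularies derived from this document.
RISK_LEVELS = {"safe", "guarded", "dangerous"}
STATUSES = {"pending", "approved", "running", "completed", "failed", "rejected"}


@dataclass
class Action:
    action_id: str
    created_at: float  # assumed to be a Unix timestamp in seconds
    proposed_by: str
    correlation_id: str
    node: str
    service: str
    action_type: str
    risk_level: str
    confidence: float
    approval_required: bool
    autonomous_eligible: bool
    status: str
    payload: Dict[str, Any] = field(default_factory=dict)
    rollback_reference: Any = None

    def __post_init__(self) -> None:
        # Reject values outside the assumed vocabularies.
        if self.risk_level not in RISK_LEVELS:
            raise ValueError(f"unknown risk_level: {self.risk_level!r}")
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def load_action(text: str) -> Action:
    """Parse one action document; unknown keys or missing required keys raise TypeError."""
    return Action(**json.loads(text))
```

Unpacking the parsed JSON into a dataclass means an unexpected or missing required key fails loudly instead of being silently ignored.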
## Safety Model

Each action is assigned one of three safety classes, recorded in its `risk_level` field:

- **Safe**: Low-risk actions that may become eligible for autonomous execution in the future (e.g., `collect_diagnostics`, `rerun_healthcheck`).
- **Guarded**: Actions that require approval by default but could be automated under strict conditions (e.g., `redeploy_service`, `rerun_deployment_stage`).
- **Dangerous**: High-risk actions that always require manual approval.

The platform currently operates in **Recommendation-Only** mode, in which even `safe` actions require explicit approval.

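The approval rules above reduce to a small decision function. In this Python sketch, `Mode.AUTONOMOUS` and the `guarded_conditions_met` flag are hypothetical stand-ins for the future mode and its unspecified "strict conditions"; only the recommendation-only branch reflects current behavior.

```python
from enum import Enum


class Mode(Enum):
    RECOMMENDATION_ONLY = "recommendation-only"  # current platform mode
    AUTONOMOUS = "autonomous"  # hypothetical future mode


SAFETY_CLASSES = {"safe", "guarded", "dangerous"}


def requires_approval(risk_level: str, mode: Mode, guarded_conditions_met: bool = False) -> bool:
    """Return True when a human must approve an action of this safety class."""
    if risk_level not in SAFETY_CLASSES:
        raise ValueError(f"unknown safety class: {risk_level!r}")
    if mode is Mode.RECOMMENDATION_ONLY:
        return True  # even `safe` actions need explicit approval today
    if risk_level == "dangerous":
        return True  # never automated, in any mode
    if risk_level == "guarded":
        # Approval by default; skipped only when the (unspecified) strict conditions hold.
        return not guarded_conditions_met
    return False  # `safe` actions may run without approval
```

This is deliberately simplified: under the plan described in Future Autonomous Execution, a `safe` action would also need a high `confidence` score and a policy match before skipping approval.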
## Initial Action Types

- `redeploy_service`: Restarts or redeploys a service container.
- `rerun_healthcheck`: Triggers an immediate health check.
- `rerun_deployment_stage`: Retries a specific stage of a failed deployment.
- `collect_diagnostics`: Gathers logs and metrics for troubleshooting.

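A Supervisor proposing one of these actions might write a new document into `pending/` roughly as sketched below. The `propose_action` helper, the `<action_id>.json` file naming, the generated `correlation_id` default, and the choice to flag only `safe` actions as `autonomous_eligible` are all assumptions; the risk levels come from the Safety Model examples.

```python
import json
import os
import time
import uuid
from pathlib import Path
from typing import Any, Dict, Optional

# Safety classes taken from the examples in the Safety Model section.
ACTION_RISK = {
    "collect_diagnostics": "safe",
    "rerun_healthcheck": "safe",
    "redeploy_service": "guarded",
    "rerun_deployment_stage": "guarded",
}


def propose_action(
    base: Path,
    action_type: str,
    node: str,
    service: str,
    confidence: float,
    payload: Dict[str, Any],
    proposed_by: str = "supervisor",
    correlation_id: Optional[str] = None,
) -> Path:
    """Write a new pending action document and return its path (hypothetical helper)."""
    if action_type not in ACTION_RISK:
        raise ValueError(f"unsupported action_type: {action_type!r}")
    risk = ACTION_RISK[action_type]
    action_id = str(uuid.uuid4())
    action = {
        "action_id": action_id,
        "created_at": time.time(),
        "proposed_by": proposed_by,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "node": node,
        "service": service,
        "action_type": action_type,
        "risk_level": risk,
        "confidence": confidence,
        "approval_required": True,  # recommendation-only mode: always True
        "autonomous_eligible": risk == "safe",  # assumption: only safe actions are candidates
        "status": "pending",
        "payload": payload,
        "rollback_reference": None,
    }
    pending = base / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name, then rename, so readers never see a partial file.
    tmp = pending / f".{action_id}.json.tmp"
    tmp.write_text(json.dumps(action, indent=2), encoding="utf-8")
    final = pending / f"{action_id}.json"
    os.replace(tmp, final)
    return final
```

The `.tmp` suffix keeps the half-written file out of any `*.json` glob a reviewer or the Executor might use.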
## Executor

The Executor (`scripts/executor/executor.py`) is responsible for processing approved actions. Its key properties are:

- **Process Approved Only**: New work is taken only from the `approved/` directory.
- **Recommendation-Safe**: Execution is simulated; the Executor logs the mutations it would make without performing them.
- **Idempotency**: Designed so that running it multiple times is safe.
- **Resumable State**: If interrupted, it resumes any actions left in `running/` on its next run.
- **Append-Only History**: A `history.log` records every action state transition.

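One Executor pass could be structured as follows. This is not the contents of `scripts/executor/executor.py`; it is a sketch assuming `<action_id>.json` file names, a `history.log` at the queue root holding one JSON line per transition, and a `simulate` step that only logs. The helper names are hypothetical.

```python
import json
import logging
import os
import time
from pathlib import Path
from typing import List

log = logging.getLogger("executor-sketch")


def append_history(base: Path, action_id: str, src: str, dst: str) -> None:
    """Append one JSON line per transition (assumed location and format)."""
    entry = {"ts": time.time(), "action_id": action_id, "from": src, "to": dst}
    with open(base / "history.log", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def move(base: Path, path: Path, src: str, dst: str) -> Path:
    """Rename the action file into another state directory and record it."""
    target = base / dst / path.name
    os.replace(path, target)
    append_history(base, path.stem, src, dst)
    return target


def simulate(action: dict) -> None:
    """Recommendation-safe step: log the intended mutation, perform none."""
    log.info(
        "would run %s on %s/%s with payload %r",
        action["action_type"], action["node"], action["service"], action.get("payload"),
    )


def execute(base: Path, path: Path) -> str:
    """Simulate one action already in running/ and move it to its final state."""
    try:
        simulate(json.loads(path.read_text(encoding="utf-8")))
    except Exception:
        log.exception("action %s failed", path.stem)
        move(base, path, "running", "failed")
        return "failed"
    move(base, path, "running", "completed")
    return "completed"


def run_once(base: Path) -> List[str]:
    """One pass: resume interrupted work first, then claim approved actions."""
    results = []
    for path in sorted((base / "running").glob("*.json")):
        results.append(execute(base, path))
    for path in sorted((base / "approved").glob("*.json")):
        running = move(base, path, "approved", "running")
        results.append(execute(base, running))
    return results
```

Handling `running/` before claiming new work models the resumable property, and a second pass over a drained queue does nothing, which models idempotency. The sketch treats the directory as the authoritative state and leaves the `status` field inside each file unchanged.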
## Rollback Concepts

Every action document includes a `rollback_reference` field. In future iterations, this field will point to the previous stable state or to a reverse action that can be triggered if the current action fails or causes further instability.

## Future Autonomous Execution

The system is designed to transition to autonomous execution by:

1. Identifying `safe` actions with high `confidence` scores.
2. Matching them against a `policy-engine`.
3. Automatically moving actions that satisfy the allowed safety guardrails from `pending/` to `approved/`.
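The three steps could look roughly like this. `policy_allows` is a placeholder for the `policy-engine`, whose rules are not specified here; the 0.9 confidence threshold and the extra requirement that `autonomous_eligible` be true are hypothetical choices for illustration.

```python
import json
import os
from pathlib import Path
from typing import List

# Hypothetical threshold; the document does not define "high confidence".
CONFIDENCE_THRESHOLD = 0.9


def policy_allows(action: dict, threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """Placeholder for the policy-engine, covering steps 1 and 2 in one predicate."""
    return (
        action.get("risk_level") == "safe"
        and action.get("autonomous_eligible") is True
        and float(action.get("confidence", 0.0)) >= threshold
    )


def auto_approve(base: Path, threshold: float = CONFIDENCE_THRESHOLD) -> List[str]:
    """Step 3: move qualifying actions from pending/ to approved/."""
    approved = []
    for path in sorted((base / "pending").glob("*.json")):
        try:
            action = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            continue  # leave malformed proposals for a human to inspect
        if policy_allows(action, threshold):
            os.replace(path, base / "approved" / path.name)
            approved.append(path.stem)
    return approved
```

Anything that fails the check stays in `pending/` for manual review, so human approval remains the default path.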