Action Queue System

The Action Queue System provides a safe, filesystem-first lifecycle for operational actions in the homelab platform. It enables controlled execution with mandatory approval for high-risk operations.

Action Lifecycle

Actions move through six states, each represented by a directory under /opt/homelab/actions/:

  1. Pending (pending/): Actions proposed by the Supervisor or other agents.
  2. Approved (approved/): Actions that have been reviewed and approved for execution.
  3. Running (running/): Actions currently being processed by the Executor.
  4. Completed (completed/): Successfully executed actions.
  5. Failed (failed/): Actions that encountered errors during execution.
  6. Rejected (rejected/): Proposed actions that were explicitly denied.
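
The directory-per-state layout above can be sketched in Python as follows. This is a minimal illustration, not the platform's actual code: the `<action_id>.json` file naming, the `ALLOWED` transition table, and the function names are assumptions, since the document lists the states but does not spell out which moves are legal.

```python
from pathlib import Path

# Directory names taken from the lifecycle list above.
STATES = ("pending", "approved", "running", "completed", "failed", "rejected")

# Hypothetical transition table (an assumption, not stated in the document).
ALLOWED = {
    "pending": {"approved", "rejected"},
    "approved": {"running"},
    "running": {"completed", "failed"},
}


def init_queue(root: Path) -> None:
    """Create one directory per lifecycle state under root."""
    for state in STATES:
        (root / state).mkdir(parents=True, exist_ok=True)


def transition(root: Path, action_id: str, src: str, dst: str) -> Path:
    """Move <action_id>.json from the src state directory to dst."""
    if dst not in ALLOWED.get(src, set()):
        raise ValueError(f"illegal transition: {src} -> {dst}")
    source = root / src / f"{action_id}.json"
    target = root / dst / f"{action_id}.json"
    # A rename is atomic when both paths live on the same filesystem,
    # so an action is never visible in two states at once.
    source.replace(target)
    return target
```

Under these assumptions, approving an action would be `transition(Path("/opt/homelab/actions"), action_id, "pending", "approved")`.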

Action Schema

Actions are stored as JSON documents with the following structure:

{
  "action_id": "uuid",
  "created_at": 1620000000.0,
  "proposed_by": "supervisor",
  "correlation_id": "uuid",
  "node": "node-name",
  "service": "service-name",
  "action_type": "redeploy_service",
  "risk_level": "guarded",
  "confidence": 0.9,
  "approval_required": true,
  "autonomous_eligible": false,
  "status": "pending",
  "payload": { ... },
  "rollback_reference": null
}
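
As a rough sketch of how a proposer might build such a document and place it in pending/, consider the helper below. The function names, the `<action_id>.json` file name, and the write-then-rename step are assumptions made for illustration. The field names and defaults mirror the schema above.

```python
import json
import time
import uuid
from pathlib import Path


def new_action(node, service, action_type, risk_level, confidence,
               payload, proposed_by="supervisor", correlation_id=None):
    """Build an action document with the fields shown in the schema."""
    return {
        "action_id": str(uuid.uuid4()),
        "created_at": time.time(),  # epoch seconds, as in the example
        "proposed_by": proposed_by,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        "node": node,
        "service": service,
        "action_type": action_type,
        "risk_level": risk_level,
        "confidence": confidence,
        "approval_required": True,  # assumed default: always ask for approval
        "autonomous_eligible": False,
        "status": "pending",
        "payload": payload,
        "rollback_reference": None,
    }


def write_pending(root: Path, action: dict) -> Path:
    """Write the action into pending/ (hypothetical helper)."""
    pending = root / "pending"
    pending.mkdir(parents=True, exist_ok=True)
    final = pending / f"{action['action_id']}.json"
    tmp = pending / (final.name + ".tmp")
    # Write to a temporary name first, then rename, so readers that
    # look for *.json never see a half-written file.
    tmp.write_text(json.dumps(action, indent=2))
    tmp.replace(final)
    return final
```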

Safety Model

Actions are categorized into safety classes:

  • Safe: Low-risk actions that may be eligible for autonomous execution in the future (e.g., collect_diagnostics, rerun_healthcheck).
  • Guarded: Actions that default to requiring approval but could be automated under strict conditions (e.g., redeploy_service, rerun_deployment_stage).
  • Dangerous: High-risk actions that ALWAYS require manual approval.

The platform currently operates in Recommendation-Only mode, in which even Safe actions require explicit approval.
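
The approval rule implied by these classes might look like the sketch below. The lowercase class names, the `SAFETY_CLASS` mapping, the `recommendation_only` flag, and the choice to treat unknown action types as dangerous are all assumptions. The document names only the example action types for the Safe and Guarded classes.

```python
# Action types and classes taken from the examples in the text.
SAFETY_CLASS = {
    "collect_diagnostics": "safe",
    "rerun_healthcheck": "safe",
    "redeploy_service": "guarded",
    "rerun_deployment_stage": "guarded",
}


def approval_required(action_type: str, recommendation_only: bool = True) -> bool:
    """Return True if the action must be approved by a human."""
    # Assumption: fail closed, so unknown action types count as dangerous.
    safety = SAFETY_CLASS.get(action_type, "dangerous")
    if recommendation_only:
        # Current mode: every action needs explicit approval.
        return True
    # Outside recommendation-only mode, only safe actions skip approval
    # in this sketch; guarded actions keep their approval default.
    return safety != "safe"
```

With the default `recommendation_only=True`, every call returns `True`, which matches the current behavior described above.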

Initial Action Types

  • redeploy_service: Restarts or redeploys a service container.
  • rerun_healthcheck: Triggers an immediate health check.
  • rerun_deployment_stage: Retries a specific stage of a failed deployment.
  • collect_diagnostics: Gathers logs and metrics for troubleshooting.

Executor

The Executor (scripts/executor/executor.py) is responsible for processing approved actions. It features:

  • Process Approved Only: Only actions in the approved/ directory are processed.
  • Recommendation-Safe: Execution is simulated. Intended mutations are logged but not applied, so there are no side effects.
  • Idempotency: Running the Executor multiple times over the same queue is safe.
  • Resumable State: After an interruption, it picks up any actions left in the running/ state.
  • Append-Only History: Maintains a history.log of all action transitions.
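
The properties above can be illustrated with a simplified processing loop. This is not the code in scripts/executor/executor.py: the function names, the JSON-lines format of history.log, and its location at the queue root are assumptions, and the sketch leaves the `status` field inside each document untouched.

```python
import json
import time
from pathlib import Path


def log_transition(root: Path, action_id: str, src: str, dst: str) -> None:
    """Append one line per transition; earlier lines are never rewritten."""
    entry = {"ts": time.time(), "action_id": action_id, "from": src, "to": dst}
    with open(root / "history.log", "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


def move(root: Path, path: Path, src: str, dst: str) -> Path:
    """Rename the action file into the dst state directory and log it."""
    target = root / dst / path.name
    path.replace(target)
    log_transition(root, path.stem, src, dst)
    return target


def simulate(action: dict) -> str:
    """Describe the intended mutation without performing it."""
    return f"would run {action['action_type']} on {action['node']}/{action['service']}"


def run_once(root: Path) -> list:
    """Process interrupted and approved actions; return the handled ids."""
    # Resume anything left in running/ by an earlier interrupted run.
    resumed = sorted((root / "running").glob("*.json"))
    # Only approved/ is read for new work; pending/ is never touched.
    claimed = [move(root, p, "approved", "running")
               for p in sorted((root / "approved").glob("*.json"))]
    handled = []
    for path in resumed + claimed:
        try:
            action = json.loads(path.read_text())
            print(simulate(action))
            move(root, path, "running", "completed")
        except (ValueError, KeyError):
            # Malformed JSON or missing fields end up in failed/.
            move(root, path, "running", "failed")
        handled.append(path.stem)
    return handled
```

Because each action leaves approved/ before it is simulated, a second call to `run_once` on an unchanged queue finds nothing to do, which is one simple way to get the idempotent behavior described above.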

Rollback Concepts

Every action includes a rollback_reference field. In future iterations, this field will point to the previous stable state or to a reverse action that can be triggered if the current action fails or causes further instability.

Future Autonomous Execution

The system is designed to transition to autonomous execution by:

  1. Identifying safe actions with high confidence scores.
  2. Matching them against a policy engine.
  3. Automatically moving them from pending/ to approved/ when the policy's safety guardrails allow it.
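
The three steps could eventually take a shape like the sketch below. No policy engine exists in the document, so everything here is hypothetical: the `POLICY` structure, the 0.95 confidence threshold, the lowercase "safe" value for `risk_level`, and the function names are placeholders for illustration only.

```python
import json
from pathlib import Path

# Hypothetical policy; the document defines no thresholds or policy format.
POLICY = {"allowed_risk_levels": {"safe"}, "min_confidence": 0.95}


def is_auto_approvable(action: dict, policy: dict = POLICY) -> bool:
    """Steps 1 and 2: a safe, high-confidence action that the policy allows."""
    return (
        action.get("risk_level") in policy["allowed_risk_levels"]
        and action.get("autonomous_eligible") is True
        and action.get("confidence", 0.0) >= policy["min_confidence"]
    )


def auto_approve(root: Path, policy: dict = POLICY) -> list:
    """Step 3: move matching actions from pending/ to approved/."""
    approved = []
    for path in sorted((root / "pending").glob("*.json")):
        action = json.loads(path.read_text())
        if is_auto_approvable(action, policy):
            path.replace(root / "approved" / path.name)
            approved.append(action["action_id"])
    return approved
```

Actions that do not match stay in pending/ and still wait for manual review, so the approval path described earlier remains the default.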