AKOS

Run lifecycle

Follow one run from trigger to audit ledger — five steps, fully governed at every stage.

A run is one execution of a flow or process. This is the single most important path in AKOS — every governance feature plugs into it somewhere. Understanding it makes every other screen, metric, and error message legible.

The five steps

01 · START     Trigger fires
               schedule / webhook / button press

02 · CHECK     Permission + schema
               RBAC capability check · input validated against Zod schema

03 · EXECUTE   Sidecar runs nodes
               agents reason · tools fire · data passes between nodes

04 · PAUSE     Human gate (if any)
               run holds · reviewer is notified · resumes on their decision

05 · FINISH    Artifact + audit
               result saved as artifact · audit ledger entry signed

Every step of the way: sandboxed · token-cost tracked · OpenTelemetry traced · written to the signed audit ledger.

Step-by-step detail

01 · START

A trigger fires — a cron schedule hits, a webhook arrives, a SaaS event is received, or an operator presses Run. The trigger payload is parsed against the trigger's Zod schema. If the payload does not match, the run is rejected before reaching step 02.

02 · CHECK

Two gates before any agent executes:

  1. RBAC capability check: the dispatcher verifies that the caller's principal has the capability required for the flow's entry handler. No capability, no call.
  2. Input schema validation: the (now-typed) trigger payload is validated against the flow's input schema. Mismatches are rejected with a typed AgentsKitError.

There is no path that reaches step 03 without passing both gates.
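
The ordering of the two gates can be sketched as a single function that either returns typed input or throws. This is a minimal illustration, not the AKOS dispatcher: only the name AgentsKitError comes from the text above. Its `code` field, the `Principal` and `Flow` shapes, the `checkRun` function, and the capability string are all assumptions. `InputSchema` is a dependency-free stand-in with the same `parse` contract as a Zod schema (return typed data or throw).

```typescript
// Hypothetical shapes for illustration; only the name AgentsKitError is from the docs.
interface Principal {
  id: string;
  capabilities: ReadonlySet<string>;
}

// Stand-in for a Zod schema: parse() returns typed data or throws.
interface InputSchema<T> {
  parse(input: unknown): T;
}

interface Flow<T> {
  requiredCapability: string;
  inputSchema: InputSchema<T>;
}

class AgentsKitError extends Error {
  code: "CAPABILITY_DENIED" | "INVALID_INPUT"; // assumed field, not a documented one
  constructor(code: "CAPABILITY_DENIED" | "INVALID_INPUT", message: string) {
    super(message);
    this.name = "AgentsKitError";
    this.code = code;
  }
}

// Gate 1 (capability), then gate 2 (schema). Typed input is returned only if both pass.
function checkRun<T>(principal: Principal, flow: Flow<T>, payload: unknown): T {
  if (!principal.capabilities.has(flow.requiredCapability)) {
    throw new AgentsKitError(
      "CAPABILITY_DENIED",
      `${principal.id} lacks ${flow.requiredCapability}`,
    );
  }
  try {
    return flow.inputSchema.parse(payload);
  } catch (cause) {
    const detail = cause instanceof Error ? cause.message : String(cause);
    throw new AgentsKitError("INVALID_INPUT", detail);
  }
}

// Example flow whose input must look like { ticketId: string }.
const triageFlow: Flow<{ ticketId: string }> = {
  requiredCapability: "flow:triage:run",
  inputSchema: {
    parse(input: unknown): { ticketId: string } {
      const candidate = input as { ticketId?: unknown } | null;
      if (typeof candidate === "object" && candidate !== null && typeof candidate.ticketId === "string") {
        return { ticketId: candidate.ticketId };
      }
      throw new Error("ticketId must be a string");
    },
  },
};
```

Because the capability check runs first, a caller without the capability learns nothing about whether the payload would have validated.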

03 · EXECUTE

The sidecar process runs the flow's nodes in DAG order. Key properties during execution:

  • Sandboxed — agent code and shell commands run in an isolated sandbox. No ambient access to the host, the filesystem, or the network beyond what is explicitly granted.
  • Egress-controlled — every outbound network call is checked against the workspace egress allowlist before it leaves the process. Silent data exfiltration is structurally impossible.
  • Cost-tracked — token spend is measured per node and per run. The running total is visible in the UI in real time.
  • OpenTelemetry-traced — every node execution emits a span. Traces are queryable from the Observability screen and sink to Grafana or any compatible backend.
  • Checkpointed — every node writes a checkpoint before running. If the process crashes or is redeployed, the run resumes from the last checkpoint.
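
To make "DAG order", "checkpointed", and "cost-tracked" concrete, here is a small sketch of an executor loop. It is an illustration under assumptions, not the sidecar's implementation: `FlowNode`, `RunState`, `dagOrder`, `executeFlow`, and `totalTokens` are hypothetical names, checkpoints are plain JSON strings held in memory, and sandboxing, egress checks, and tracing are left out.

```typescript
// Hypothetical node and run shapes; none of these names come from AKOS itself.
interface FlowNode {
  id: string;
  dependsOn: string[];
  // Returns the node's output and the tokens it spent.
  run(inputs: Record<string, unknown>): { output: unknown; tokens: number };
}

interface RunState {
  outputs: Record<string, unknown>;     // results of completed nodes
  tokensByNode: Record<string, number>; // per-node token spend
  checkpoints: string[];                // one JSON snapshot written before each node starts
}

// Simple topological ordering: repeatedly take every node whose dependencies are done.
function dagOrder(nodes: FlowNode[]): FlowNode[] {
  const done = new Set<string>();
  const ordered: FlowNode[] = [];
  let remaining = [...nodes];
  while (remaining.length > 0) {
    const ready = remaining.filter((n) => n.dependsOn.every((d) => done.has(d)));
    if (ready.length === 0) throw new Error("cycle or missing dependency in flow graph");
    for (const n of ready) {
      done.add(n.id);
      ordered.push(n);
    }
    remaining = remaining.filter((n) => !done.has(n.id));
  }
  return ordered;
}

function executeFlow(nodes: FlowNode[]): RunState {
  const state: RunState = { outputs: {}, tokensByNode: {}, checkpoints: [] };
  for (const node of dagOrder(nodes)) {
    // Checkpoint first, so a crash during this node can resume from here.
    state.checkpoints.push(JSON.stringify({ next: node.id, outputs: state.outputs }));
    const inputs: Record<string, unknown> = {};
    for (const dep of node.dependsOn) inputs[dep] = state.outputs[dep];
    const result = node.run(inputs);
    state.outputs[node.id] = result.output;
    state.tokensByNode[node.id] = result.tokens;
  }
  return state;
}

// Per-run total is the sum of the per-node figures.
function totalTokens(state: RunState): number {
  let sum = 0;
  for (const id of Object.keys(state.tokensByNode)) sum += state.tokensByNode[id];
  return sum;
}
```

Writing the snapshot before the node runs means the snapshot never contains a half-finished node's output, which is what makes resuming from it safe.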

04 · PAUSE (if applicable)

If the flow contains a human-gate node, the run pauses at that point. Configured reviewers receive a notification. The run holds its checkpoint indefinitely. On approval, execution resumes from the gate. On rejection, the flow routes to the configured fallback branch (or terminates, if none is configured).
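
The gate's three outcomes (resume, fall back, terminate) can be modeled as a small state transition. The sketch below is illustrative only: `RunStatus`, `GateConfig`, `pauseAtGate`, and `resolveGate` are hypothetical names, and real notification delivery is reduced to recording who would be notified.

```typescript
// Hypothetical state model for a run held at a human gate.
type Decision = "approve" | "reject";

interface GateConfig {
  reviewers: string[];
  fallbackNode?: string; // optional branch taken on rejection
}

type RunStatus =
  | { state: "paused"; gate: string; notified: string[] }
  | { state: "running"; nextNode: string }
  | { state: "terminated"; reason: string };

// Entering the gate: the run holds and the configured reviewers are notified.
function pauseAtGate(gate: string, config: GateConfig): RunStatus {
  return { state: "paused", gate, notified: [...config.reviewers] };
}

// A reviewer's decision moves the run out of the paused state.
function resolveGate(
  status: RunStatus,
  decision: Decision,
  config: GateConfig,
  nodeAfterGate: string,
): RunStatus {
  if (status.state !== "paused") throw new Error("run is not waiting at a gate");
  if (decision === "approve") return { state: "running", nextNode: nodeAfterGate };
  if (config.fallbackNode !== undefined) {
    return { state: "running", nextNode: config.fallbackNode };
  }
  return { state: "terminated", reason: `rejected at ${status.gate}` };
}
```

There is deliberately no timeout transition here, matching the statement that the run holds its checkpoint indefinitely.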

05 · FINISH

When the final output node executes:

  1. The result is saved as an artifact — a typed, inspectable snapshot of the run's output, retained and queryable from the Runs screen.
  2. A ledger entry is appended to the signed audit ledger — Ed25519-signed, append-only, tamper-evident. The entry records the principal, the action, the outcome, and the timestamp. It cannot be altered after the fact.
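
One common way to get "append-only, tamper-evident" behavior is to sign each entry and chain it to the hash of the previous one. The sketch below shows that general technique with Node's built-in `node:crypto` module. It is not a description of AKOS's ledger format: the four recorded fields come from the text above, while `prevHash`, the JSON serialization, SHA-256 chaining, and the function names are assumptions.

```typescript
import { createHash, generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical entry layout; prevHash and signature are assumed fields.
interface LedgerEntry {
  principal: string;
  action: string;
  outcome: string;
  timestamp: string;
  prevHash: string;  // hash of the previous entry, chaining entries together
  signature: string; // base64 Ed25519 signature over the fields above
}

type EntryFields = Pick<LedgerEntry, "principal" | "action" | "outcome" | "timestamp">;

// Canonical bytes that get signed: a fixed-order JSON array of the fields.
function payloadOf(e: Omit<LedgerEntry, "signature">): Buffer {
  return Buffer.from(JSON.stringify([e.principal, e.action, e.outcome, e.timestamp, e.prevHash]));
}

function hashOf(e: LedgerEntry): string {
  return createHash("sha256").update(payloadOf(e)).update(e.signature).digest("hex");
}

// Returns a new ledger with one more signed entry; existing entries are never modified.
function appendEntry(ledger: LedgerEntry[], fields: EntryFields, privateKey: KeyObject): LedgerEntry[] {
  const prevHash = ledger.length > 0 ? hashOf(ledger[ledger.length - 1]) : "genesis";
  const unsigned = { ...fields, prevHash };
  // Ed25519 takes no separate digest algorithm, so the first argument is null.
  const signature = sign(null, payloadOf(unsigned), privateKey).toString("base64");
  return [...ledger, { ...unsigned, signature }];
}

// Walks the chain, checking both the hash links and every signature.
function verifyLedger(ledger: LedgerEntry[], publicKey: KeyObject): boolean {
  let expectedPrev = "genesis";
  for (const entry of ledger) {
    if (entry.prevHash !== expectedPrev) return false;
    const sig = Buffer.from(entry.signature, "base64");
    if (!verify(null, payloadOf(entry), publicKey, sig)) return false;
    expectedPrev = hashOf(entry);
  }
  return true;
}
```

With this layout, editing any field of an old entry invalidates that entry's signature, and removing or reordering entries breaks the hash links, so tampering is detectable by anyone holding the public key.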

Governance guarantees

Guarantee                     Mechanism
Every run is authorized       RBAC capability check at step 02
Every run is sandboxed        Isolated execution at step 03
Every run has a cost record   Per-node token tracking at step 03
Every run is observable       OpenTelemetry spans at step 03
Every run is replayable       Checkpoints at step 03; time-travel from any checkpoint
Every run is audited          Signed ledger entry at step 05

A run is never a black box. It is authorized, observed, costed, and recorded at every stage.

What happens when a run fails

If a node throws an unhandled error:

  • The failure is recorded in the run's trace with the full error context.
  • A Sentry adapter captures the error (if configured).
  • The run moves to the failed state.
  • The checkpoint from the last successful node is preserved. An operator can replay the run from that checkpoint after fixing the underlying issue.
  • The audit ledger records the failure — including which node failed and on whose behalf.
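
The fail-then-replay behavior described above can be sketched as a loop that records the last node to complete and can be restarted just after it. This is a simplified illustration, not AKOS code: `RunRecord`, `Step`, and `runSteps` are hypothetical names, nodes run in a fixed linear order, and trace, Sentry, and ledger writes are omitted.

```typescript
// Hypothetical run record; field names are illustrative, not AKOS's schema.
interface RunRecord {
  state: "running" | "succeeded" | "failed";
  lastCheckpoint: string | null; // id of the last node that completed
  failedNode: string | null;
  error: string | null;
}

interface Step {
  id: string;
  run: () => void;
}

// Runs steps in order. When replaying, pass the preserved checkpoint as
// `resumeAfter` to start at the node that follows it. An unknown id
// restarts from the first step.
function runSteps(steps: Step[], resumeAfter: string | null = null): RunRecord {
  const record: RunRecord = {
    state: "running",
    lastCheckpoint: resumeAfter,
    failedNode: null,
    error: null,
  };
  const start = resumeAfter === null ? 0 : steps.findIndex((s) => s.id === resumeAfter) + 1;
  for (const step of steps.slice(start)) {
    try {
      step.run();
      record.lastCheckpoint = step.id;
    } catch (err) {
      // Unhandled node error: the run moves to failed and the checkpoint is kept.
      record.state = "failed";
      record.failedNode = step.id;
      record.error = err instanceof Error ? err.message : String(err);
      return record;
    }
  }
  record.state = "succeeded";
  return record;
}
```

A replay after the fix calls `runSteps(steps, failed.lastCheckpoint)`, so nodes that already succeeded are not executed again.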
