# Workflows

Multi-agent handoffs declared as a directed acyclic graph. One YAML manifest defines the nodes, the edges, and the conditions that route a run from start to finish. The scheduler drives every step. Every transition lands in the work-chain audit log.

This page covers the seven node kinds, the run lifecycle, and how to integrate with approval gates.

---

## When to reach for a workflow

Single-agent invocations don't need this. If the answer is "ask Claude, get a response, done," you're already there.

Workflows are for the cases where:

- One agent's output feeds another agent's input.
- A run needs to pause for a human to approve before the next step runs.
- Parallel branches need to converge before the workflow can finish.
- The same set of steps runs on a schedule, on a webhook, or on an event from elsewhere in your stack.

If you've used Airflow, Temporal, or Step Functions, the mental model is the same. The differences are: the actors are agents, the audit trail is hash-chained, and the YAML lives in your manifest tree alongside everything else.

---

## The seven node kinds

| Kind | Purpose | Terminal? |
|---|---|---|
| `trigger` | Entry point. Starts a run. | No |
| `agent` | Hands the run off to a capability-matched agent session. | No |
| `condition` | Evaluates a Python-like expression against accumulated state; routes to one of N branches. | No |
| `approval` | Pauses the run until an authorized human approves or rejects. | No |
| `subflow` | Starts a child workflow run; advances when the child completes. | No |
| `output` | Marks the run done. Outputs land in `run.outputs_json`. | Yes |
| `transform` | Pure function over inputs → outputs. No side effects. Useful for shape-shifting data between agent steps. | No |

Every node has a `node_id`, a `kind`, and a kind-specific `inputs` map. Edges are declared with `next`. A `condition` node declares its branches in a branch-keyed `when` map and routes each branch through a matching branch-keyed `next` map.

---

## A worked example

A pull-request triage workflow: a new PR fires the trigger, the reviewer agent runs, an OUAdmin approves or rejects the merge, and on approval the merger agent merges the PR and closes the issue.

```yaml
apiVersion: powerloom/v1
kind: WorkflowDefinition
metadata:
  name: pr-triage
  scope_ou_path: /acme/engineering
  description: Review + approve + merge incoming PRs
spec:
  nodes:
    - node_id: start
      kind: trigger
      next: review

    - node_id: review
      kind: agent
      required_capabilities: [pr_review]
      inputs:
        pr_url: "{{ trigger.inputs.pr_url }}"
      next: gate

    - node_id: gate
      kind: approval
      approver_role: OUAdmin
      timeout_seconds: 86400
      next: merge

    - node_id: merge
      kind: agent
      required_capabilities: [pr_merge]
      inputs:
        pr_url: "{{ trigger.inputs.pr_url }}"
        review_summary: "{{ review.outputs.summary }}"
      next: done

    - node_id: done
      kind: output
      outputs:
        merged: "{{ merge.outputs.merged_at }}"
```

Apply with `weave workflow apply -f pr-triage.yaml`. Trigger a run with `weave workflow run pr-triage --inputs pr_url=https://github.com/acme/...`. Watch live progress at `/workflows/runs/<run-id>`.

The `{{ ... }}` template syntax pulls from accumulated state — every prior node's outputs are available by `<node_id>.outputs.<key>`.

---

## Run lifecycle

A run is always in one of six statuses:

- `queued` — fresh run, scheduler hasn't picked it up yet (usually <2s)
- `running` — at least one node is executing
- `waiting` — a node is in `waiting` (subflow in progress, approval pending, agent assigned but not done)
- `done` — terminal output node reached, outputs persisted
- `failed` — a node returned an error and the workflow has no recovery branch
- `cancelled` — explicit cancel from `weave workflow cancel <run>` or the UI

The scheduler ticks every two seconds. On each tick it advances every step that's ready: `pending` → `running` for newly-eligible nodes, `running` → `done` for completed agent steps, `waiting` → `running` for approvals that landed.

If a node fails and the workflow has no error-handling branch, the run goes to `failed` immediately. Add a `condition` node downstream of risky steps to route to a recovery path explicitly.
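
A minimal sketch of that recovery pattern, using only the `condition` fields shown elsewhere on this page. The node names (`deploy`, `check_deploy`, `rollback`, `finish`), the `deploy` capability, and the `succeeded` output key are hypothetical. The sketch also assumes the risky agent reports a soft failure through its outputs rather than raising a node error.

```yaml
# Hypothetical fragment of spec.nodes; names and the `succeeded` key are assumptions.
- node_id: deploy
  kind: agent
  required_capabilities: [deploy]        # assumed capability name
  inputs:
    target: "{{ trigger.inputs.target }}"
  next: check_deploy

- node_id: check_deploy
  kind: condition
  when:
    ok:      "{{ deploy.outputs.succeeded }}"
    recover: "{{ not deploy.outputs.succeeded }}"
  next:
    ok: finish          # normal path
    recover: rollback   # explicit recovery path
```

The `finish` and `rollback` nodes are left out for brevity; each would be declared like any other node in the same `spec.nodes` list.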

---

## Approval nodes

The most common reason to reach for a workflow over a single agent invocation: you want a human in the loop on a specific decision.

```yaml
- node_id: approve_send
  kind: approval
  approver_role: OUAdmin
  resource_kind: workflow_step
  description: "Approve sending the customer comms?"
  timeout_seconds: 3600
  on_timeout: auto_deny
  next: send
```

The step transitions to `waiting` and an approval request lands in the inbox of every user holding `approver_role` at the workflow's scope OU or any ancestor. The approver sees the pending step at `/approvals?tab=workflow_steps`, with the run context one click away. Approve → run advances. Reject → run goes to `failed`. Timeout → falls through `on_timeout` (default `auto_deny`).

Per Phase 34, the approval also fires `approval_pending_for_me` notifications — in-app card, email, and (if the user has subscribed) web push.

---

## Conditions and branches

```yaml
- node_id: route
  kind: condition
  when:
    high_value: "{{ extract.outputs.amount > 10000 }}"
    low_value:  "{{ extract.outputs.amount <= 10000 }}"
  next:
    high_value: human_review
    low_value: auto_approve
```

The expression body is a sandboxed Python expression. Only safe builtins (`abs`, `len`, `int`, etc.) and the comparison operators are in scope, plus the `inputs` and `<node_id>.outputs` accumulators. No file I/O, no imports, no `eval`. Keys in `when` declare the branches; the matching keys in `next` declare which node each branch routes to.

The first branch that evaluates truthy wins. Sibling branches not taken get marked `skipped`.
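
Because the first truthy branch wins, overlapping conditions can be listed from most to least specific. The sketch below assumes branches are evaluated in the order they appear in `when`, which this page does not state explicitly (YAML mappings are formally unordered, so verify this before relying on it). The branch names and the `exec_review` target are hypothetical.

```yaml
- node_id: tier
  kind: condition
  when:
    very_high: "{{ extract.outputs.amount > 100000 }}"
    high:      "{{ extract.outputs.amount > 10000 }}"    # also true for very_high amounts
    standard:  "{{ extract.outputs.amount <= 10000 }}"
  next:
    very_high: exec_review    # hypothetical node
    high: human_review
    standard: auto_approve
```

If declaration order turns out not to be guaranteed, make the ranges mutually exclusive (for example, `10000 < amount <= 100000` for `high`) so the result does not depend on ordering.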

For more complex routing, chain multiple condition nodes — readable conditions beat clever single expressions.
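
As an illustration, two chained condition nodes can replace one compound expression. The first routes on amount; the second refines only the high-value branch. The `route_customer` node and the `is_new_customer` output key are hypothetical.

```yaml
- node_id: route_amount
  kind: condition
  when:
    high_value: "{{ extract.outputs.amount > 10000 }}"
    low_value:  "{{ extract.outputs.amount <= 10000 }}"
  next:
    high_value: route_customer   # second, narrower decision
    low_value: auto_approve

- node_id: route_customer
  kind: condition
  when:
    new_customer:      "{{ extract.outputs.is_new_customer }}"      # assumed output key
    existing_customer: "{{ not extract.outputs.is_new_customer }}"
  next:
    new_customer: human_review
    existing_customer: auto_approve
```

Each node answers a single question, so a reviewer reading the run history can see which decision sent the run down which path.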

---

## Subflows

A `subflow` node starts a child workflow run and waits for it to complete. Useful for breaking a large workflow into reusable pieces, or for fanning out parallel work that needs to converge.

```yaml
- node_id: per_region_rollout
  kind: subflow
  workflow_name: rollout-one-region
  inputs:
    region: "{{ trigger.inputs.region }}"
  next: confirm
```

The child run inherits the parent's RBAC scope and audit context. Failures in the child propagate to the parent unless the parent explicitly catches them.
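
For reference, a sketch of what the child definition behind `rollout-one-region` might look like. It reuses the manifest shape from the worked example. The `region_rollout` capability, the `status` output key, and the assumption that the parent's subflow `inputs` arrive in the child as `trigger.inputs` are illustrative, not documented behavior.

```yaml
apiVersion: powerloom/v1
kind: WorkflowDefinition
metadata:
  name: rollout-one-region          # matches workflow_name on the parent's subflow node
  scope_ou_path: /acme/engineering
  description: Roll out to a single region
spec:
  nodes:
    - node_id: start
      kind: trigger
      next: rollout

    - node_id: rollout
      kind: agent
      required_capabilities: [region_rollout]    # hypothetical capability
      inputs:
        region: "{{ trigger.inputs.region }}"    # assumed to come from the parent's subflow inputs
      next: done

    - node_id: done
      kind: output
      outputs:
        status: "{{ rollout.outputs.status }}"   # hypothetical output key
```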

---

## What this isn't

Workflows aren't a general orchestration engine. They're a way to declare agent-handoff DAGs that respect the same RBAC, audit, and approval primitives as everything else in Powerloom. Heavy long-running workflow scenarios (millions of concurrent runs, hour-scale node executions) are better served by Temporal or Step Functions; Powerloom workflows assume runs measured in minutes and concurrency in the dozens, not the millions.

When you do want handoffs, deny-bound permissions on each step, and an auditor-ready trail of who decided what, that's the fit.
