# Agent Deployment — From Definition to Running Session

Powerloom handles the full lifecycle of a Claude agent: define it, reconcile it onto Anthropic's infrastructure, wire its tools and credentials, invoke it, and stream its output — with every step governed by RBAC and recorded in the audit trail.

You declare what you want. The reconciler makes it real.

---

## The Lifecycle

```
Define → Reconcile → Invoke → Stream → Audit
```

**Define.** Create an agent via the API, CLI manifest, or meta-agent. The agent is a row in the Powerloom database: a declaration of desired state. At this point, nothing exists on Anthropic's infrastructure.

**Reconcile.** The reconciler picks up the new agent within seconds, resolves its dependencies (environment, skills, MCP servers), and pushes it to Anthropic's Claude Managed Agents runtime. The agent now exists on CMA and can be invoked.

**Invoke.** A user (or another agent) starts a session. Powerloom checks RBAC, creates a session on CMA, sends the kickoff message, and begins streaming events.

**Stream.** Events flow from CMA through Powerloom to the client via WebSocket. Every tool call, every message, every status change — in real time.

**Audit.** The session is recorded. Event counts, MCP tool usage, duration, degradation flags — all queryable. Control-plane mutations (agent creation, skill attachment, access changes) land in the hash-chained audit trail.
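To make "hash-chained" concrete, here is a minimal sketch of how such a trail can be built and verified. It is an illustration under stated assumptions, not Powerloom's implementation: the use of SHA-256 for the chain, the entry fields, the action names, and the all-zero genesis hash are all hypothetical.

```python
import hashlib
import json
from typing import List

GENESIS_HASH = "0" * 64  # hypothetical anchor for the first entry


def _digest(body: dict) -> str:
    """SHA-256 over a sorted-key JSON encoding of the entry body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(chain: List[dict], actor: str, action: str) -> dict:
    """Append an entry whose hash covers its content and the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    body = {"actor": actor, "action": action, "prev_hash": prev_hash}
    entry = {**body, "hash": _digest(body)}
    chain.append(entry)
    return entry


def verify_chain(chain: List[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in chain:
        body = {"actor": entry["actor"], "action": entry["action"], "prev_hash": entry["prev_hash"]}
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


trail: List[dict] = []
append_entry(trail, "user:admin@acme.com", "agent.create")   # action names are made up
append_entry(trail, "user:admin@acme.com", "skill.attach")
intact = verify_chain(trail)

trail[0]["action"] = "agent.delete"   # simulate tampering with an old entry
tampered_ok = verify_chain(trail)
```

Because each entry's hash includes the previous hash, rewriting any historical row invalidates every row after it.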

---

## Defining an Agent

An agent definition has six core properties:

| Property | Purpose |
|---|---|
| `name` | Unique identifier within its OU |
| `model` | Claude model (`claude-sonnet-4-6`, `claude-opus-4-1`, etc.) |
| `system_prompt` | Instructions that define the agent's behavior |
| `agent_kind` | `user` (inherits owner's permissions) or `service` (OU-scoped) |
| `owner_principal` | The user or group responsible for this agent |
| `ou` | Where the agent lives in the organizational hierarchy |

Agents are scoped to an OU. An agent in `/acme/engineering/platform` inherits the platform OU's policies. RBAC determines who can create, modify, invoke, and view the agent.

### Via CLI manifest:

```yaml
apiVersion: powerloom/v1
kind: Agent
metadata:
  name: code-reviewer
  ou_path: /acme/engineering
spec:
  display_name: Code Reviewer
  model: claude-sonnet-4-6
  system_prompt: |
    Review Python code for correctness, security, and style.
    Flag potential bugs. Suggest improvements. Be concise.
  agent_kind: user
  owner_principal_ref: user:admin@acme.com
  skills: [python-lint]
  mcp_servers: [repo-files]
```

```bash
weave apply code-reviewer.yaml
```

### Via API:

```bash
curl -X POST https://api.powerloom.org/agents \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "code-reviewer",
    "ou_id": "a1b2c3d4-...",
    "display_name": "Code Reviewer",
    "model": "claude-sonnet-4-6",
    "system_prompt": "Review Python code for correctness...",
    "agent_kind": "user",
    "owner_principal_id": "e5f6g7h8-..."
  }'
```

### Via meta-agent:

> "Create a code review agent for the engineering team that checks Python for bugs and security issues."

The meta-agent interprets this, resolves the OU and owner, creates the agent definition, and reports back — fully governed, fully audited.

---

## The Reconciler

The reconciler is Powerloom's convergence engine. It ensures that what you declared (desired state) matches what exists on Anthropic's infrastructure (actual state).

### How it works

Every resource that touches CMA — agents, skills, environments, vaults, credentials — has a **sync state** record. This record tracks two hashes:

- **desired_hash** — SHA-256 of what the resource should look like, computed from its database row
- **actual_hash** — SHA-256 of what Anthropic currently has

When you create or modify an agent, Powerloom computes a new desired hash and marks the resource as dirty. The reconciler runs every two seconds, picks up dirty resources, and pushes them to CMA.

```
Agent created/updated in Powerloom DB
        ↓
desired_hash computed, sync_state marked dirty
        ↓
Reconciler wakes (~2 seconds)
        ↓
Fetches all rows where desired_hash ≠ actual_hash
        ↓
Resolves dependencies (environment, skills, MCP servers)
        ↓
Pushes to Anthropic CMA API
        ↓
Stores CMA agent ID, sets actual_hash = desired_hash
        ↓
Agent is live and invocable
```
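The hash comparison at the heart of this flow can be sketched in a few lines. This is a minimal illustration rather than Powerloom's actual code: the canonical-JSON serialization, the `SyncState` class, and its field layout are assumptions. The source only states that both hashes are SHA-256.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


def compute_hash(row: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class SyncState:
    desired_hash: str
    actual_hash: Optional[str] = None  # None until the first successful push

    @property
    def dirty(self) -> bool:
        return self.desired_hash != self.actual_hash


agent_row = {"name": "code-reviewer", "model": "claude-sonnet-4-6"}
state = SyncState(desired_hash=compute_hash(agent_row))
dirty_on_create = state.dirty            # nothing has been pushed yet

state.actual_hash = state.desired_hash   # what a successful push would record
dirty_after_push = state.dirty

agent_row["model"] = "claude-opus-4-1"   # any edit changes the desired hash
state.desired_hash = compute_hash(agent_row)
dirty_after_edit = state.dirty
```

Sorting keys before hashing matters: two rows with the same content must produce the same hash regardless of field order, or the reconciler would see phantom changes.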

### Dependency resolution

An agent can't be pushed to CMA until its dependencies exist:

1. **Environment.** Every organization has a shared cloud environment on CMA. If it doesn't exist yet, the reconciler creates it first and comes back to the agent on the next tick.

2. **Skills.** Every attached skill must already be uploaded to Anthropic with a valid skill ID and version string. Unsynced skills are reconciled first.

3. **MCP servers.** Every attached MCP server registration must exist with a stable server name.

If a dependency isn't ready, the reconciler skips the agent without counting it as a failure. No retry penalty. Next tick, it tries again.
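The skip-without-penalty rule can be sketched as follows. The class names, the `attempts` field, and the string return values are hypothetical. The point is only that an unready dependency leaves the failure counter untouched.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dependency:
    kind: str      # "environment", "skill", or "mcp_server"
    name: str
    synced: bool


@dataclass
class PendingAgent:
    name: str
    dependencies: List[Dependency] = field(default_factory=list)
    attempts: int = 0   # incremented only when a real push fails


def reconcile_tick(agent: PendingAgent) -> str:
    """Return "skipped" if any dependency is unsynced, otherwise "pushed"."""
    if any(not dep.synced for dep in agent.dependencies):
        # Not a failure: leave agent.attempts alone and try again next tick.
        return "skipped"
    # A real reconciler would call the CMA API at this point.
    return "pushed"


agent = PendingAgent(
    name="code-reviewer",
    dependencies=[
        Dependency("environment", "org-default", synced=True),
        Dependency("skill", "python-lint", synced=False),
    ],
)
first_tick = reconcile_tick(agent)      # skill not uploaded yet
agent.dependencies[1].synced = True     # the skill gets reconciled in between
second_tick = reconcile_tick(agent)     # all dependencies ready
```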

### Failure handling

If a CMA API call fails, the reconciler increments an attempt counter and backs off exponentially — 2 seconds, 4, 8, up to a 5-minute cap. After 10 consecutive failures, the resource is parked. It won't be retried until someone fixes the underlying issue and resets it.

Transient failures (network blips, CMA rate limits) typically clear on a later retry. Permanent failures (invalid model ID, malformed prompt) fail every attempt, so they end up parked and surface in the sync state for diagnosis.
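The retry schedule described above works out as follows. This sketch assumes the delay doubles from a 2-second base on each consecutive failure and that the 10th failure parks the resource instead of scheduling another retry. The constant and function names are illustrative.

```python
BASE_DELAY_S = 2       # first retry delay
MAX_DELAY_S = 300      # 5-minute cap
MAX_ATTEMPTS = 10      # park after this many consecutive failures


def backoff_delay(attempt: int) -> int:
    """Seconds to wait after the given consecutive failure (1-based)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    return min(BASE_DELAY_S * 2 ** (attempt - 1), MAX_DELAY_S)


def is_parked(attempts: int) -> bool:
    """A parked resource is not retried until someone resets it."""
    return attempts >= MAX_ATTEMPTS


# Delays after failures 1 through 9; the 10th failure parks the resource.
schedule = [backoff_delay(n) for n in range(1, MAX_ATTEMPTS)]
total_wait_s = sum(schedule)
```

Under these assumptions the delays are 2, 4, 8, 16, 32, 64, 128, 256, and 300 seconds, so a resource that fails every attempt spends 810 seconds (13.5 minutes) in backoff before it is parked.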

### Drift detection

Every 60 seconds, the reconciler sweeps all synced resources to check for drift — cases where the actual state has diverged from the desired state. If drift is detected, the resource is re-marked as dirty and reconverged on the next tick.
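One way such a sweep could work is sketched below. The source does not say how the actual state is observed, so `fetch_remote_hash` is a hypothetical callback standing in for whatever lookup the reconciler performs against CMA. Overwriting `actual_hash` with the observed value is an assumption about how a row is re-marked dirty.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SyncedResource:
    resource_id: str
    desired_hash: str
    actual_hash: str


def sweep_for_drift(
    resources: List[SyncedResource],
    fetch_remote_hash: Callable[[str], str],
) -> List[str]:
    """Re-mark drifted resources as dirty and return their IDs."""
    drifted = []
    for res in resources:
        if res.desired_hash != res.actual_hash:
            continue  # already dirty; the regular tick will handle it
        remote = fetch_remote_hash(res.resource_id)
        if remote != res.actual_hash:
            # Recording the observed hash makes desired != actual, i.e. dirty.
            res.actual_hash = remote
            drifted.append(res.resource_id)
    return drifted


# agent-2 was changed out of band, so its remote hash no longer matches.
remote_state: Dict[str, str] = {"agent-1": "aaa", "agent-2": "zzz"}
resources = [
    SyncedResource("agent-1", desired_hash="aaa", actual_hash="aaa"),
    SyncedResource("agent-2", desired_hash="bbb", actual_hash="bbb"),
]
drifted = sweep_for_drift(resources, remote_state.__getitem__)
```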

---

## Wiring Tools: Skills and MCP Servers

Agents gain capabilities through two mechanisms: **skills** (code packages uploaded to Anthropic) and **MCP servers** (external tool servers your agent can call).

### Skills

A skill is a versioned code package. Upload it, attach it to an agent, and the agent can use it.

```yaml
kind: Skill
metadata:
  name: python-lint
  ou_path: /acme/engineering
spec:
  display_name: Python Linter
  description: Lint Python files for style and error patterns.
```

Skill versions are immutable. Each upload creates a new version with a content hash. The reconciler pushes versions to Anthropic and stores the CMA version string for reference in agent definitions.

Attach a skill to an agent by listing it in the agent's manifest:

```yaml
kind: Agent
metadata:
  name: code-reviewer
  ou_path: /acme/engineering
spec:
  skills: [python-lint]
```

Or attach it via the API:

```bash
curl -X POST https://api.powerloom.org/agents/$AGENT_ID/skills \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"skill_id": "...", "skill_version_id": "..."}'
```

Attaching or detaching a skill marks the agent as dirty. The reconciler pushes the updated agent definition to CMA on the next tick.

### MCP Servers

MCP (Model Context Protocol) servers give agents access to external tools — databases, file systems, email, Slack, custom APIs.

Powerloom supports two deployment models:

**Managed deployments.** Powerloom deploys MCP servers from a template catalog. Templates available: `files`, `postgres`, `slack`, `echo`.

```yaml
kind: MCPDeployment
metadata:
  name: pg-analytics
  ou_path: /acme/engineering/platform
spec:
  template_kind: postgres
  config:
    database_url_secret: pg-analytics-dsn
  policy:
    allowed_tables: [metrics, events]
    deny_mutation_keywords: true
```

Managed deployments include a policy engine. The `postgres` template enforces table allowlists and can block mutation keywords. The `files` template enforces path allowlists. The `slack` template enforces channel allowlists.
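To illustrate what a table allowlist plus mutation-keyword block might look like, here is a deliberately naive sketch. It is not the policy engine's real logic: the keyword list, the function name, and the token-based table detection are all assumptions, and a production check would use a proper SQL parser.

```python
import re
from typing import Iterable

# Hypothetical keyword list; the real template's list is not documented here.
MUTATION_KEYWORDS = {"insert", "update", "delete", "drop", "alter", "truncate", "create", "grant"}


def check_query(sql: str, allowed_tables: Iterable[str], deny_mutation_keywords: bool) -> bool:
    """Return True if the query passes the policy, False if it must be rejected."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*", sql.lower())
    if deny_mutation_keywords and MUTATION_KEYWORDS.intersection(tokens):
        return False
    allowed = {table.lower() for table in allowed_tables}
    # Naive heuristic: the identifier right after FROM or JOIN is a table name.
    referenced = {
        tokens[i + 1]
        for i, tok in enumerate(tokens[:-1])
        if tok in ("from", "join")
    }
    return referenced.issubset(allowed)


policy_tables = ["metrics", "events"]
ok_read = check_query("SELECT count(*) FROM metrics", policy_tables, True)
ok_join = check_query(
    "SELECT * FROM metrics m JOIN events e ON m.id = e.metric_id", policy_tables, True
)
blocked_table = check_query("SELECT * FROM users", policy_tables, True)
blocked_mutation = check_query("DELETE FROM events", policy_tables, True)
```

The heuristic errs on the side of rejection: a keyword inside a string literal or a schema-qualified table name would be refused even when the query is harmless.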

**BYO registrations.** Register a pre-existing MCP server by URL.

```yaml
kind: MCPServerRegistration
metadata:
  name: custom-tools
  ou_path: /acme/engineering
spec:
  display_name: Custom Tools
  url: https://tools.acme.internal/mcp/mcp
```

Powerloom validates the MCP handshake on registration and discovers the server's available tools.

### Credentials

Agents authenticate to MCP servers with bearer tokens. Powerloom manages these through the credential store.

```yaml
kind: Credential
metadata:
  agent_path: /acme/engineering/platform/pg-writer
  mcp_registration_path: /acme/engineering/platform/pg-analytics
```

The bearer token is minted server-side and stored using envelope encryption (AES-256-GCM with a master key in AWS Secrets Manager). The token never appears in a manifest — only a reference to the agent-MCP binding.

The reconciler pushes credentials to a CMA vault, where they're available to the agent at session runtime.

---

## Invoking an Agent

Once an agent is reconciled, it can be invoked. Invocation creates a session — a single run of the agent with a prompt.

### Starting a session

```bash
curl -X POST https://api.powerloom.org/agents/$AGENT_ID/invoke \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Review the migration in PR #42 for safety issues.",
    "mode": "fire_and_forget",
    "title": "PR #42 review"
  }'
```

Powerloom checks:
1. Does the user have `agent:invoke` permission on this agent's OU?
2. Is the agent synced with CMA? (If not, 409 — reconciler hasn't finished yet.)
3. Is the organization's environment synced?

If all checks pass, Powerloom creates a session on CMA, sends the kickoff message, and begins streaming events.
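The three checks can be read as an ordered gate. In the sketch below, only the 409 for an unsynced agent comes from the text above. The 403 for a missing permission, the 409 for an unsynced environment, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class InvokeContext:
    caller_permissions: Set[str]   # permissions the caller holds on the agent's OU
    agent_synced: bool
    environment_synced: bool


def precheck(ctx: InvokeContext) -> Optional[Tuple[int, str]]:
    """Return (status, reason) for the first failing check, or None if all pass."""
    if "agent:invoke" not in ctx.caller_permissions:
        return (403, "caller lacks agent:invoke")          # status code assumed
    if not ctx.agent_synced:
        return (409, "agent not yet reconciled")           # 409 per the list above
    if not ctx.environment_synced:
        return (409, "environment not yet reconciled")     # status code assumed
    return None


ready = precheck(InvokeContext({"agent:invoke"}, agent_synced=True, environment_synced=True))
not_synced = precheck(InvokeContext({"agent:invoke"}, agent_synced=False, environment_synced=True))
forbidden = precheck(InvokeContext(set(), agent_synced=True, environment_synced=True))
```

A client that receives the 409 can simply retry after a short delay, since the reconciler is expected to finish within a few ticks.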

### Two invocation modes

**Sync.** The API call blocks until the session completes. The response includes the final status.

**Fire-and-forget.** The API returns immediately with a WebSocket ticket. Connect to the WebSocket to stream events in real time.

```json
{
  "session_id": "s1a2b3c4-...",
  "status": "pending",
  "mode": "fire_and_forget",
  "ws_ticket": "tk_a1b2c3...",
  "ws_url": "/sessions/s1a2b3c4-.../stream?ticket=tk_a1b2c3..."
}
```
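The `ws_url` in the response is a relative path, so a client has to combine it with a host before connecting. A minimal sketch, assuming the WebSocket endpoint lives on the same host as the REST API under the `wss` scheme (the source does not confirm this) and using made-up IDs:

```python
API_WS_BASE = "wss://api.powerloom.org"  # assumed host and scheme


def stream_url(invoke_response: dict, base: str = API_WS_BASE) -> str:
    """Build the absolute WebSocket URL from a fire-and-forget invoke response."""
    if invoke_response.get("mode") != "fire_and_forget":
        raise ValueError("only fire_and_forget responses carry a WebSocket ticket")
    # ws_url already embeds the one-time ticket as a query parameter.
    return base.rstrip("/") + invoke_response["ws_url"]


response = {
    "session_id": "example-session",
    "status": "pending",
    "mode": "fire_and_forget",
    "ws_ticket": "tk_example",
    "ws_url": "/sessions/example-session/stream?ticket=tk_example",
}
url = stream_url(response)
```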

### WebSocket streaming

Connect to the WebSocket URL with the one-time ticket. Events arrive as JSON frames:

```json
{"type": "powerloom.session_created", "payload": {"cma_session_id": "..."}}
{"type": "agent.message", "payload": {"content": [{"type": "text", "text": "..."}]}}
{"type": "agent.mcp_tool_use", "payload": {"tool": "sql.explain", "input": {...}}}
{"type": "agent.mcp_tool_result", "payload": {"result": {...}}}
{"type": "session.status_idle", "payload": {"stop_reason": {"type": "end_turn"}}}
{"type": "powerloom.session_ended", "payload": {"event_count": 14, "mcp_tool_use_count": 3}}
```

Tickets are single-use and expire after 60 seconds. Each session supports a single WebSocket client.
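On the client side, handling the stream comes down to decoding each frame and dispatching on `type`. The sketch below works on any iterable of JSON strings, so it is independent of the WebSocket library used. The function name and the summary shape are hypothetical, and the sample frames are made-up stand-ins for the elided payloads above.

```python
import json
from collections import Counter
from typing import Iterable


def summarize_stream(frames: Iterable[str]) -> dict:
    """Tally frames by type and collect message text until the session ends."""
    counts: Counter = Counter()
    text_parts = []
    ended = False
    for raw in frames:
        event = json.loads(raw)
        counts[event["type"]] += 1
        if event["type"] == "agent.message":
            for block in event["payload"].get("content", []):
                if block.get("type") == "text":
                    text_parts.append(block["text"])
        elif event["type"] == "powerloom.session_ended":
            ended = True
            break  # final frame; stop reading
    return {
        "ended": ended,
        "mcp_tool_uses": counts["agent.mcp_tool_use"],
        "text": "".join(text_parts),
        "frames_seen": sum(counts.values()),
    }


sample_frames = [
    '{"type": "powerloom.session_created", "payload": {"cma_session_id": "example"}}',
    '{"type": "agent.message", "payload": {"content": [{"type": "text", "text": "Checking the plan."}]}}',
    '{"type": "agent.mcp_tool_use", "payload": {"tool": "sql.explain", "input": {}}}',
    '{"type": "agent.mcp_tool_result", "payload": {"result": {}}}',
    '{"type": "session.status_idle", "payload": {"stop_reason": {"type": "end_turn"}}}',
    '{"type": "powerloom.session_ended", "payload": {"event_count": 6, "mcp_tool_use_count": 1}}',
]
summary = summarize_stream(sample_frames)
```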

---

## Session Lifecycle

A session moves through a defined set of states:

```
pending → running → terminated
                  → idle_end_turn
                  → timeout
                  → failed
```

**pending** — Session row created. CMA session not yet started.

**running** — CMA session active. Events are streaming.

**terminated** — Agent reached a terminal state. Session complete.

**idle_end_turn** — Agent finished its turn and is idle. Session may continue with additional prompts (future capability) or close.

**timeout** — Session exceeded the maximum duration (default: 10 minutes).

**failed** — Session encountered an unrecoverable error.
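The diagram above translates directly into a transition table. This sketch encodes only the arrows shown there. Whether other transitions exist (for example, `pending` straight to `failed`) is not stated, so they are treated as illegal here. The enum and function names are illustrative.

```python
from enum import Enum
from typing import Dict, Set


class SessionState(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    TERMINATED = "terminated"
    IDLE_END_TURN = "idle_end_turn"
    TIMEOUT = "timeout"
    FAILED = "failed"


# Only the arrows drawn in the diagram; states without an entry have no exits.
TRANSITIONS: Dict[SessionState, Set[SessionState]] = {
    SessionState.PENDING: {SessionState.RUNNING},
    SessionState.RUNNING: {
        SessionState.TERMINATED,
        SessionState.IDLE_END_TURN,
        SessionState.TIMEOUT,
        SessionState.FAILED,
    },
}


def transition(current: SessionState, target: SessionState) -> SessionState:
    """Return the new state, or raise ValueError if the move is not in the table."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current.value} -> {target.value}")
    return target


state = transition(SessionState.PENDING, SessionState.RUNNING)
state = transition(state, SessionState.IDLE_END_TURN)
```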

### Silent MCP degradation detection

CMA has a specific failure mode: if a vault credential is missing, expired, or unreachable, the agent silently falls back to basic tools (bash, web search) instead of MCP tools. The session looks successful — it completes normally — but the agent never used the tools it was supposed to use.

Powerloom detects this automatically. If a session completes with zero MCP tool calls on an agent that has MCP servers attached, and the session had at least 10 events (enough to rule out trivial prompts), the session is flagged as `degradation_flagged: true`.

This flag surfaces in the session detail and can be used to diagnose credential issues before they affect production workflows.
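The detection rule is a three-part conjunction, sketched here as a pure function. The function and parameter names are illustrative. The thresholds come from the description above.

```python
MIN_EVENTS_FOR_FLAG = 10   # below this, the prompt is considered trivial


def is_degraded(attached_mcp_servers: int, mcp_tool_use_count: int, event_count: int) -> bool:
    """Flag a completed session that had MCP servers attached but never called one."""
    return (
        attached_mcp_servers > 0
        and mcp_tool_use_count == 0
        and event_count >= MIN_EVENTS_FOR_FLAG
    )


flagged = is_degraded(attached_mcp_servers=1, mcp_tool_use_count=0, event_count=14)
healthy = is_degraded(attached_mcp_servers=1, mcp_tool_use_count=3, event_count=14)
trivial = is_degraded(attached_mcp_servers=1, mcp_tool_use_count=0, event_count=4)
no_mcp = is_degraded(attached_mcp_servers=0, mcp_tool_use_count=0, event_count=14)
```

The flag is a heuristic: a long session that legitimately needed no MCP tools would also be flagged, so it is best treated as a prompt to check credentials, not as proof of failure.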

---

## Approval Gates

Agent creation and modification can be gated behind approval policies.

If an approval policy matches — for example, "agent creation in the production OU requires OrgAdmin approval" — the mutation enters a pending state instead of applying immediately. The response is 202 (Accepted), not 201 (Created).

An authorized approver sees the request in their `/approvals` inbox, reviews the proposed change, and approves or rejects. On approval, the original mutation replays atomically. On rejection (or after the 7-day TTL expires), the request is abandoned.

Every step — the request, the decision, the replayed mutation — writes a row to the control-plane audit trail.
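The gate's behavior can be summarized in a small sketch. The class, the function names, and the outcome strings are hypothetical, and treating a request as expired at exactly seven days is an assumption. The status codes and the TTL come from the text above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

APPROVAL_TTL = timedelta(days=7)


@dataclass
class ApprovalRequest:
    created_at: datetime
    decision: Optional[str] = None   # "approved", "rejected", or None while undecided


def submit_status(policy_matches: bool) -> int:
    """202 when the mutation is held for approval, 201 when it applies immediately."""
    return 202 if policy_matches else 201


def resolve(request: ApprovalRequest, now: datetime) -> str:
    """Return "replay", "abandoned", or "pending" for a held mutation."""
    if request.decision == "approved":
        return "replay"        # the original mutation is re-applied
    if request.decision == "rejected":
        return "abandoned"
    if now - request.created_at >= APPROVAL_TTL:
        return "abandoned"     # TTL expired without a decision
    return "pending"


created = datetime(2025, 1, 1, tzinfo=timezone.utc)
request = ApprovalRequest(created_at=created)
after_one_day = resolve(request, created + timedelta(days=1))
after_ttl = resolve(request, created + timedelta(days=7))

request.decision = "approved"
after_approval = resolve(request, created + timedelta(days=2))
```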

See the approval policies documentation for full configuration details.

---

## API Reference

### Agents

| Method | Endpoint | Description |
|---|---|---|
| POST | `/agents` | Create an agent |
| GET | `/agents` | List agents (filter by `ou_id`) |
| GET | `/agents/{agent_id}` | Get agent detail |
| PATCH | `/agents/{agent_id}` | Update an agent |
| DELETE | `/agents/{agent_id}` | Archive an agent |

### Agent Attachments

| Method | Endpoint | Description |
|---|---|---|
| POST | `/agents/{agent_id}/skills` | Attach a skill |
| DELETE | `/agents/{agent_id}/skills/{skill_id}` | Detach a skill |
| POST | `/agents/{agent_id}/mcp-servers` | Attach an MCP server |
| DELETE | `/agents/{agent_id}/mcp-servers/{reg_id}` | Detach an MCP server |

### Sessions

| Method | Endpoint | Description |
|---|---|---|
| POST | `/agents/{agent_id}/invoke` | Create and start a session |
| GET | `/sessions` | List sessions (filter by `agent_id`) |
| GET | `/sessions/{session_id}` | Get session detail |
| WS | `/sessions/{session_id}/stream?ticket=` | Stream session events |

### MCP Infrastructure

| Method | Endpoint | Description |
|---|---|---|
| POST | `/mcp-servers` | Register a BYO MCP server |
| GET | `/mcp-servers` | List MCP registrations |
| POST | `/mcp-deployments` | Deploy a managed MCP server |
| GET | `/mcp-deployments` | List deployments |
| POST | `/mcp-deployments/{id}/destroy` | Tear down a deployment |

### Credentials

| Method | Endpoint | Description |
|---|---|---|
| POST | `/credentials` | Create a credential (bearer minted server-side) |
| GET | `/credentials` | List credentials (filter by `agent_id`) |
| DELETE | `/credentials/{id}` | Delete a credential |
