# Authentication & Audit — How Powerloom Secures and Records Everything

Two systems work in tandem across the entire platform. Authentication ensures every action traces to a verified identity. The audit trail ensures every action is recorded, tamper-evident, and reproducible. Together, they answer the three questions that matter after any incident: **What happened? Who authorized it? Can we prove it?**

---

## Authentication

### How Users Authenticate

Powerloom supports four authentication paths. All produce the same result: a signed JWT that carries the user's identity, organization, and session metadata.

**Email + password (native auth):**
- Passwords hashed with Argon2id (memory-hard, GPU-resistant)
- Email verification required via AWS SES
- Rate-limited login attempts

**OAuth (Google, GitHub, Microsoft):**
- Standard OAuth 2.0 authorization code flow
- Account linking on first login (matches by verified email)
- Tokens managed by Powerloom — provider tokens are never stored long-term

**OIDC (AWS IAM Identity Center):**
- Device-code flow for CLI authentication
- Standard OIDC token exchange
- User provisioned on first login if email matches an invite

**Dev mode (local development only):**
- Unauthenticated impersonation via `--dev-as email`
- Works only when `POWERLOOM_AUTH_MODE=dev` is set; disabled in production

### JWT Structure

Every authenticated request carries a Bearer token. The JWT contains:

| Claim | Purpose |
|---|---|
| `sub` | User ID (UUID) — the authenticated principal |
| `org` | Organization ID — scopes all data access |
| `email` | User email — for display and audit correlation |
| `type` | Token type (`access`) |
| `iat` | Issued-at timestamp |
| `exp` | Expiration timestamp |

Access tokens expire after a configured TTL. Refresh tokens are issued alongside access tokens for session continuity.
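As a rough illustration, a token carrying these claims could be built and checked as follows, using only the Python standard library. The signing algorithm (HS256), the shared secret, and the helper names `encode_jwt` and `decode_jwt` are assumptions made for this sketch; the claims table above does not say which algorithm or library Powerloom uses.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def encode_jwt(claims: dict, secret: bytes) -> str:
    """Sign claims as a compact HS256 JWT (the algorithm is an assumption)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def decode_jwt(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check algorithm, signature, token type, and expiry; return the claims."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("type") != "access":
        raise ValueError("not an access token")
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims
```

The signature is verified before any claim is trusted, and the comparison uses `hmac.compare_digest` so it runs in constant time.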

### MFA

TOTP (Time-based One-Time Password) with Argon2id-hashed backup codes. Per-org policies can mandate MFA for specific roles.

- 6-digit codes, 30-second rotation
- 10 single-use backup codes generated at enrollment
- Backup codes hashed individually (one compromised code doesn't expose others)
- No SMS MFA — explicitly rejected due to SIM-swap vulnerability
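The 6-digit, 30-second scheme above corresponds to RFC 6238. The sketch below shows that calculation with the standard library. HMAC-SHA-1 and the one-step drift window are assumptions (they are common authenticator-app defaults, not stated Powerloom settings), and `totp` and `verify_totp` are illustrative names.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP using HMAC-SHA-1 (assumed hash function)."""
    counter = unix_time // step
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation from RFC 4226
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def verify_totp(secret: bytes, submitted: str, unix_time: int,
                step: int = 30, window: int = 1) -> bool:
    """Accept the current step plus `window` steps either side (assumed drift policy)."""
    return any(
        hmac.compare_digest(totp(secret, unix_time + i * step, step), submitted)
        for i in range(-window, window + 1)
    )
```

With the RFC 6238 test secret `b"12345678901234567890"` and a timestamp of 59 seconds, `totp` returns `"287082"`, the last six digits of the RFC's published SHA-1 test vector.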

---

## How Agents Authenticate

Agents don't have passwords. They authenticate through a chain of trust that traces back to a human identity.

### The chain

```
Human admin authenticates (JWT)
    ↓
Admin creates agent (RBAC-checked: agent:create)
    ↓
Admin attaches MCP server to agent
    ↓
Admin creates credential (bearer token minted server-side)
    ↓
Credential stored in vault (envelope-encrypted)
    ↓
Reconciler pushes credential to CMA vault
    ↓
Agent session starts → CMA injects credential at runtime
    ↓
Agent calls MCP server with bearer token
    ↓
MCP server validates bearer → grants tool access
```

Every link in this chain is auditable. The credential was created by a specific admin, for a specific agent, targeting a specific MCP server, at a specific time. The session that uses it traces back to the user who invoked it.

### Credential storage

Bearer tokens are encrypted at rest using envelope encryption:

- **Algorithm:** AES-256-GCM (authenticated encryption with nonce)
- **Master key:** Stored in AWS Secrets Manager (never touches the application database)
- **Per-credential nonce:** Unique per token, stored alongside the ciphertext
- **Decryption:** Only during reconciliation, when the reconciler pushes the credential to the CMA vault

The plaintext bearer token exists in exactly two places: the CMA vault (Anthropic's infrastructure, used at agent runtime) and transiently in memory during reconciliation. It never appears in API responses, manifests, logs, or the database.

### Vault model

Each principal (user or service account) has one vault. A vault holds credentials scoped to specific agent + MCP server pairs.

```
Vault (owned by principal)
  └── Credential (agent: pg-writer, mcp: pg-analytics)
  └── Credential (agent: pg-writer, mcp: report-files)
  └── Credential (agent: code-reviewer, mcp: repo-files)
```

One active credential exists per (vault, agent, MCP server) tuple. Rotation creates a new credential for the same tuple and archives the old one.
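The tuple rule and the rotation behavior can be modeled with a small in-memory structure. This is a sketch only: the class names, the `version` counter, and the `put` method are hypothetical and do not describe Powerloom's actual schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Credential:
    agent: str
    mcp_server: str
    version: int          # hypothetical rotation counter
    archived: bool = False


@dataclass
class Vault:
    owner: str  # the principal (user or service account) that owns the vault
    active: Dict[Tuple[str, str], Credential] = field(default_factory=dict)
    archive: List[Credential] = field(default_factory=list)

    def put(self, agent: str, mcp_server: str) -> Credential:
        """Create the credential for a pair, or rotate it if one already exists."""
        key = (agent, mcp_server)
        old = self.active.get(key)
        if old is not None:
            old.archived = True
            self.archive.append(old)
        new = Credential(agent, mcp_server, version=old.version + 1 if old else 1)
        self.active[key] = new  # the dict key enforces one active credential per pair
        return new
```

Calling `put("pg-writer", "pg-analytics")` twice leaves one active credential at version 2 and one archived credential at version 1.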

### Session-delegated JWTs (meta-agent)

When the meta-agent acts on behalf of an admin, it doesn't use a service account. Instead, each tool call mints a short-lived JWT bound to the admin's identity:

- **TTL:** 5 minutes
- **Scope:** The admin's user ID, organization, and RBAC permissions
- **Re-issued:** Fresh token per tool call (no reuse)
- **Audit field:** `delegated_from_session` links back to the CMA session

This means the meta-agent can never exceed the admin's permissions. RBAC evaluates against the admin's identity, not the agent's.
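A sketch of the claim set such a per-call token might carry is shown below. Only the 5-minute TTL, the admin-bound identity, and the `delegated_from_session` field come from the description above. The function name, the reuse of `type: access`, and the idea that permissions are resolved server-side from `sub` are assumptions.

```python
DELEGATION_TTL_SECONDS = 5 * 60  # 5-minute TTL from the list above


def delegated_claims(admin_claims: dict, cma_session_id: str, now: int) -> dict:
    """Build claims for one tool call, bound to the admin's identity.

    A fresh claim set is built per call; nothing here grants more than
    the admin already has, because `sub` and `org` are copied unchanged.
    """
    return {
        "sub": admin_claims["sub"],
        "org": admin_claims["org"],
        "email": admin_claims["email"],
        "type": "access",  # assumption: delegated tokens reuse the access type
        "iat": now,
        "exp": now + DELEGATION_TTL_SECONDS,
        "delegated_from_session": cma_session_id,
    }
```

Because RBAC evaluates `sub`, any permission the admin lacks is also denied to the meta-agent, and the audit field ties each call back to the originating session.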

---

## How Sessions Are Authenticated

When a user invokes an agent, Powerloom verifies:

1. **User authentication** — valid JWT with unexpired `exp`
2. **RBAC check** — user has `agent:invoke` permission on the agent's OU
3. **Agent sync check** — agent has been reconciled to CMA (has a `cma_agent_id`)
4. **Environment check** — organization's CMA environment is synced

Only then does Powerloom create a session on CMA and begin streaming.
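The four checks can be read as an ordered gate in which the first failure stops the invocation. The sketch below assumes a simplified permission model (a set of `(permission, ou)` pairs with no OU inheritance) and hypothetical names such as `authorize_invoke` and `InvokeDenied`.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple


@dataclass
class Agent:
    ou: str
    cma_agent_id: Optional[str]  # None until the agent has been reconciled


class InvokeDenied(Exception):
    pass


def authorize_invoke(claims: dict, now: float,
                     permissions: Set[Tuple[str, str]],
                     agent: Agent, env_synced: bool) -> None:
    """Raise InvokeDenied at the first failed check; return None if all pass."""
    if claims["exp"] <= now:
        raise InvokeDenied("token expired")
    if ("agent:invoke", agent.ou) not in permissions:
        raise InvokeDenied("missing agent:invoke on the agent's OU")
    if agent.cma_agent_id is None:
        raise InvokeDenied("agent not reconciled to CMA")
    if not env_synced:
        raise InvokeDenied("organization's CMA environment not synced")
```

Ordering the checks this way means an unauthenticated caller learns nothing about whether the agent exists or is synced.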

### WebSocket authentication

WebSocket connections use a ticket system:

1. `POST /agents/{id}/invoke` returns a `ws_ticket` (random 64-character string)
2. Client connects to `/sessions/{id}/stream?ticket=<ticket>`
3. Server validates: ticket exists, matches session ID, matches user, not expired (60-second TTL), not already consumed
4. Ticket is consumed on first use — single-use, no replay
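A minimal in-memory sketch of this ticket flow follows. The `TicketStore` class and its method names are hypothetical, and a production store would need an atomic check-and-consume step (for example, in a shared cache or database) rather than a process-local dict.

```python
import secrets
from dataclasses import dataclass
from typing import Dict

TICKET_TTL_SECONDS = 60


@dataclass
class Ticket:
    session_id: str
    user_id: str
    expires_at: float
    consumed: bool = False


class TicketStore:
    def __init__(self) -> None:
        self._tickets: Dict[str, Ticket] = {}

    def issue(self, session_id: str, user_id: str, now: float) -> str:
        value = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
        self._tickets[value] = Ticket(session_id, user_id, now + TICKET_TTL_SECONDS)
        return value

    def redeem(self, value: str, session_id: str, user_id: str, now: float) -> bool:
        """Return True once per ticket, and only for the matching session and user."""
        ticket = self._tickets.get(value)
        if ticket is None or ticket.consumed:
            return False
        if ticket.session_id != session_id or ticket.user_id != user_id:
            return False
        if now >= ticket.expires_at:
            return False
        ticket.consumed = True  # single use: a replayed ticket is rejected
        return True
```

Passing the ticket in the query string keeps the long-lived JWT out of WebSocket URLs, which tend to end up in proxy and server logs.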

---

## The Audit Trail

### Two audit systems

Powerloom maintains two separate audit logs for different trust boundaries:

**Control-plane audit (`control_plane_audit`)** — records every mutation within a customer organization. Visible to the customer's admins. Per-org isolation.

**Super-admin audit (`super_admin_audit_log`)** — records every action taken by Powerloom staff on the operations console. Visible only to super-admins. No customer data in the entries.

### Control-plane audit: the hash chain

The `control_plane_audit` table is append-only and cryptographically chained.

**Every control-plane mutation writes a row.** Creating an agent, attaching a skill, changing a role binding, approving a request, rotating a credential — all of it. No mutation escapes the log.

**Structure of an audit row:**

| Field | Purpose |
|---|---|
| `id` | UUID (client-generated, deterministic before insert) |
| `organization_id` | Partitioning and isolation key |
| `actor_principal_id` | Who did this: user, service agent, super-admin, or system |
| `actor_type` | `user` / `service_agent` / `super_admin` / `system` |
| `action_verb` | `create` / `update` / `delete` / `attach` / `detach` / `approve` / `reject` / `auto_deny` |
| `resource_kind` | `agent` / `skill` / `mcp_deployment` / `role_binding` / `credential` / `approval_request` / ... |
| `resource_id` | The specific resource affected |
| `before_json` | State before the change (null for creates) |
| `after_json` | State after the change (null for deletes) |
| `approval_request_id` | Links mutations to their approval flow (if gated) |
| `occurred_at` | Server timestamp — immutable after write |
| `prev_hash` | SHA-256 of the previous row for this organization |
| `this_hash` | SHA-256 of (canonical row bytes + prev_hash) |

### How the hash chain works

Each audit row contains the SHA-256 hash of the previous row for the same organization, plus its own hash computed from the row content and the previous hash.

```
Row 1: this_hash = SHA-256(row_1_bytes + 0x00)         ← genesis
Row 2: this_hash = SHA-256(row_2_bytes + row_1.this_hash)
Row 3: this_hash = SHA-256(row_3_bytes + row_2.this_hash)
...
```

This creates a chain: if any row is modified, its recomputed hash no longer matches the stored `this_hash`, and the link to every subsequent row breaks. Tampering with history is detectable by walking the chain and recomputing hashes.
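The chaining rule can be sketched in a few lines of Python. The canonical encoding used here (JSON with sorted keys and compact separators) is an assumption, since the exact form of the "canonical row bytes" is not specified above, and `append_row` is a hypothetical helper.

```python
import hashlib
import json

GENESIS_PREV = b"\x00"  # stands in for prev_hash on the first row of an organization


def canonical_bytes(row: dict) -> bytes:
    """Assumed canonical form: JSON with sorted keys and no extra whitespace."""
    return json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")


def row_hash(row: dict, prev_hash: bytes) -> bytes:
    """this_hash = SHA-256(canonical row bytes + prev_hash)."""
    return hashlib.sha256(canonical_bytes(row) + prev_hash).digest()


def append_row(chain: list, row: dict) -> dict:
    """Link a new row to the current tail of the chain and append it."""
    prev = chain[-1]["this_hash"] if chain else GENESIS_PREV
    entry = {"row": row, "prev_hash": prev, "this_hash": row_hash(row, prev)}
    chain.append(entry)
    return entry
```

A deterministic encoding matters here: if two serializations of the same row produced different bytes, verification would report tampering that never happened.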

### Append-only enforcement

The table is protected by a **database trigger** that rejects UPDATE and DELETE operations at the PostgreSQL level. Application code cannot remove or modify audit rows — the constraint is below the application layer.

```sql
-- Simplified
CREATE TRIGGER audit_immutable
BEFORE UPDATE OR DELETE ON control_plane_audit
FOR EACH ROW EXECUTE FUNCTION reject_mutation();
```

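This is enforcement, not convention. The trigger fires for any UPDATE or DELETE statement, including one issued from a direct database session. A database superuser could still disable the trigger, which is the case the hash chain and nightly verification exist to detect.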

### Integrity verification

A background worker runs nightly. It walks each organization's audit chain, recomputes every hash from the row content and the previous hash, and compares against the stored `this_hash`. Any mismatch surfaces a P1 alert.

**What a verification failure means:** Someone with raw database access modified or inserted a row outside the application. The chain pinpoints the first row that fails verification: everything before the break is verified, everything after is suspect.

On failure:
- The super-admin dashboard shows a tamper banner identifying the affected organization
- The customer's audit page surfaces a warning
- The operations team investigates
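The nightly walk amounts to recomputing each hash in order and stopping at the first mismatch. The sketch below assumes rows are held as dicts with `row`, `prev_hash`, and `this_hash` keys and are serialized as sorted-key JSON. Both are illustrative choices, and `first_broken_index` is a hypothetical name.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = b"\x00"


def entry_hash(row: dict, prev_hash: bytes) -> bytes:
    """SHA-256 over assumed canonical JSON bytes plus the previous hash."""
    body = json.dumps(row, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(body + prev_hash).digest()


def first_broken_index(chain: List[dict]) -> Optional[int]:
    """Return the index of the first row that fails verification, or None."""
    prev = GENESIS
    for i, entry in enumerate(chain):
        link_ok = entry["prev_hash"] == prev
        hash_ok = entry["this_hash"] == entry_hash(entry["row"], prev)
        if not (link_ok and hash_ok):
            return i  # rows before i are verified; i and later are suspect
        prev = entry["this_hash"]
    return None
```

A `None` result means the whole chain verified. Any integer result marks where the investigation should begin.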

### Retention by tier

| Tier | Retention |
|---|---|
| Free / Trial | 7 days |
| Starter | 30 days |
| Team | 1 year |
| Business | 3 years |
| Enterprise | Configurable (up to 10 years for legal-hold) |

Rows beyond the retention window are purged by a daily job — but only after the nightly verifier has confirmed chain integrity. You can't delete unverified history.
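The purge rule combines two conditions: the row is older than the tier's window, and the verifier has already covered it. The sketch below is illustrative only. The tier keys, the 365-day year, the `verified_through` watermark, and `is_purgeable` are all assumptions rather than documented behavior.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION = {
    "free": timedelta(days=7),          # Free / Trial
    "starter": timedelta(days=30),
    "team": timedelta(days=365),        # assumes 1 year = 365 days
    "business": timedelta(days=3 * 365),
}


def is_purgeable(occurred_at: datetime, tier: str, now: datetime,
                 verified_through: Optional[datetime],
                 enterprise_retention: Optional[timedelta] = None) -> bool:
    """A row may be purged only if it is past retention AND already verified."""
    retention = enterprise_retention if tier == "enterprise" else RETENTION[tier]
    if retention is None or verified_through is None:
        return False  # no configured window or no verification yet: keep the row
    expired = occurred_at < now - retention
    verified = occurred_at <= verified_through
    return expired and verified
```

Defaulting to "keep" whenever either input is missing matches the rule that unverified history is never deleted.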

Enterprise tier includes streaming export to S3 for SIEM integration (Splunk, Datadog).

---

## What Gets Audited

### Control-plane mutations (customer audit trail)

Every action that changes state in your organization:

**Agent lifecycle:**
- Agent created, updated, archived, deleted
- Skill attached, detached
- MCP server attached, detached
- Credential created, rotated, deleted

**Organizational structure:**
- OU created, updated, reparented, deleted
- Group created, updated, deleted
- Member added to group, removed from group

**Access control:**
- Role binding created, deleted
- Skill access grant created, deleted

**Approvals:**
- Approval request created (includes pending payload)
- Request approved (includes approver and comment)
- Request rejected
- Request cancelled by requester
- Request auto-denied on TTL expiry

**Infrastructure:**
- MCP deployment created, destroyed
- MCP server registered, updated, deregistered

Each row captures the actor, the action, the before/after state, and the timestamp. The hash chain makes the sequence tamper-evident.

### Session-level events (separate from control-plane audit)

Session events — individual tool calls, message exchanges, policy decisions during agent execution — are recorded in the session table and streaming log, not in the control-plane audit. They're operational telemetry, not governance mutations.

Session records track:
- Event count
- MCP tool use count
- Duration
- Degradation flags (silent MCP fallback detection)
- Terminal status

### Super-admin actions (operations audit)

Actions taken by Powerloom staff on the operations console:
- Who logged in (email, IP, user-agent)
- Which organization they viewed
- What lifecycle action they took (tier change, trial extension, suspension, deletion)
- Before/after state for every change

No customer data in super-admin audit entries. The operations console has no access to customer users, agents, skills, sessions, or credentials — by design.

---

## How Approval Gates Interact with Auth and Audit

Approval gates sit at the intersection of authentication and audit. They enforce separation of duties:

1. **Authenticated user** attempts a mutation (e.g., create an agent in production)
2. **Approval policy** matches (resource kind + action + OU scope)
3. Mutation is **deferred** — not applied. An `ApprovalRequest` row is created instead.
4. **Audit row** written: "request created by user X for action Y"
5. **Approver** (different user, with required role at the scope OU) reviews and decides
6. **Audit row** written: "request approved/rejected by user Z"
7. If approved: mutation **replays** atomically. **Audit row** written: "resource created"
8. If rejected or expired: mutation abandoned. **Audit row** written: "request rejected/auto_denied"

**Hard constraint:** The requester cannot approve their own request. This is enforced regardless of role, and it provides the separation of duties that SOC 2 change-management controls expect.

**Hierarchical delegation:** An approver must hold the required role at the resource's OU or any ancestor OU. An OrgAdmin at the root can approve anything. An OUAdmin at `/acme/engineering` can approve within that subtree but not outside it.
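Both rules, no self-approval and approval only from the resource's OU or an ancestor, can be expressed as a small predicate. This sketch assumes OUs are written as slash-separated paths like `/acme/engineering`. The function names and the `approver_role_ous` input (the OUs where the approver holds the required role) are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Iterable


def ou_covers(approver_ou: str, resource_ou: str) -> bool:
    """True if approver_ou is the resource's OU or one of its ancestors."""
    approver = PurePosixPath(approver_ou)
    resource = PurePosixPath(resource_ou)
    # Comparing path components avoids treating /acme/eng as a prefix
    # of /acme/engineering, which a plain string startswith() would do.
    return approver == resource or approver in resource.parents


def can_approve(requester_id: str, approver_id: str,
                approver_role_ous: Iterable[str], resource_ou: str) -> bool:
    """Apply the hard constraint first, then the hierarchical scope check."""
    if approver_id == requester_id:
        return False  # self-approval is rejected regardless of role
    return any(ou_covers(ou, resource_ou) for ou in approver_role_ous)
```

For example, an approver with the role at `/acme/engineering` can approve a request scoped to `/acme/engineering/ml` but not one scoped to `/acme/sales`.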

---

## What This Means for Compliance

### SOC 2 Type II

Authentication and the audit trail together support several SOC 2 Trust Services Criteria:

- **CC6.1 (Logical access):** RBAC with deny-override, documented in every binding creation/deletion audit row
- **CC6.2 (Authentication):** MFA, password hashing, session management — all auditable
- **CC7.2 (Monitoring):** Append-only audit trail with integrity verification
- **CC8.1 (Change management):** Approval gates with separation of duties, full before/after capture

### ISO 27001

- **A.9 Access control:** Full RBAC model with audit trail
- **A.12.4 Logging and monitoring:** Hash-chained, tamper-evident, retention-managed
- **A.14.2 Change management:** Policy-declared approval gates

### HIPAA (conditional)

For organizations handling PHI:
- **164.312(b) Audit controls:** Append-only, cryptographically verifiable audit trail
- **164.312(d) Authentication:** MFA, credential encryption, session-delegated JWTs
- **164.312(c) Integrity:** Hash chain detects unauthorized modifications

---

## The Philosophy

Powerloom's auth and audit design follows one principle: **trust is earned by architecture, not by policy.**

Policy says "we don't access customer data." Architecture says "the capability to access customer data doesn't exist." The super-admin console has no read path to customer resources. The meta-agent's permissions are bounded by the admin's JWT. The audit trail is immutable below the application layer.

This isn't defense in depth — it's defense by absence. The attack surface for the most dangerous scenarios (data exfiltration, audit tampering, privilege escalation) is minimized not by adding controls, but by never building the pathways in the first place.
