
# LLM Content Policy

The LLM proxy includes a content security pipeline that inspects all agent LLM traffic for policy violations. Content inspection is a platform capability that Daevix operates: it runs inside the managed enclave and LLM proxy, and policy is layered across **Platform > Org > Agent** scopes. Operators configure policies with the `dvx policy` and `dvx config` CLI using this 3-tier hierarchy.

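The built-in inspectors run entirely on the enclave host, so inspected data never leaves the customer's infrastructure. Sidecar inspectors are the exception: the proxy POSTs bodies to whatever URLs the operator configures (see [Sidecar Inspection](#sidecar-inspection)).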

```
Request flow:
  Agent → JWT auth → Read body → [Content Pipeline] → Resolve API key → Forward upstream
                                   ├─ Model restriction (block)
                                   ├─ Secrets filter (block)
                                   ├─ PII/API key/regex patterns (block or warn)
                                   └─ Sidecar inspectors (block or async)

Response flow:
  Upstream → Buffer response → [Tool policy] → [Content Pipeline] → Audit → Replay
                                                 ├─ Secrets filter (block)
                                                 ├─ PII/API key/regex patterns (block or warn)
                                                 └─ Sidecar inspectors (async)
```

## Policy Hierarchy

Policies are resolved across three scopes. Platform and org rows live in `service_config` with `service = "llmproxy"`; agent-scope rows live in `agent_config` with `override_service = "llmproxy"` (set with `dvx agent config set <agent> … --override-service llmproxy`).

| Scope | Set via | Description |
|-------|---------|-------------|
| **Platform** | `PUT /api/v1/platform/config/llmproxy/{key}` | Applies to all orgs and agents |
| **Org** | `PUT /api/v1/config/llmproxy/{key}` | Applies to all agents in the org |
| **Agent** | `POST /api/v1/agents/{id}/config` with `override_service: "llmproxy"` | Applies to a single agent |

### Resolution

For each policy key, the resolver looks up the value in order:

1. **Higher-scope lock check.** If a `service_config` row at org or platform scope has `locked = true`, that value is returned - agent overrides are ignored.
2. **Agent override.** If an `agent_config` row exists for `(override_service = "llmproxy", name = key, agent_id)`, it is returned.
3. **Org fallback.** Otherwise, the org-level `service_config` row (if any) is returned.
4. **Platform fallback.** Otherwise, the platform-level `service_config` row (if any) is returned.
5. **Code default.** Otherwise, the built-in default registered by the llmproxy process is returned.
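
The lookup order above can be sketched as a small function. This is an illustrative model only, not the resolver's actual code: the `Row` type and `resolve` function are hypothetical names, and checking the platform lock before the org lock is an assumption (the text does not say which lock wins when both are set).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    """Hypothetical stand-in for a service_config / agent_config row."""
    value: str
    locked: bool = False  # only meaningful at platform and org scope


def resolve(platform: Optional[Row], org: Optional[Row],
            agent: Optional[Row], default: str) -> str:
    # 1. A locked higher-scope row wins outright; agent overrides are ignored.
    #    Platform-before-org ordering here is an assumption.
    for row in (platform, org):
        if row is not None and row.locked:
            return row.value
    # 2-4. Otherwise the most specific row that exists wins.
    for row in (agent, org, platform):
        if row is not None:
            return row.value
    # 5. Nothing configured anywhere: fall back to the code default.
    return default
```

With an unlocked platform row and an agent row, the agent value is returned; flipping `locked` on the platform row makes the platform value win instead.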

### Locked Floors

Setting `locked: true` on a platform or org `service_config` row prevents the agent scope from overriding that key. Use it when you need a value enforced across every agent in the scope and below.

For policy _values_ whose own JSON shape supports merging (for example, pattern lists), the resolver currently returns the winning row's value as-is - there is no automatic append/merge across scopes. If you need platform baselines plus org additions, keep the baselines at platform scope and avoid setting the same key at org scope, or express the union explicitly in the higher-scope value.

### Caching

Resolved policies are cached for 30 seconds per `(orgID, agentID)` pair. Changes to policy take effect within 30 seconds without proxy restarts.
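
A minimal sketch of the caching behavior, assuming a simple time-to-live map keyed by `(orgID, agentID)`. The `PolicyCache` class and its `load` callback are hypothetical; the proxy's real cache may differ in eviction and concurrency handling.

```python
import time
from typing import Any, Callable, Dict, Tuple


class PolicyCache:
    """TTL cache keyed by (org_id, agent_id); entries expire after `ttl` seconds."""

    def __init__(self, load: Callable[[int, int], Any],
                 ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._load = load      # resolves policy from the database (hypothetical)
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested without sleeping
        self._entries: Dict[Tuple[int, int], Tuple[float, Any]] = {}

    def get(self, org_id: int, agent_id: int) -> Any:
        key = (org_id, agent_id)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                      # still fresh: no database lookup
        value = self._load(org_id, agent_id)   # expired or missing: re-resolve
        self._entries[key] = (now, value)
        return value
```

Under this model a policy change becomes visible on the first lookup after the cached entry is 30 seconds old, which is why no restart is needed.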

## Inert by Default

The pipeline is **constructed unconditionally** whenever the proxy has a database connection, but it is **inert until policy is written**. With no `model_policy`, `content_inspection`, or `sidecar_inspection` key set at any tier for an org/agent, the proxy builds an empty inspector list and request/response inspection is a no-op - zero behavioral change. This makes enabling inspection safe-by-default: nothing happens until an operator authors policy via the keys below.

## Global Kill-Switch (`inspect:llm`)

A per-organization **feature flag** named `inspect:llm` gates the entire content-inspection pipeline. It is **default-enabled**: if no flag row exists for the org, inspection runs as configured.

Disabling it skips **all** inspection for that org - model restriction, content inspectors, and sidecars alike - even when policy is configured. The check is evaluated per request (the flag is org-scoped, while interceptor registration is process-global), so toggling it takes effect without a proxy restart. Use it as an org-wide off-switch without having to delete every policy key.

```bash
# Disable all LLM content inspection for an org
curl -X PUT https://controlplane:8443/api/v1/feature-flags/inspect:llm \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'
```

This mirrors the `audit:llm` flag that gates LLM audit logging - both run through the same default-enabled, DB-backed feature checker.

## Failure Posture

The inspection pipeline runs inline on every LLM call, so its failure behavior is deliberately **fail-open by default**: a broken inspector, a transient database error, or a slow sidecar produces no findings and the request proceeds. This matches the sidecar fail-open behavior (see [Error Handling](#error-handling) below) and prevents a single misbehaving inspector from taking down all LLM traffic for an org.

Two proxy flags make the posture explicit and bounded:

- **`--inspect-timeout`** (env `DVX_INSPECT_TIMEOUT`, default `2s`) - a deadline around the **entire** request-phase inspection run (the whole inspector pipeline, not just sidecar calls). If inspection exceeds it, the in-flight run is abandoned and the configured failure posture applies.
- **`--inspect-fail-closed`** (env `DVX_INSPECT_FAIL_CLOSED`, default `false`) - for high-assurance deployments. Selects what happens when request-path inspection times out or errors:
  - **Fail-open (default, `false`)** - the request proceeds with no findings, logged with the stable marker `inspection failopen`.
  - **Fail-closed (`true`)** - the request is rejected with a **503** carrying:
    ```json
    {
      "type": "error",
      "error": {
        "type": "content_inspection_unavailable",
        "message": "Request rejected: content security inspection is unavailable."
      }
    }
    ```

  Response-path inspection remains observe-only regardless of this setting.

The key safety signal under fail-open is the rate of silently-skipped inspections; operators running fail-open should monitor for the `inspection failopen` log marker (and inspector/sidecar error logs) and alert on a non-trivial rate.
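
The interaction between the deadline and the two postures can be modeled as follows. This is a sketch under stated assumptions: `run_with_posture` and the demo inspectors are hypothetical, and a worker thread stands in for whatever mechanism the proxy uses to bound the inspection run.

```python
import concurrent.futures as cf
import time
from typing import Callable, List, Optional, Tuple


def run_with_posture(inspect_fn: Callable[[str], List[dict]], body: str,
                     timeout_s: float = 2.0,
                     fail_closed: bool = False) -> Tuple[Optional[int], List[dict]]:
    """Return (status, findings); status None means the request is forwarded."""
    pool = cf.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(inspect_fn, body)
    try:
        # Findings from a successful run are enforced separately (403 on block).
        return None, future.result(timeout=timeout_s)
    except Exception:
        # Timeout or inspector error: inspection could not run.
        if fail_closed:
            return 503, []   # reject: content_inspection_unavailable
        return None, []      # fail-open: proceed with no findings
    finally:
        pool.shutdown(wait=False)  # abandon the in-flight run, do not wait for it


def slow_inspector(body: str) -> List[dict]:
    time.sleep(0.2)          # demo only: simulates an inspector that overruns
    return [{"severity": "warn"}]


def broken_inspector(body: str) -> List[dict]:
    raise RuntimeError("inspector crashed")  # demo only
```

Calling `run_with_posture(slow_inspector, "...", timeout_s=0.05)` forwards the request with no findings, while adding `fail_closed=True` yields the 503 outcome instead.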

## Policy Keys

Three policy keys are available under `service = "llmproxy"`:

- [`model_policy`](#model-policy) - restrict which LLM models agents can use
- [`content_inspection`](#content-inspection) - detect PII, API keys, secrets, and custom patterns
- [`sidecar_inspection`](#sidecar-inspection) - call external HTTP services for additional inspection

## Model Policy

Restricts which models an agent can request. Request-phase only - the proxy extracts the `model` field from the request JSON body and checks it against the policy.

```json
{
  "immutable": false,
  "mode": "allowlist",
  "models": ["claude-sonnet-4-*", "claude-haiku-*"]
}
```

| Field | Type | Description |
|-------|------|-------------|
| `immutable` | bool | If true, lower tiers cannot override this policy |
| `mode` | string | `"allowlist"` (only listed models allowed) or `"blocklist"` (listed models denied) |
| `models` | string[] | Glob patterns matched against model identifiers (uses `path.Match` syntax) |
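
To preview how a pattern list behaves before deploying it, the check can be approximated locally. This sketch assumes `path.Match` refers to Go-style globbing and uses Python's `fnmatch.fnmatchcase` as a close stand-in: the two differ in that `fnmatch` lets `*` match `/` and negates character classes with `[!...]` rather than `[^...]`. The `model_allowed` function is a hypothetical name, and treating any mode other than `"blocklist"` as an allowlist is an assumption.

```python
from fnmatch import fnmatchcase


def model_allowed(policy: dict, model: str) -> bool:
    """Approximate the allowlist/blocklist decision for one model identifier."""
    matched = any(fnmatchcase(model, pattern)
                  for pattern in policy.get("models", []))
    if policy.get("mode") == "blocklist":
        return not matched   # listed models are denied
    return matched           # allowlist: only listed models pass


policy = {"mode": "allowlist", "models": ["claude-sonnet-4-*", "claude-haiku-*"]}
print(model_allowed(policy, "claude-sonnet-4-20250514"))
print(model_allowed(policy, "gpt-4o"))
```

Note that under this model an allowlist with an empty `models` array denies every model, while an empty blocklist permits every model.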

### Examples

Allow only Claude models:

```bash
curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/model_policy \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"mode\":\"allowlist\",\"models\":[\"claude-*\"]}"}'
```

Block OpenAI GPT and o-series models at the platform level (immutable):

```bash
curl -X PUT https://controlplane:8443/api/v1/platform-config/llmproxy/model_policy \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"immutable\":true,\"mode\":\"blocklist\",\"models\":[\"gpt-*\",\"o1-*\",\"o3-*\"]}"}'
```
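
In the requests above, `value` is a JSON document serialized into a string, which is why the inner quotes are escaped. Hand-escaping is error-prone for larger policies, so a small helper can build the body. The `config_payload` name is hypothetical; the `{"value": "<json string>"}` shape is taken from the curl examples.

```python
import json


def config_payload(policy: dict) -> str:
    """Build the request body: the policy is JSON-encoded, then embedded as a string."""
    inner = json.dumps(policy, separators=(",", ":"))  # compact inner document
    return json.dumps({"value": inner})                # outer envelope escapes the quotes


body = config_payload({"mode": "allowlist", "models": ["claude-*"]})
print(body)
```

The printed body can be passed to `curl -d` or any HTTP client in place of the hand-escaped literal.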

### `model_policy` vs Execution-Policy `model_match`

The platform has two ways to constrain which models an agent may use, and they are **complementary**: each suits a different situation, so do not configure both for the same constraint:

- **`model_policy`** (this key) - the pack-composable allow/blocklist enforced by the content pipeline. Model identifiers are matched as `path.Match` glob patterns. Choose this when you want [policy-pack composition](#policy-pack-composition): a higher tier's base policy and every installed pack's model contribution compose by logical **AND**, so a pack can only ever *narrow* the permitted set, never relax it.
- **`model_match`** in an **execution policy** - a condition in the broader DB-driven execution-policy rule engine. Choose this when the model constraint is part of a larger execution rule (combined with time windows or other conditions) and pack composition is not needed.

Express a model limit through `model_policy` when pack composition is desired, and through an execution policy otherwise. Configuring both for the same constraint produces overlapping, redundant enforcement.

When a request uses a disallowed model, the proxy returns a 403 with:

```json
{
  "type": "error",
  "error": {
    "type": "content_policy_violation",
    "message": "Request blocked by content security policy."
  }
}
```

## Content Inspection

Scans request and response bodies for sensitive content. All built-in inspectors run in parallel.

```json
{
  "immutable": false,
  "secrets_filter": { "enabled": true, "severity": "block" },
  "pii_detection": { "enabled": true, "severity": "block", "types": ["email", "credit_card", "ssn"] },
  "api_key_detection": { "enabled": true, "severity": "block" },
  "patterns": [
    { "pattern": "INTERNAL_PROJECT_.*", "description": "Internal codename", "severity": "block" }
  ]
}
```

### Secrets Filter

Detects leakage of the agent's own secrets (managed with [`dvx secret`](/cli-secrets/)). The proxy decrypts the agent's secrets and performs substring scanning against the request/response body.

- Only secrets with values >= 8 characters are scanned (shorter values produce too many false positives)
- Matches are redacted in findings (first 4 characters + `****`)
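
The scanning rules above amount to a length-filtered substring search with redaction. A minimal sketch, assuming secrets are available as a name-to-value mapping after decryption; `scan_secrets` and the shape of the returned dictionaries are hypothetical.

```python
from typing import Dict, List

MIN_SECRET_LEN = 8  # shorter values are skipped to limit false positives


def redact(match: str) -> str:
    """Keep the first 4 characters and mask the rest."""
    return match[:4] + "****"


def scan_secrets(body: str, secrets: Dict[str, str]) -> List[dict]:
    findings = []
    for name, value in secrets.items():
        if len(value) < MIN_SECRET_LEN:
            continue                      # too short to scan reliably
        if value in body:                 # plain substring match
            findings.append({"secret": name, "match": redact(value)})
    return findings
```

A consequence worth noting: a secret shorter than 8 characters is never reported, even when it appears verbatim in the body.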

### PII Detection

Detects personally identifiable information using regex patterns:

| Type | What it detects |
|------|-----------------|
| `email` | Email addresses |
| `credit_card` | Credit card numbers (with Luhn checksum validation) |
| `ssn` | US Social Security Numbers (XXX-XX-XXXX format) |

Use the `types` array to limit which PII types are detected. If omitted, all types are checked.
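
The Luhn validation mentioned for `credit_card` filters out digit runs that merely look like card numbers. The sketch below shows the standard checksum; the candidate regex `CARD_RE` is a hypothetical pattern for illustration and is not the proxy's actual expression.

```python
import re
from typing import List

# Hypothetical candidate pattern: 13-19 digits, optionally separated by spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_ok(digits: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text: str) -> List[str]:
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"[ -]", "", m.group())
        if luhn_ok(digits):               # drop look-alikes that fail the checksum
            hits.append(digits)
    return hits
```

For example, the widely used test number `4111 1111 1111 1111` passes the checksum, while changing its last digit to `2` makes it fail.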

### API Key Detection

Detects common API key formats:

| Provider | Pattern prefix |
|----------|---------------|
| AWS | `AKIA` |
| GitHub | `ghp_`, `ghs_`, `github_pat_` |
| Anthropic | `sk-ant-` |
| Google Cloud | `AIza` |
| OpenAI | `sk-` |
| Stripe | `sk_test_`, `sk_live_`, `pk_test_`, `pk_live_` |

### Custom Regex Patterns

Add organization-specific patterns for detecting proprietary terms, internal identifiers, or any text matching a regular expression. Patterns from all tiers are additive - lower tiers can add patterns but cannot remove patterns set by higher tiers.

```json
{
  "patterns": [
    { "pattern": "PROJECT_(ALPHA|BETA)_\\d+", "description": "Internal project code", "severity": "block" },
    { "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "description": "SSN-like number", "severity": "warn" }
  ]
}
```

Invalid regex patterns are logged and skipped (they do not cause errors).
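
The log-and-skip behavior for invalid patterns can be sketched like this. Python's `re` module stands in for the proxy's regex engine, which may accept a different syntax, so treat this as a way to reason about the behavior rather than to validate patterns. The function names are hypothetical.

```python
import logging
import re
from typing import List, Tuple

log = logging.getLogger("content_inspection")


def compile_patterns(patterns: List[dict]) -> List[Tuple[re.Pattern, dict]]:
    compiled = []
    for spec in patterns:
        try:
            compiled.append((re.compile(spec["pattern"]), spec))
        except re.error as exc:
            # Invalid pattern: log and skip, the remaining patterns still apply.
            log.warning("skipping invalid pattern %r: %s", spec["pattern"], exc)
    return compiled


def scan(body: str, compiled: List[Tuple[re.Pattern, dict]]) -> List[dict]:
    findings = []
    for rx, spec in compiled:
        m = rx.search(body)
        if m:
            findings.append({
                "description": spec["description"],
                "severity": spec["severity"],
                "match": m.group()[:4] + "****",   # same redaction as other findings
            })
    return findings
```

Because a skipped pattern silently stops matching, checking the proxy logs after a policy change is a cheap way to catch typos in expressions.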

> **Caveat - inspectors see the post-injection request body.** Request inspection runs **after** the proxy's policy-context interceptor, which injects a platform-generated security-context prefix into the request's system prompt. The intent is to scan exactly what leaves the proxy, so the inspector sees the final outbound body including that injected text. The injected prefix is platform-generated and trusted, so it is unlikely to trip the built-in secrets/PII/API-key inspectors - but a **custom regex pattern can unintentionally match the injected prefix and produce false positives**. Test custom patterns against bodies that include the injected context, not just the agent's raw prompt.

### Severity Levels

Each inspector has a configurable severity that determines what happens when a match is found:

| Severity | Effect |
|----------|--------|
| `log` | Record the finding in the audit log. Request proceeds normally. |
| `warn` | Record the finding prominently. Request proceeds normally. |
| `block` | On the **request** path, reject with a 403 response. On the **response** path, observe-only (see below). Finding recorded in audit log either way. |

The `redact` severity is reserved for future use and currently behaves like `warn`.

#### Request vs Response Enforcement

Enforcement differs by phase:

- **Request phase** - a `block`-severity finding short-circuits the request with a **403** carrying the generic `content_policy_violation` envelope (shown under [Model Policy](#model-policy)). The redacted match and inspector type appear only in the audit log and proxy logs - they are never leaked to the agent.
- **Response phase** - inspection is **observe-only** for all severities, including `block`. A `block` finding on a completion is recorded (and alertable) but does **not** reject the response: the model has already produced (and billed) it, and the response is replayed to the agent. This is a deliberate divergence from the network proxy, which can block on the response path.

Separately, when `--inspect-fail-closed` is enabled, an inspection error or timeout on the request path returns a **503** (inspection unavailable) rather than a 403 - a 403 means a policy *match*, a 503 means inspection *could not run*.
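
The phase-dependent enforcement reduces to a short decision function. In this sketch `enforce` is a hypothetical name, and a return value of `None` means the traffic passes through with findings recorded.

```python
from typing import List, Optional


def enforce(phase: str, findings: List[dict]) -> Optional[int]:
    """Return 403 when the request must be rejected, otherwise None."""
    has_block = any(f.get("severity") == "block" for f in findings)
    if phase == "request" and has_block:
        return 403          # policy match on the request path
    # Response phase is observe-only; log, warn, and redact never reject.
    return None
```

A `block` finding therefore rejects only on the request path; the same finding on a response is recorded but the completion is still replayed to the agent.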

## Sidecar Inspection

Route LLM traffic to external HTTP services for additional inspection. Sidecars can implement custom logic like NER-based PII detection (Presidio), toxicity classification, or LLM-as-judge evaluation.

```json
{
  "immutable": false,
  "sidecars": [
    {
      "name": "presidio-ner",
      "url": "http://presidio.daevix.svc:8080",
      "timeout_ms": 5000,
      "on_request": true,
      "on_response": true,
      "async": false,
      "severity": "block"
    },
    {
      "name": "llm-judge",
      "url": "http://llm-judge.daevix.svc:8081",
      "timeout_ms": 30000,
      "on_request": true,
      "on_response": false,
      "async": true,
      "severity": "warn",
      "include_context": true
    }
  ]
}
```

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `name` | string | (required) | Identifier for the sidecar (used in finding reports and logs) |
| `url` | string | (required) | HTTP endpoint the proxy POSTs to |
| `timeout_ms` | int | 5000 | Request timeout in milliseconds |
| `on_request` | bool | false | Run on outgoing LLM requests |
| `on_response` | bool | false | Run on incoming LLM responses |
| `async` | bool | false | Fire-and-forget mode (findings logged but don't block) |
| `severity` | string | `"warn"` | Severity applied to returned findings |
| `include_context` | bool | false | Include recent conversation messages in the sidecar request |

Sidecars from all tiers are additive - lower tiers add sidecars, they cannot remove sidecars defined by higher tiers.

### Sidecar Protocol

The proxy POSTs a JSON request to the sidecar URL:

```json
{
  "agent_id": 42,
  "organization_id": 1,
  "agent_name": "coding-agent",
  "phase": "request",
  "body": "<raw LLM API request/response JSON>",
  "context": {
    "recent_messages": [{"role": "user", "content": "..."}]
  }
}
```

The `context` field is only populated when `include_context` is `true`. It contains the `messages` array extracted from the LLM request body, intended for LLM-as-judge sidecars that need conversation history.

The sidecar responds with:

```json
{
  "findings": [
    {
      "description": "Toxic content detected",
      "match": "offensive phrase",
      "severity": "block"
    }
  ]
}
```
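
A minimal sidecar that speaks this protocol can be written with the Python standard library. This is a sketch for experimentation, not a production inspector: `BLOCKED_TERMS`, `inspect_payload`, `SidecarHandler`, and `serve` are all hypothetical names, and the matching logic is a placeholder for real classification.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

BLOCKED_TERMS = ["offensive phrase"]  # placeholder detection logic


def inspect_payload(payload: dict) -> dict:
    """Map a proxy request (agent_id, phase, body, ...) to a findings response."""
    body = payload.get("body", "")
    findings = [
        {"description": "Toxic content detected", "match": term, "severity": "block"}
        for term in BLOCKED_TERMS if term in body
    ]
    return {"findings": findings}


class SidecarHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        out = json.dumps(inspect_payload(payload)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)


def serve(port: int = 8080) -> None:
    """Blocks forever; call this from your own entry point."""
    HTTPServer(("0.0.0.0", port), SidecarHandler).serve_forever()
```

Returning an empty `findings` array is the normal "nothing detected" response.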

### Error Handling

All sidecars **fail open**:

- Network errors or timeouts produce no findings (request proceeds)
- Non-200 HTTP responses produce no findings
- Malformed JSON responses produce no findings
- Response bodies are limited to 1 MB

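Errors are logged for operator monitoring, but a sidecar error on its own never blocks LLM traffic. Operators who need stronger guarantees should monitor sidecar error logs and alert accordingly; for pipeline-wide timeouts and errors on the request path, see `--inspect-fail-closed` under [Failure Posture](#failure-posture).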
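
From the proxy's side, the fail-open rules mean every abnormal sidecar reply collapses to an empty findings list. A sketch of that parsing step follows; `parse_sidecar_response` is a hypothetical name, and truncating at 1 MB (rather than rejecting oversized bodies outright) is an assumption about how the limit is applied.

```python
import json
from typing import List

MAX_BODY = 1 << 20  # 1 MB response limit


def parse_sidecar_response(status: int, raw: bytes) -> List[dict]:
    """Return the sidecar's findings, or [] for any failure (fail-open)."""
    if status != 200:
        return []                           # non-200: no findings
    try:
        doc = json.loads(raw[:MAX_BODY])    # assumption: read at most 1 MB
    except ValueError:
        return []                           # malformed (or truncated) JSON: no findings
    findings = doc.get("findings") if isinstance(doc, dict) else None
    return findings if isinstance(findings, list) else []
```

Network errors and timeouts would be handled the same way by the caller: catch the exception, log it, and continue with no findings.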

### Async Mode

When `async` is `true`, the sidecar request fires in the background. The LLM request proceeds immediately without waiting for the sidecar response. Any findings are logged to the audit log but cannot block the request.

Use async mode for high-latency inspectors (e.g., LLM-as-judge) where blocking would add unacceptable latency.

## Policy Pack Composition

When policy packs are installed, the resolver composes their `content_inspection` and `model` contributions on top of the operator-authored base at read time, behind the same 30s cache. Composition is **monotonic toward stricter**:

- **Content inspection** contributions fold in as additional layers - a pack can enable a toggle, add PII types, or add regex patterns, but can never disable an inspector the base enabled.
- **Model** contributions compose by logical **AND**: a model is permitted only if the resolved base policy **and every** pack contribution permits it. A pack can only narrow the permitted model set, never relax the operator base.

No separate enablement step is required - installed packs take effect automatically alongside the `service_config` policy.
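
The AND rule for model contributions can be illustrated with a short sketch. It assumes each pack contribution has the same `mode`/`models` shape as `model_policy`, which the text does not confirm, and it uses Python's `fnmatch` as an approximation of `path.Match` globbing. Both function names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def permits(policy: dict, model: str) -> bool:
    matched = any(fnmatchcase(model, p) for p in policy.get("models", []))
    return (not matched) if policy.get("mode") == "blocklist" else matched


def composed_permits(base: dict, packs: List[dict], model: str) -> bool:
    """A model passes only if the base AND every pack contribution permit it."""
    return permits(base, model) and all(permits(p, model) for p in packs)


base = {"mode": "allowlist", "models": ["claude-*"]}
narrowing_pack = {"mode": "blocklist", "models": ["claude-opus-*"]}
widening_pack = {"mode": "allowlist", "models": ["gpt-*"]}
```

Here `narrowing_pack` removes a subset of what `base` allows, while `widening_pack` cannot add `gpt-*` models because the base still rejects them. Under AND composition, a pack with a disjoint allowlist ends up denying everything.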

## Audit Integration

All findings from content inspectors are recorded in the `findings` JSONB column of the `llm_audit_log` table. Each finding includes:

| Field | Description |
|-------|-------------|
| `inspector_type` | Which inspector produced the finding (e.g., `pii`, `api_key`, `model_restriction`, `sidecar:presidio-ner`) |
| `severity` | `log`, `warn`, or `block` |
| `description` | Human-readable description of what was detected |
| `match` | The matched content (redacted: first 4 characters + `****`) |
| `location` | Where the match was found: `request_body`, `response_body`, `model`, or `sidecar` |

Query audit logs with findings via `GET /api/v1/agents/{id}/audit-log`. Each entry includes the `findings` array when content policy violations were detected.
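
Once audit entries are fetched, findings can be summarized client-side. The sketch below assumes the endpoint's response has already been decoded into a list of entry dictionaries, each with an optional `findings` array; that outer shape and the `blocking_findings_by_inspector` name are assumptions, while the per-finding fields come from the table above.

```python
from collections import Counter
from typing import Dict, List


def blocking_findings_by_inspector(entries: List[dict]) -> Dict[str, int]:
    """Count block-severity findings per inspector_type across audit entries."""
    counts: Counter = Counter()
    for entry in entries:
        for finding in entry.get("findings") or []:   # tolerate missing or null
            if finding.get("severity") == "block":
                counts[finding["inspector_type"]] += 1
    return dict(counts)
```

A summary like this is a quick way to see which inspector is responsible for most rejections before tuning severities.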

## Configuration Examples

### Minimal: Model Restriction Only

Restrict an org to Claude models:

```bash
curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/model_policy \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"mode\":\"allowlist\",\"models\":[\"claude-*\"]}"}'
```

### Full: Platform Security Baseline

Set an immutable platform baseline with PII and API key detection:

```bash
# Platform-level content inspection (immutable)
curl -X PUT https://controlplane:8443/api/v1/platform-config/llmproxy/content_inspection \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"immutable\":true,\"pii_detection\":{\"enabled\":true,\"severity\":\"block\"},\"api_key_detection\":{\"enabled\":true,\"severity\":\"block\"}}"}'
```

Orgs can add custom patterns on top of this baseline but cannot disable PII or API key detection:

```bash
# Org adds custom patterns (additive)
curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/content_inspection \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"patterns\":[{\"pattern\":\"CONFIDENTIAL_.*\",\"description\":\"Confidential marker\",\"severity\":\"block\"}]}"}'
```

### Sidecar: Presidio NER + LLM Judge

```bash
curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/sidecar_inspection \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"sidecars\":[{\"name\":\"presidio-ner\",\"url\":\"http://presidio.daevix.svc:8080\",\"timeout_ms\":5000,\"on_request\":true,\"on_response\":true,\"severity\":\"block\"},{\"name\":\"llm-judge\",\"url\":\"http://llm-judge.daevix.svc:8081\",\"timeout_ms\":30000,\"on_request\":true,\"async\":true,\"severity\":\"warn\",\"include_context\":true}]}"}'
```
