LLM Content Policy

The LLM proxy includes a content security pipeline that inspects all agent LLM traffic for policy violations. Content inspection is a platform capability that Daevix operates: it runs inside the managed enclave and LLM proxy, and policy is layered across Platform > Org > Agent scopes. Operators configure policies with the dvx policy and dvx config CLI using this 3-tier hierarchy.

The pipeline runs entirely on the enclave host - data never leaves the customer’s infrastructure.

Request flow:
  Agent → JWT auth → Read body → [Content Pipeline] → Resolve API key → Forward upstream
                                   ├─ Model restriction (block)
                                   ├─ Secrets filter (block)
                                   ├─ PII/API key/regex patterns (block or warn)
                                   └─ Sidecar inspectors (block or async)

Response flow:
  Upstream → Buffer response → [Tool policy] → [Content Pipeline] → Audit → Replay
                                                 ├─ Secrets filter (block)
                                                 ├─ PII/API key/regex patterns (block or warn)
                                                 └─ Sidecar inspectors (async)

Policy Hierarchy

Policies are resolved across three scopes. Platform and org rows live in service_config with service = "llmproxy"; agent-scope rows live in agent_config with override_service = "llmproxy" (set with dvx agent config set <agent> … --override-service llmproxy).

Scope	Set via	Description
Platform	`PUT /api/v1/platform/config/llmproxy/{key}`	Applies to all orgs and agents
Org	`PUT /api/v1/config/llmproxy/{key}`	Applies to all agents in the org
Agent	`POST /api/v1/agents/{id}/config` with `override_service: "llmproxy"`	Applies to a single agent

Resolution

For each policy key, the resolver looks up the value in order:

1. Higher-scope lock check. If a service_config row at org or platform scope has locked = true, that value is returned - agent overrides are ignored.
2. Agent override. If an agent_config row exists for (override_service = "llmproxy", name = key, agent_id), it is returned.
3. Org fallback. Otherwise, the org-level service_config row (if any) is returned.
4. Platform fallback. Otherwise, the platform-level service_config row (if any) is returned.
5. Code default. Otherwise, the built-in default registered by the llmproxy process is returned.
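The lookup order above can be sketched as a small function. This is an illustration only, not the proxy's implementation: the Row type, the resolve name, and the source labels are hypothetical, and the sketch assumes that when both platform and org rows are locked the platform row wins, which the text above does not specify.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical stand-in for a service_config / agent_config row."""
    value: str
    locked: bool = False


def resolve(key, agent_row, org_row, platform_row, defaults):
    """Return (value, source) for one policy key, following the order above."""
    # 1. A locked higher-scope row beats any agent override.
    #    Assumption: platform is checked before org when both are locked.
    for source, row in (("platform", platform_row), ("org", org_row)):
        if row is not None and row.locked:
            return row.value, source + " (locked)"
    # 2-4. Agent override, then org fallback, then platform fallback.
    for source, row in (("agent", agent_row), ("org", org_row),
                        ("platform", platform_row)):
        if row is not None:
            return row.value, source
    # 5. Built-in default.
    return defaults.get(key), "default"
```

Note that the winning row's value is returned whole; nothing in this sketch merges values across scopes.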

Locked Floors

Setting locked: true on a platform or org service_config row prevents the agent scope from overriding that key. Use it when you need a value enforced across every agent in the scope and below.

For policy values whose own JSON shape supports merging (for example, pattern lists), the resolver currently returns the winning row’s value as-is - there is no automatic append/merge across scopes. If you need platform baselines plus org additions, keep the baselines at platform scope and avoid setting the same key at org scope, or express the union explicitly in the higher-scope value.

Caching

Resolved policies are cached for 30 seconds per (orgID, agentID) pair. Changes to policy take effect within 30 seconds without proxy restarts.
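To make the 30-second window concrete, here is a minimal sketch of a per-(orgID, agentID) cache with a fixed time-to-live. The PolicyCache class and the load callback are hypothetical names for illustration; the proxy's actual cache may differ in eviction and locking.

```python
import time


class PolicyCache:
    """Caches one resolved policy per (org_id, agent_id) for a fixed TTL."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # (org_id, agent_id) -> (expires_at, policy)

    def get(self, org_id, agent_id, load):
        key = (org_id, agent_id)
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now < hit[0]:
            return hit[1]  # still fresh: no database read
        policy = load(org_id, agent_id)  # expired or missing: resolve again
        self._entries[key] = (now + self.ttl, policy)
        return policy
```

Under this model a policy change is picked up the first time the pair is requested after its entry expires, which is why no restart is needed.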

Inert by Default

The pipeline is constructed unconditionally whenever the proxy has a database connection, but it is inert until policy is written. With no model_policy, content_inspection, or sidecar_inspection key set at any tier for an org/agent, the proxy builds an empty inspector list and request/response inspection is a no-op - zero behavioral change. This makes enabling inspection safe-by-default: nothing happens until an operator authors policy via the keys below.

Global Kill-Switch (`inspect:llm`)

A per-organization feature flag named inspect:llm gates the entire content-inspection pipeline. It is default-enabled: if no flag row exists for the org, inspection runs as configured.

Disabling it skips all inspection for that org - model restriction, content inspectors, and sidecars alike - even when policy is configured. The check is evaluated per request (the flag is org-scoped, while interceptor registration is process-global), so toggling it takes effect without a proxy restart. Use it as an org-wide off-switch without having to delete every policy key.

# Disable all LLM content inspection for an org
curl -X PUT https://controlplane:8443/api/v1/feature-flags/inspect:llm \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"enabled": false}'

This mirrors the audit:llm flag that gates LLM audit logging - both run through the same default-enabled, DB-backed feature checker.

Failure Posture

The inspection pipeline runs inline on every LLM call, so its failure behavior is deliberately fail-open by default: a broken inspector, a transient database error, or a slow sidecar produces no findings and the request proceeds. This matches the sidecar fail-open behavior (see Error Handling below) and prevents a single misbehaving inspector from taking down all LLM traffic for an org.

Two proxy flags make the posture explicit and bounded:

- --inspect-timeout (env DVX_INSPECT_TIMEOUT, default 2s) - a deadline around the entire request-phase inspection run (the whole inspector pipeline, not just sidecar calls). If inspection exceeds it, the in-flight run is abandoned and the configured failure posture applies.
- --inspect-fail-closed (env DVX_INSPECT_FAIL_CLOSED, default false) - for high-assurance deployments. Selects what happens when request-path inspection times out or errors:
  - Fail-open (default, false) - the request proceeds with no findings, logged with the stable marker inspection failopen.
  - Fail-closed (true) - the request is rejected with a 503 carrying:
```
{
  "type": "error",
  "error": {
    "type": "content_inspection_unavailable",
    "message": "Request rejected: content security inspection is unavailable."
  }
}
```
Response-path inspection remains observe-only regardless of this setting.

The key safety signal under fail-open is the rate of silently-skipped inspections; operators running fail-open should monitor for the inspection failopen log marker (and inspector/sidecar error logs) and alert on a non-trivial rate.
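The interaction between the deadline and the two postures can be sketched as follows. This is a simplified model, not the proxy's code: run_with_posture and its parameters are hypothetical, and the inspector is any callable that returns a list of findings.

```python
import concurrent.futures
import logging


def run_with_posture(inspect, body, timeout_s=2.0, fail_closed=False):
    """Return (status, findings); status None means the request proceeds."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(inspect, body)
    try:
        return None, future.result(timeout=timeout_s)
    except Exception:  # inspector error or deadline exceeded
        if fail_closed:
            return 503, []  # content_inspection_unavailable
        logging.warning("inspection failopen")  # marker to alert on
        return None, []
    finally:
        pool.shutdown(wait=False)  # abandon the in-flight run
```

A 503 here only ever means inspection could not complete; a policy match is a separate outcome handled after findings are returned.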

Policy Keys

Three policy keys are available under service = "llmproxy":

- model_policy - restrict which LLM models agents can use
- content_inspection - detect PII, API keys, secrets, and custom patterns
- sidecar_inspection - call external HTTP services for additional inspection

Model Policy

Restricts which models an agent can request. Request-phase only - the proxy extracts the model field from the request JSON body and checks it against the policy.

{
  "immutable": false,
  "mode": "allowlist",
  "models": ["claude-sonnet-4-*", "claude-haiku-*"]
}

Field	Type	Description
`immutable`	bool	If true, lower tiers cannot override this policy
`mode`	string	`"allowlist"` (only listed models allowed) or `"blocklist"` (listed models denied)
`models`	string[]	Glob patterns matched against model identifiers (uses `path.Match` syntax)
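A quick way to reason about a policy before deploying it is to evaluate it locally. The sketch below uses Python's fnmatch as a stand-in for Go's path.Match; the two agree on simple `*` and `?` patterns like the ones shown here, but differ in details (for example, path.Match negates character classes with `^` and its `*` does not cross `/`). The model_allowed function and the model names in the usage are illustrative only.

```python
from fnmatch import fnmatchcase


def model_allowed(policy, model):
    """Evaluate a model_policy document against one model identifier."""
    listed = any(fnmatchcase(model, pat) for pat in policy.get("models", []))
    mode = policy.get("mode")
    if mode == "allowlist":
        return listed  # only listed models are allowed
    if mode == "blocklist":
        return not listed  # listed models are denied
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with the allowlist shown above, `model_allowed(policy, "claude-sonnet-4-5")` is true while `model_allowed(policy, "gpt-4o")` is false.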

Examples

Allow only Claude models:

curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/model_policy \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"mode\":\"allowlist\",\"models\":[\"claude-*\"]}"}'

Block GPT models at the platform level (immutable):

curl -X PUT https://controlplane:8443/api/v1/platform-config/llmproxy/model_policy \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"immutable\":true,\"mode\":\"blocklist\",\"models\":[\"gpt-*\",\"o1-*\",\"o3-*\"]}"}'
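In the curl examples above, the policy is sent as a JSON-encoded string inside the value field, which is why the inner quotes are escaped. Hand-escaping is error-prone, so a short helper can build the body instead. The config_body name is hypothetical; only the {"value": "<json string>"} shape is taken from the examples.

```python
import json


def config_body(policy):
    """Wrap a policy dict as {"value": "<policy serialized as a JSON string>"}."""
    inner = json.dumps(policy, separators=(",", ":"))  # compact inner document
    return json.dumps({"value": inner})  # outer encode escapes the inner quotes


if __name__ == "__main__":
    # Prints a body suitable for passing to curl with -d.
    print(config_body({"mode": "allowlist", "models": ["claude-*"]}))
```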

`model_policy` vs Execution-Policy `model_match`

The platform has two ways to constrain which models an agent may use. They serve different use cases, so pick one per constraint rather than configuring both:

model_policy (this key) - the pack-composable allow/blocklist enforced by the content pipeline. Model identifiers are matched as path.Match glob patterns. Choose this when you want policy-pack composition: a higher tier’s base policy and every installed pack’s model contribution compose by logical AND, so a pack can only ever narrow the permitted set, never relax it.
model_match in an execution policy - a condition in the broader DB-driven execution-policy rule engine. Choose this when the model constraint is part of a larger execution rule (combined with time windows or other conditions) and pack composition is not needed.

Express a model limit through model_policy when pack composition is desired, and through an execution policy otherwise. Configuring both for the same constraint produces overlapping, redundant enforcement.

When a request uses a disallowed model, the proxy returns a 403 with:

{
  "type": "error",
  "error": {
    "type": "content_policy_violation",
    "message": "Request blocked by content security policy."
  }
}

Content Inspection

Scans request and response bodies for sensitive content. All built-in inspectors run in parallel.

{
  "immutable": false,
  "secrets_filter": { "enabled": true, "severity": "block" },
  "pii_detection": { "enabled": true, "severity": "block", "types": ["email", "credit_card", "ssn"] },
  "api_key_detection": { "enabled": true, "severity": "block" },
  "patterns": [
    { "pattern": "INTERNAL_PROJECT_.*", "description": "Internal codename", "severity": "block" }
  ]
}

Secrets Filter

Detects leakage of the agent’s own secrets (managed with dvx secret). The proxy decrypts the agent’s secrets and performs substring scanning against the request/response body.

- Only secrets with values >= 8 characters are scanned (shorter values produce too many false positives).
- Matches are redacted in findings (first 4 characters + ****).
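The two rules above (minimum length and redaction) can be sketched as a plain substring scan. The function names and the shape of the returned finding are hypothetical; decryption of the agent's secrets is assumed to have happened already.

```python
MIN_SECRET_LEN = 8  # shorter values are skipped to limit false positives


def redact(match):
    """Keep the first 4 characters and mask the rest."""
    return match[:4] + "****"


def scan_secrets(body, secrets):
    """Return one finding per secret whose value appears verbatim in body."""
    findings = []
    for name, value in secrets.items():
        if len(value) < MIN_SECRET_LEN:
            continue
        if value in body:
            findings.append({"secret": name, "match": redact(value)})
    return findings
```

Because this is an exact substring match, a secret that is encoded or split before being sent (for example, base64-encoded) would not be caught by a scan like this one.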

PII Detection

Detects personally identifiable information using regex patterns:

Type	What it detects
`email`	Email addresses
`credit_card`	Credit card numbers (with Luhn checksum validation)
`ssn`	US Social Security Numbers (XXX-XX-XXXX format)

Use the types array to limit which PII types are detected. If omitted, all types are checked.
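Luhn validation is what keeps the credit_card type from flagging every long digit run. The sketch below shows the general technique; the candidate regex (13 to 19 digits with optional space or hyphen separators) is an assumption and not necessarily the pattern the proxy uses.

```python
import re

# Assumed candidate shape: 13-19 digits, optionally separated by spaces/hyphens.
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Standard Luhn check: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_cards(text):
    """Return digit strings that look like card numbers and pass Luhn."""
    hits = []
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

With this check, a well-known test number such as 4111 1111 1111 1111 is reported, while an arbitrary 16-digit order number usually is not.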

API Key Detection

Detects common API key formats:

Provider	Pattern prefix
AWS	`AKIA`
GitHub	`ghp_`, `ghs_`, `github_pat_`
Anthropic	`sk-ant-`
Google Cloud	`AIza`
OpenAI	`sk-`
Stripe	`sk_test_`, `sk_live_`, `pk_test_`, `pk_live_`
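One subtlety in the table above is that the OpenAI prefix `sk-` is itself a prefix of the Anthropic prefix `sk-ant-`, so a prefix-based detector has to try the more specific entry first. The sketch below illustrates that ordering. The token regex and the 12-character minimum are assumptions made for the example, and the real detector's patterns are likely stricter.

```python
import re

# Ordered so that more specific prefixes are tried before shorter ones.
PROVIDERS = [
    ("Anthropic", ("sk-ant-",)),
    ("OpenAI", ("sk-",)),
    ("AWS", ("AKIA",)),
    ("GitHub", ("ghp_", "ghs_", "github_pat_")),
    ("Google Cloud", ("AIza",)),
    ("Stripe", ("sk_test_", "sk_live_", "pk_test_", "pk_live_")),
]

# Assumption: a candidate key is a run of at least 12 token characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]{12,}")


def detect_api_keys(text):
    """Return (provider, redacted_match) for each token with a known prefix."""
    hits = []
    for token in TOKEN_RE.findall(text):
        for provider, prefixes in PROVIDERS:
            if token.startswith(prefixes):
                hits.append((provider, token[:4] + "****"))
                break  # first (most specific) provider wins
    return hits
```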

Custom Regex Patterns

Add organization-specific patterns for detecting proprietary terms, internal identifiers, or any text matching a regular expression. Patterns from all tiers are additive - lower tiers can add patterns but cannot remove patterns set by higher tiers.

{
  "patterns": [
    { "pattern": "PROJECT_(ALPHA|BETA)_\\d+", "description": "Internal project code", "severity": "block" },
    { "pattern": "\\b\\d{3}-\\d{2}-\\d{4}\\b", "description": "SSN-like number", "severity": "warn" }
  ]
}

Invalid regex patterns are logged and skipped (they do not cause errors).
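The log-and-skip behavior can be sketched as below, and the same helper is handy for testing patterns before deploying them. It uses Python's re module, whose syntax is close to but not identical with other regex engines, so treat a pass here as a sanity check rather than proof that the proxy will accept the pattern. The function names are hypothetical.

```python
import logging
import re


def compile_patterns(patterns):
    """Compile each entry; invalid patterns are logged and skipped."""
    compiled = []
    for entry in patterns:
        try:
            compiled.append((re.compile(entry["pattern"]), entry))
        except re.error as exc:
            logging.warning("skipping invalid pattern %r: %s",
                            entry.get("pattern"), exc)
    return compiled


def scan(body, compiled):
    """Return one finding per pattern that matches anywhere in body."""
    findings = []
    for regex, entry in compiled:
        m = regex.search(body)
        if m:
            findings.append({
                "description": entry["description"],
                "severity": entry["severity"],
                "match": m.group()[:4] + "****",  # redacted as in audit findings
            })
    return findings
```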

Caveat - inspectors see the post-injection request body. Request inspection runs after the proxy’s policy-context interceptor, which injects a platform-generated security-context prefix into the request’s system prompt. The intent is to scan exactly what leaves the proxy, so the inspector sees the final outbound body including that injected text. The injected prefix is platform-generated and trusted, so it is unlikely to trip the built-in secrets/PII/API-key inspectors - but a custom regex pattern can unintentionally match the injected prefix and produce false positives. Test custom patterns against bodies that include the injected context, not just the agent’s raw prompt.

Severity Levels

Each inspector has a configurable severity that determines what happens when a match is found:

Severity	Effect
`log`	Record the finding in the audit log. Request proceeds normally.
`warn`	Record the finding prominently. Request proceeds normally.
`block`	On the request path, reject with a 403 response. On the response path, observe-only (see below). Finding recorded in audit log either way.

The redact severity is reserved for future use and currently behaves like warn.

Request vs Response Enforcement

Enforcement differs by phase:

Request phase - a block-severity finding short-circuits the request with a 403 carrying the generic content_policy_violation envelope (shown under Model Policy). The redacted match and inspector type appear only in the audit log and proxy logs - they are never leaked to the agent.
Response phase - inspection is observe-only for all severities, including block. A block finding on a completion is recorded (and alertable) but does not reject the response: the model has already produced (and billed) it, and the response is replayed to the agent. This is a deliberate divergence from the network proxy, which can block on the response path.

Separately, when --inspect-fail-closed is enabled, an inspection error or timeout on the request path returns a 503 (inspection unavailable) rather than a 403 - a 403 means a policy match, a 503 means inspection could not run.
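Taken together, the severity and phase rules reduce to a small decision function. The sketch below is a simplified model with hypothetical names; it only decides the status code and leaves audit logging out.

```python
def effective_severity(severity):
    """redact is reserved and currently behaves like warn."""
    return "warn" if severity == "redact" else severity


def enforce(phase, findings):
    """Return the HTTP status to send the agent, or None to let traffic through."""
    blocking = any(effective_severity(f["severity"]) == "block" for f in findings)
    if phase == "request" and blocking:
        return 403  # generic content_policy_violation envelope
    return None  # response phase is observe-only; log/warn never reject
```

So the same block finding yields a 403 when it is found in a request and no rejection when it is found in a response.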

Sidecar Inspection

Route LLM traffic to external HTTP services for additional inspection. Sidecars can implement custom logic like NER-based PII detection (Presidio), toxicity classification, or LLM-as-judge evaluation.

{
  "immutable": false,
  "sidecars": [
    {
      "name": "presidio-ner",
      "url": "http://presidio.daevix.svc:8080",
      "timeout_ms": 5000,
      "on_request": true,
      "on_response": true,
      "async": false,
      "severity": "block"
    },
    {
      "name": "llm-judge",
      "url": "http://llm-judge.daevix.svc:8081",
      "timeout_ms": 30000,
      "on_request": true,
      "on_response": false,
      "async": true,
      "severity": "warn",
      "include_context": true
    }
  ]
}

Field	Type	Default	Description
`name`	string	(required)	Identifier for the sidecar (used in finding reports and logs)
`url`	string	(required)	HTTP endpoint the proxy POSTs to
`timeout_ms`	int	5000	Request timeout in milliseconds
`on_request`	bool	false	Run on outgoing LLM requests
`on_response`	bool	false	Run on incoming LLM responses
`async`	bool	false	Fire-and-forget mode (findings logged but don’t block)
`severity`	string	`"warn"`	Severity applied to returned findings
`include_context`	bool	false	Include recent conversation messages in the sidecar request

Sidecars from all tiers are additive - lower tiers add sidecars, they cannot remove sidecars defined by higher tiers.

Sidecar Protocol

The proxy POSTs a JSON request to the sidecar URL:

{
  "agent_id": 42,
  "organization_id": 1,
  "agent_name": "coding-agent",
  "phase": "request",
  "body": "<raw LLM API request/response JSON>",
  "context": {
    "recent_messages": [{"role": "user", "content": "..."}]
  }
}

The context field is only populated when include_context is true. It contains the messages array extracted from the LLM request body, intended for LLM-as-judge sidecars that need conversation history.

The sidecar responds with:

{
  "findings": [
    {
      "description": "Toxic content detected",
      "match": "offensive phrase",
      "severity": "block"
    }
  ]
}
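As an illustration of the protocol, here is a minimal sidecar built on Python's standard library. It is a sketch under assumptions: the BANNED list is a hypothetical rule set, the port is arbitrary, and a production sidecar would add authentication, request size limits, and real detection logic.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

BANNED = ["offensive phrase"]  # hypothetical rule set for the example


def inspect(payload):
    """Map one proxy request document to a findings document."""
    body = payload.get("body", "")
    findings = [
        {"description": "Banned phrase detected", "match": phrase, "severity": "block"}
        for phrase in BANNED
        if phrase in body
    ]
    return {"findings": findings}


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            status, result = 200, inspect(json.loads(self.rfile.read(length)))
        except (ValueError, AttributeError):  # bad JSON or non-object payload
            status, result = 400, {"findings": []}
        data = json.dumps(result).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Blocking entry point; call serve() to run the sidecar."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

Keeping inspect separate from the HTTP handler makes the detection logic easy to unit-test without starting a server.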

Error Handling

All sidecars fail open:

- Network errors or timeouts produce no findings (request proceeds).
- Non-200 HTTP responses produce no findings.
- Malformed JSON responses produce no findings.
- Response bodies are limited to 1 MB.

Errors are logged for operator monitoring but never block LLM traffic. Operators who need fail-closed behavior should monitor sidecar error logs and alert accordingly.
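From the proxy's side, these rules amount to treating anything other than a well-formed 200 reply as an empty result. The sketch below models that. The function name is hypothetical, and it assumes an oversized body is truncated at 1 MB before parsing, which the text above does not spell out.

```python
import json

MAX_BODY = 1 << 20  # 1 MB cap on sidecar response bodies


def parse_sidecar_reply(status, raw):
    """Return the findings list, or [] for any reply that cannot be trusted."""
    if status != 200:
        return []
    try:
        # Assumption: bytes beyond the cap are dropped before parsing.
        doc = json.loads(raw[:MAX_BODY])
    except ValueError:  # malformed JSON or undecodable bytes
        return []
    findings = doc.get("findings") if isinstance(doc, dict) else None
    return findings if isinstance(findings, list) else []
```

A sidecar that starts returning errors therefore looks identical to one that finds nothing, which is why the error logs are the signal to watch.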

Async Mode

When async is true, the sidecar request fires in the background. The LLM request proceeds immediately without waiting for the sidecar response. Any findings are logged to the audit log but cannot block the request.

Use async mode for high-latency inspectors (e.g., LLM-as-judge) where blocking would add unacceptable latency.

Policy Pack Composition

When policy packs are installed, the resolver composes their content_inspection and model contributions on top of the operator-authored base at read time, behind the same 30s cache. Composition is monotonic toward stricter:

Content inspection contributions fold in as additional layers - a pack can enable a toggle, add PII types, or add regex patterns, but can never disable an inspector the base enabled.
Model contributions compose by logical AND: a model is permitted only if the resolved base policy and every pack contribution permits it. A pack can only narrow the permitted model set, never relax the operator base.

No separate enablement step is required - installed packs take effect automatically alongside the service_config policy.
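The logical-AND rule for model contributions can be sketched as follows. The sketch assumes each pack contribution has the same shape as a model_policy document, which is an assumption for illustration, and it uses Python's fnmatch in place of path.Match.

```python
from fnmatch import fnmatchcase


def permits(policy, model):
    """Evaluate a single allowlist/blocklist policy."""
    listed = any(fnmatchcase(model, pat) for pat in policy.get("models", []))
    return listed if policy.get("mode") == "allowlist" else not listed


def composed_permits(base, pack_policies, model):
    """A model is permitted only if the base and every pack permit it."""
    return all(permits(p, model) for p in [base, *pack_policies])
```

Because every contribution must agree, adding a pack can remove models from the permitted set but can never add one that the base policy denies.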

Audit Integration

All findings from content inspectors are recorded in the llm_audit_log table as a findings JSONB column. Each finding includes:

Field	Description
`inspector_type`	Which inspector produced the finding (e.g., `pii`, `api_key`, `model_restriction`, `sidecar:presidio-ner`)
`severity`	`log`, `warn`, or `block`
`description`	Human-readable description of what was detected
`match`	The matched content (redacted: first 4 characters + `****`)
`location`	Where the match was found: `request_body`, `response_body`, `model`, or `sidecar`

Query audit logs with findings via GET /api/v1/agents/{id}/audit-log. Each entry includes the findings array when content policy violations were detected.

Configuration Examples

Minimal: Model Restriction Only

Restrict an org to Claude models:

curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/model_policy \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"mode\":\"allowlist\",\"models\":[\"claude-*\"]}"}'

Full: Platform Security Baseline

Set an immutable platform baseline with PII and API key detection:

# Platform-level content inspection (immutable)
curl -X PUT https://controlplane:8443/api/v1/platform-config/llmproxy/content_inspection \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"immutable\":true,\"pii_detection\":{\"enabled\":true,\"severity\":\"block\"},\"api_key_detection\":{\"enabled\":true,\"severity\":\"block\"}}"}'

Orgs can add custom patterns on top of this baseline but cannot disable PII or API key detection:

# Org adds custom patterns (additive)
curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/content_inspection \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"patterns\":[{\"pattern\":\"CONFIDENTIAL_.*\",\"description\":\"Confidential marker\",\"severity\":\"block\"}]}"}'

Sidecar: Presidio NER + LLM Judge

curl -X PUT https://controlplane:8443/api/v1/service-config/llmproxy/sidecar_inspection \
  -H "Authorization: Bearer $OPERATOR_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"value": "{\"sidecars\":[{\"name\":\"presidio\",\"url\":\"http://presidio.daevix.svc:8080\",\"timeout_ms\":5000,\"on_request\":true,\"on_response\":true,\"severity\":\"block\"},{\"name\":\"llm-judge\",\"url\":\"http://llm-judge.daevix.svc:8081\",\"timeout_ms\":30000,\"on_request\":true,\"async\":true,\"severity\":\"warn\",\"include_context\":true}]}"}'
