Content Gates

Content Gates let you configure custom content moderation rules that run on all chat inputs and outputs. They are an extra layer on top of the platform's built-in safety system — your rules cannot weaken system safety, only add to it.

How It Works

Every chat message passes through the following stages, in this order:

System safety (always active, not configurable) — blocks CSAM, violence, hate speech
Input validation (built-in) — detects prompt injection, PII
Your input gates — blacklist and regex rules you configure
AI generates response
Output validation (built-in) — redacts PII, detects prompt leakage
Your output gates — blacklist and regex rules you configure

Within your gates, rules are evaluated cheapest-first: blacklist rules run before regex rules. If a blacklist rule already catches the message, the regex rules don't need to run.
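The cheapest-first ordering can be sketched in a few lines of Python. This is an illustration only, not the platform's implementation: the `Rule` dataclass, its field names, and `first_match` are hypothetical. It assumes that a blacklist rule is a plain case-insensitive substring check and that evaluation stops at the first matching rule.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    # Hypothetical rule shape for illustration; not the platform's schema.
    name: str
    kind: str     # "blacklist" or "regex"
    pattern: str  # a phrase for blacklist rules, a regex for regex rules


# Lower cost runs first: a substring check is cheaper than a regex search.
COST = {"blacklist": 0, "regex": 1}


def matches(rule: Rule, text: str) -> bool:
    if rule.kind == "blacklist":
        # Assumption: case-insensitive substring match.
        return rule.pattern.casefold() in text.casefold()
    return re.search(rule.pattern, text) is not None


def first_match(rules: List[Rule], text: str) -> Optional[Rule]:
    """Return the first matching rule, trying cheaper rule types first."""
    for rule in sorted(rules, key=lambda r: COST[r.kind]):
        if matches(rule, text):
            return rule  # the remaining, more expensive rules are skipped
    return None
```

Because `sorted` is stable, rules of the same type keep the order in which they were listed.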

Rule Types

Blacklist

Exact word or phrase matching. Case-insensitive by default. Enter one word/phrase per line.

Example: Block competitor mentions

CompetitorA
CompetitorB
their product name

Regex

Regular expression pattern matching. Use standard regex syntax.

Example: Block credit card numbers

\b\d{4}[\s.-]?\d{4}[\s.-]?\d{4}[\s.-]?\d{4}\b
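Before saving a pattern like this, it can help to try it locally against a few samples. The sketch below uses Python's `re` module and assumes the platform's regex engine treats this pattern the same way, which this page does not confirm. The sample strings are made up for illustration.

```python
import re

# The credit card pattern from the example above.
CARD = re.compile(r"\b\d{4}[\s.-]?\d{4}[\s.-]?\d{4}[\s.-]?\d{4}\b")

# Sample text mapped to whether we expect the pattern to match.
samples = {
    "4111 1111 1111 1111": True,   # space-separated groups
    "4111-1111-1111-1111": True,   # dash-separated groups
    "4111111111111111": True,      # no separators
    "order 12345678": False,       # too short
    "3782 822463 10005": False,    # 15 digits in a 4-6-5 grouping
}

results = {text: CARD.search(text) is not None for text in samples}
```

Note that the pattern only covers 16-digit numbers written in groups of four. Card numbers with other lengths or groupings, such as the 15-digit sample, pass through unmatched and would need an additional rule.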

Actions

| Action | Behavior |
| --- | --- |
| Block | Message is rejected entirely. User sees a generic error. |
| Warn | Message passes through, but the hit is logged in the audit dashboard. |
| Redact | Matched text is replaced with `[REDACTED]`. |
| Replace | Matched text is replaced with your custom replacement text. |
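As a rough model of what each action does to a message, consider the sketch below. The function `apply_action`, its lowercase action names, and its return shape are hypothetical and chosen for illustration; the platform's internals may differ.

```python
import re
from typing import Optional, Tuple


def apply_action(action: str, pattern: str, text: str,
                 replacement: str = "") -> Tuple[Optional[str], bool]:
    """Return (text_to_deliver, matched); text_to_deliver is None when blocked."""
    regex = re.compile(pattern)
    if regex.search(text) is None:
        return text, False          # no hit: message is unchanged
    if action == "block":
        return None, True           # reject the whole message
    if action == "warn":
        return text, True           # deliver unchanged; caller logs the hit
    if action == "redact":
        return regex.sub("[REDACTED]", text), True
    if action == "replace":
        # A callable replacement keeps backslashes in the custom text literal.
        return regex.sub(lambda _m: replacement, text), True
    raise ValueError(f"unknown action: {action}")
```

For example, `apply_action("redact", r"\d{3}-\d{4}", "call 555-1234 now")` returns `("call [REDACTED] now", True)`.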

Direction

Each rule can apply to:

Input — only checks user messages
Output — only checks AI responses
Both — checks both directions

Scope

Org-wide (default) — rule applies to all widgets in your organization
Widget-specific — rule only applies to a specific widget (set via API)
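Direction and scope together decide which rules apply to a given message. A minimal sketch of that selection, assuming a hypothetical `Rule` shape in which `widget_id` is `None` for org-wide rules (the field names are not taken from the API):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    # Hypothetical fields for illustration only.
    name: str
    direction: str                   # "input", "output", or "both"
    widget_id: Optional[str] = None  # None means org-wide


def applicable_rules(rules: List[Rule], direction: str,
                     widget_id: str) -> List[Rule]:
    """Select rules that cover this direction and this widget."""
    return [
        r for r in rules
        if r.direction in (direction, "both")  # direction filter
        and r.widget_id in (None, widget_id)   # org-wide or this widget
    ]
```

With this model, an org-wide rule set to Both is selected for every message, while a widget-specific Output rule is selected only for AI responses in that one widget.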

Templates

Pre-built rule templates are available for common use cases:

PII Protection — email, phone, credit card, SSN, IBAN detection
Competitor Blocker — block competitor name mentions (customize the list)
Profanity Filter — basic profanity blocking
Prompt Injection — extra prompt injection patterns

Install templates from the Content Gates page and customize them to your needs.

Audit Dashboard

The audit dashboard shows:

30-day summary — total blocks, warnings, and redactions
Daily trend — gate trigger volume over the last 14 days
Top rules — which rules are triggering most often

Use the audit dashboard to fine-tune your rules and understand what content your gates are catching.

Testing Rules

Before enabling a rule, use the built-in test panel in the create/edit modal. Enter sample content and see if your pattern matches — no need to deploy first.

API

GET    /api/content-gates              # List all rules
POST   /api/content-gates              # Create a rule
GET    /api/content-gates/{id}         # Get a rule
PUT    /api/content-gates/{id}         # Update a rule
DELETE /api/content-gates/{id}         # Delete a rule
POST   /api/content-gates/{id}/toggle  # Enable/disable a rule
POST   /api/content-gates/test         # Test a pattern against content
GET    /api/content-gates/templates    # List available templates
POST   /api/content-gates/templates/{slug}  # Install a template
GET    /api/content-gates/audit        # Audit dashboard data
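The endpoints above can be called from any HTTP client. The sketch below builds requests with Python's standard `urllib.request`. Only the paths and methods come from this page. The base URL, the bearer-token header, and the JSON field names (`pattern`, `content`) are placeholders and assumptions, so check them against your account's API details before use.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://example.com"  # placeholder: replace with your API host
API_TOKEN = "YOUR_API_TOKEN"      # assumption: bearer-token authentication


def build_request(method: str, path: str,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request for a Content Gates endpoint."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Accept": "application/json",
    }
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# List all rules.
list_rules = build_request("GET", "/api/content-gates")

# Test a pattern against content; the body field names are hypothetical.
test_pattern = build_request(
    "POST",
    "/api/content-gates/test",
    {"pattern": r"\bsecret\b", "content": "this is a secret"},
)

# Enable/disable a rule; RULE_ID stands in for a real rule id.
toggle = build_request("POST", "/api/content-gates/RULE_ID/toggle")
```

To send a request, pass it to `urllib.request.urlopen`, for example `urllib.request.urlopen(list_rules)`, and decode the response body as JSON if the API returns JSON.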

Limitations (v1)

No LLM judge — v1 supports blacklist and regex only. An AI-powered judge layer (e.g., "reject if the response contains medical advice") is planned for v2.
Chat only — gates currently run on widget chat. Content pipeline and playbook AI steps are not covered yet.
No per-conversation overrides — rules apply at org or widget level, not per conversation.
No bulk import/export — rules must be created individually (templates help for common patterns).
No regex timeout protection — complex regex patterns could theoretically cause slowdowns. Keep patterns simple.
No rule versioning — changes take effect immediately with no rollback. Test thoroughly before enabling.
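Since there is no regex timeout protection, it is worth screening patterns before saving them. The slowdowns in question usually come from catastrophic backtracking, which nested quantifiers such as `(a+)+` can trigger. The sketch below is a rough heuristic only: `looks_risky` is a hypothetical helper, it can flag harmless patterns (for example ones with escaped parentheses), and it can miss risky ones, so it does not replace a real timeout.

```python
import re

# Flags a group that contains + or * and is itself followed by +, *, or {.
# Example: (a+)+ or (\d*)*  -- a quantifier applied to a quantified group.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def looks_risky(pattern: str) -> bool:
    """Heuristic check for nested quantifiers in a regex pattern string."""
    return NESTED_QUANTIFIER.search(pattern) is not None


# The credit card pattern from the Regex section has no groups at all.
card = r"\b\d{4}[\s.-]?\d{4}[\s.-]?\d{4}[\s.-]?\d{4}\b"

checks = {
    r"(a+)+$": looks_risky(r"(a+)+$"),
    r"(\d+)*x": looks_risky(r"(\d+)*x"),
    r"(abc)+": looks_risky(r"(abc)+"),
    card: looks_risky(card),
}
```

Patterns that the check flags are candidates for rewriting with a single quantifier or a bounded repetition such as `{1,20}`.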

Future Roadmap

LLM Judge rules (AI-powered content classification)
Content pipeline and playbook integration
Rule versioning and rollback
Webhook notifications on gate triggers
Widget settings integration (show per-widget overrides inline)
Bulk import/export of rules
Regex DoS protection (RE2 mode)
