Content Gates

Content Gates let you configure custom content moderation rules that run on all chat inputs and outputs. They are an extra layer on top of the platform's built-in safety system — your rules cannot weaken system safety, only add to it.

How It Works

Every chat message passes through the following stages, in this order:

  1. System safety (always active, not configurable) — blocks CSAM, violence, hate speech
  2. Input validation (built-in) — detects prompt injection, PII
  3. Your input gates — blacklist and regex rules you configure
  4. AI generates response
  5. Output validation (built-in) — redacts PII, detects prompt leakage
  6. Your output gates — blacklist and regex rules you configure

Within your gates, rules are evaluated cheapest-first: blacklist rules run before regex rules, so if a blacklist rule catches something, the regex rules don't need to run.
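The cheapest-first ordering can be pictured with a minimal sketch. The Rule class, the COST ranking, and first_hit are hypothetical names used for illustration only. They are not part of the platform, and the real evaluator may work differently.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Rule:
    name: str
    kind: str  # "blacklist" or "regex"
    matches: Callable[[str], bool]


# Hypothetical cost ranking: blacklist checks are assumed cheaper than regex.
COST = {"blacklist": 0, "regex": 1}


def first_hit(text: str, rules: List[Rule]) -> Optional[str]:
    """Return the name of the first matching rule, trying cheaper kinds first."""
    for rule in sorted(rules, key=lambda r: COST[r.kind]):
        if rule.matches(text):
            return rule.name  # stop here; costlier rules are never run
    return None


rules = [
    Rule("digits", "regex", lambda t: re.search(r"\d", t) is not None),
    Rule("competitor", "blacklist", lambda t: "competitora" in t.lower()),
]
```

Here first_hit("CompetitorA 42", rules) reports the blacklist rule even though the regex rule was listed first and would also match.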

Rule Types

Blacklist

Exact word or phrase matching. Case-insensitive by default. Enter one word/phrase per line.

Example: Block competitor mentions

CompetitorA
CompetitorB
their product name
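To reason about what a list like this will catch, a rough local approximation can help. The sketch below assumes whole-word matching (regex \b anchors) and case-insensitivity. The page only says "exact word or phrase matching", so the word-boundary behavior is an assumption, and compile_blacklist and find_blacklisted are hypothetical helper names.

```python
import re
from typing import List, Optional


def compile_blacklist(lines: str) -> List[re.Pattern]:
    """Turn a one-phrase-per-line list into case-insensitive patterns."""
    patterns = []
    for line in lines.splitlines():
        phrase = line.strip()
        if not phrase:
            continue  # skip blank lines
        # Assumption: entries match on word boundaries, not inside longer words.
        patterns.append(
            re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        )
    return patterns


def find_blacklisted(text: str, patterns: List[re.Pattern]) -> Optional[str]:
    """Return the first blacklisted text found, or None."""
    for pattern in patterns:
        match = pattern.search(text)
        if match:
            return match.group(0)
    return None


patterns = compile_blacklist("CompetitorA\nCompetitorB\ntheir product name\n")
```

Under these assumptions, "Have you tried competitora?" is caught, while "CompetitorAB" is not. Use the built-in test panel to confirm how the platform actually treats such edge cases.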

Regex

Regular expression pattern matching. Use standard regex syntax.

Example: Block credit card numbers

\b\d{4}[\s.-]?\d{4}[\s.-]?\d{4}[\s.-]?\d{4}\b
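It can be worth checking a pattern like this locally before saving it. The sketch below runs the pattern above through Python's re module. The platform's regex engine is not documented here and may differ in details, and contains_card is a hypothetical helper name.

```python
import re

# The credit card pattern from the example above, unchanged.
CARD = re.compile(r"\b\d{4}[\s.-]?\d{4}[\s.-]?\d{4}[\s.-]?\d{4}\b")


def contains_card(text: str) -> bool:
    """True if the text contains 16 digits in four groups of four."""
    return CARD.search(text) is not None


samples = [
    "4111 1111 1111 1111",   # spaces between groups
    "4111-1111-1111-1111",   # dashes between groups
    "4111111111111111",      # no separators
    "3782 822463 10005",     # 15-digit card in 4-6-5 grouping
    "order 12345",           # short number
]
results = [contains_card(s) for s in samples]
```

The first three samples match and the last two do not. This shows that the pattern covers only 16-digit numbers in groups of four; 15-digit card formats would need a separate rule.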

Actions

Action    Behavior
Block     Message is rejected entirely. User sees a generic error.
Warn      Message passes through, but the hit is logged in the audit dashboard.
Redact    Matched text is replaced with [REDACTED].
Replace   Matched text is replaced with your custom replacement text.
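The four actions can be illustrated with a small sketch. The apply_action function and the lowercase action strings are hypothetical and chosen for this example. This is not the platform's implementation, only a model of the behavior described above.

```python
import re
from typing import Optional, Tuple


def apply_action(
    text: str, pattern: str, action: str, replacement: str = ""
) -> Tuple[Optional[str], bool]:
    """Return (text to pass on, whether the rule hit). None means rejected."""
    if not re.search(pattern, text):
        return text, False
    if action == "block":
        return None, True  # whole message is rejected
    if action == "warn":
        return text, True  # passes unchanged; the caller would log the hit
    if action == "redact":
        return re.sub(pattern, "[REDACTED]", text), True
    if action == "replace":
        # A function replacement keeps backslashes in the text literal.
        return re.sub(pattern, lambda m: replacement, text), True
    raise ValueError("unknown action: " + action)


PHONE = r"\d{3}-\d{4}"
redacted = apply_action("call 555-1234", PHONE, "redact")
```

In this model, redacted is ("call [REDACTED]", True). Block returns None in place of the text, and Warn returns the text untouched.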

Direction

Each rule can apply to:

  • Input — only checks user messages
  • Output — only checks AI responses
  • Both — checks both directions

Scope

  • Org-wide (default) — rule applies to all widgets in your organization
  • Widget-specific — rule only applies to a specific widget (set via API)
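Direction and scope together decide which rules see a given message. The sketch below is one way to picture that selection. The Gate fields and the applicable function are hypothetical names, and it assumes a rule with no widget set is org-wide.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gate:
    name: str
    direction: str                    # "input", "output", or "both"
    widget_id: Optional[str] = None   # None is assumed to mean org-wide


def applicable(gates: List[Gate], direction: str, widget_id: str) -> List[Gate]:
    """Select the gates that apply to one message direction in one widget."""
    return [
        g for g in gates
        if g.direction in (direction, "both")
        and g.widget_id in (None, widget_id)
    ]


gates = [
    Gate("profanity", "both"),
    Gate("competitors", "output"),
    Gate("support-only", "input", widget_id="support"),
]
names = [g.name for g in applicable(gates, "input", "sales")]
```

For a user message in the hypothetical "sales" widget, only the org-wide "profanity" rule applies: "competitors" is output-only and "support-only" is scoped to another widget.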

Templates

Pre-built rule templates are available for common use cases:

  • PII Protection — email, phone, credit card, SSN, IBAN detection
  • Competitor Blocker — block competitor name mentions (customize the list)
  • Profanity Filter — basic profanity blocking
  • Prompt Injection — extra prompt injection patterns

Install templates from the Content Gates page and customize them to your needs.

Audit Dashboard

The audit dashboard shows:

  • 30-day summary — total blocks, warnings, and redactions
  • Daily trend — gate trigger volume over the last 14 days
  • Top rules — which rules are triggering most often

Use the audit dashboard to fine-tune your rules and understand what content your gates are catching.

Testing Rules

Before enabling a rule, use the built-in test panel in the create/edit modal. Enter sample content and see if your pattern matches — no need to deploy first.

API

GET    /api/content-gates              # List all rules
POST   /api/content-gates              # Create a rule
GET    /api/content-gates/{id}         # Get a rule
PUT    /api/content-gates/{id}         # Update a rule
DELETE /api/content-gates/{id}         # Delete a rule
POST   /api/content-gates/{id}/toggle  # Enable/disable a rule
POST   /api/content-gates/test         # Test a pattern against content
GET    /api/content-gates/templates    # List available templates
POST   /api/content-gates/templates/{slug}  # Install a template
GET    /api/content-gates/audit        # Audit dashboard data
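As a rough illustration, a call to the test endpoint might be built as follows. Only the path and method come from the list above. The host, the bearer-token header, and the JSON field names (type, pattern, content) are assumptions made for this sketch, so check the actual request schema before relying on it.

```python
import json
import urllib.request

BASE_URL = "https://example.com"  # hypothetical host; substitute your own


def build_test_request(pattern: str, content: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/content-gates/test."""
    # Field names below are assumed, not taken from the API documentation.
    body = json.dumps(
        {"type": "regex", "pattern": pattern, "content": content}
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/api/content-gates/test",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumed auth scheme
        },
        method="POST",
    )


req = build_test_request(r"\b\d{4}\b", "PIN 1234", token="YOUR_TOKEN")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```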

Limitations (v1)

  • No LLM judge — v1 supports blacklist and regex only. An AI-powered judge layer (e.g., "reject if the response contains medical advice") is planned for v2.
  • Chat only — gates currently run on widget chat. Content pipeline and playbook AI steps are not covered yet.
  • No per-conversation overrides — rules apply at org or widget level, not per conversation.
  • No bulk import/export — rules must be created individually (templates help for common patterns).
  • No regex timeout protection — complex regex patterns could theoretically cause slowdowns. Keep patterns simple.
  • No rule versioning — changes take effect immediately with no rollback. Test thoroughly before enabling.
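Because v1 has no regex timeout protection, a simple pre-check on your side can flag the most common source of slow patterns: a quantified group whose body also contains a quantifier, such as (a+)+. The sketch below is a rough heuristic with hypothetical names. It is not a platform feature, it can misfire (for example on an escaped \+ inside a group), and it will miss other slow constructs.

```python
import re

# Flags a group containing + or * that is itself followed by +, * or {n}.
# Heuristic only: it does not parse the pattern or handle nested groups.
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def looks_risky(pattern: str) -> bool:
    """True if the pattern contains an obviously nested quantifier."""
    return NESTED_QUANTIFIER.search(pattern) is not None


checks = {
    r"(a+)+$": looks_risky(r"(a+)+$"),
    r"(\w*)*end": looks_risky(r"(\w*)*end"),
    r"(ab)*": looks_risky(r"(ab)*"),
    r"\b\d{4}[\s.-]?\d{4}\b": looks_risky(r"\b\d{4}[\s.-]?\d{4}\b"),
}
```

The first two patterns are flagged and the last two are not. A flagged pattern is worth rewriting or testing against long inputs before it is enabled.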

Future Roadmap

  • LLM Judge rules (AI-powered content classification)
  • Content pipeline and playbook integration
  • Rule versioning and rollback
  • Webhook notifications on gate triggers
  • Widget settings integration (show per-widget overrides inline)
  • Bulk import/export of rules
  • Regex DoS protection (RE2 mode)