
The Latency Problem

Traditional compliance solutions add 2-5 seconds to every LLM call:
User Request → Compliance Check (2-5s) → LLM Call (500ms) → Response
Total: 2.5-5.5 seconds 😞
This is unacceptable for production applications where users expect instant responses.

Continum’s Solution

Continum inverts the flow: compliance monitoring runs after the user already has their answer.
User Request → LLM Call (500ms) → Response ⚡
                    ↓
            [Async Monitoring] (2-5s, user doesn't wait)

How It Works

1. Direct Execution

The SDK calls your LLM provider directly using your API keys:
import { protect } from '@continum/sdk';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY  // Stays on your server
});

// This calls OpenAI directly, no proxy
const response = await protect(
  () => openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello' }]
  }),
  {
    apiKey: process.env.CONTINUM_API_KEY!,
    preset: 'customer-support'
  }
);
// Response in ~500ms, same as direct OpenAI call
Key insight: Continum never sits between you and the LLM provider.

2. Async Monitoring

After returning the response, the SDK sends interaction details for monitoring:
// User already has response here ✅

// SDK does this in background (not awaited):
// Sends interaction details to Continum for compliance monitoring
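
Conceptually, the background send is ordinary fire-and-forget code. The sketch below illustrates the pattern only; the helper name, endpoint URL, and payload shape are assumptions, not the SDK's actual internals:

// Conceptual sketch of fire-and-forget monitoring.
// `reportInteraction`, the endpoint, and the payload shape are
// hypothetical; this is not the SDK's real implementation.
const MONITORING_ENDPOINT = 'https://api.example.com/interactions'; // placeholder URL

async function reportInteraction(payload: { prompt: string; completion: string }): Promise<void> {
  try {
    // fetch is built into Node.js 18+
    await fetch(MONITORING_ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
  } catch {
    // Monitoring is best-effort: a failure here must never
    // propagate back into the user-facing request path.
  }
}

// After the LLM response is ready:
// void reportInteraction({ prompt, completion }); // deliberately not awaited
// return response;                                // user gets this immediately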

3. Compliance Processing

Continum monitors your interaction asynchronously:

1. Receive interaction details
2. Analyze for compliance violations
3. Generate a compliance signal
4. Store it as evidence
5. Surface it in the dashboard

Total time: 2-5 seconds, but your user doesn’t wait for this.
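
The signal generated in step 3 might look roughly like the shape below. This is an illustrative guess at the data structure; every field name here is an assumption, not Continum’s documented schema:

// Illustrative shape of a compliance signal.
// All field names are assumptions, not Continum's documented schema.
interface ComplianceSignal {
  interactionId: string;   // links the signal back to the original LLM call
  preset: string;          // e.g. 'customer-support'
  violations: Array<{
    rule: string;                          // which compliance rule fired
    severity: 'low' | 'medium' | 'high';   // how serious the violation is
  }>;
  createdAt: string;       // ISO timestamp, roughly 2-5s after the response
}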

Performance Comparison

| Approach | User Latency | Compliance Delay | Production Ready |
| --- | --- | --- | --- |
| Blocking compliance | 2.5-5.5s | 0s (inline) | ❌ Too slow |
| No compliance | 500ms | ∞ (never) | ❌ Risky |
| Continum | 500ms | 2-5s (async) | ✅ Best of both |

Guardian: Fast Pre-LLM Protection

For cases where you need pre-LLM protection (e.g., blocking PII before it reaches the LLM), Continum offers Guardian:
const continum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianEnabled: true  // Enable pre-LLM PII detection
});

const response = await continum.llm.openai.gpt_4o.chat({
  messages: [{ role: 'user', content: 'My email is john@example.com' }],
  sandbox: 'your-sandbox-slug'
});
// Guardian detects PII in < 100ms
// Redacts before sending to OpenAI
// Total latency: ~600ms (still acceptable)
Guardian uses fast pattern matching and detection models to identify PII in under 100ms.
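
To make "fast pattern matching" concrete, here is a minimal sketch of regex-based redaction. It illustrates the general technique only, not Guardian's actual rules or detection models:

// Minimal sketch of regex-based PII redaction.
// Illustrates the technique only; not Guardian's actual implementation.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],          // US Social Security numbers
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL]'],  // email addresses
];

function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (redacted, [pattern, replacement]) => redacted.replace(pattern, replacement),
    text
  );
}

console.log(redactPII('My SSN is 123-45-6789 and email is john@example.com'));
// "My SSN is [SSN] and email is [EMAIL]"

A regex pass like this runs in microseconds, which is how a sub-100ms budget leaves room for the heavier detection models mentioned above.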

Unified Flow

Continum combines Guardian (pre-LLM) and async monitoring (post-LLM) in one seamless flow:
1. Guardian Check (< 100ms)
2. Direct LLM Call (~500ms)
3. Return Response to User ⚡
4. Async Monitoring (2-5s, background)

Total user-facing latency: ~600ms (vs 2.5-5.5s with blocking compliance)

Real-World Example

import { Continum } from '@continum/sdk';

const continum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianConfig: {
    enabled: true,
    action: 'REDACT_AND_CONTINUE'
  }
});

// User sends message with PII
const userMessage = 'My SSN is 123-45-6789 and email is john@example.com';

const start = Date.now();

const response = await continum.llm.openai.gpt_4o.chat({
  messages: [{ role: 'user', content: userMessage }],
  sandbox: 'your-sandbox-slug'
});

const latency = Date.now() - start;
console.log(`User latency: ${latency}ms`); // ~600ms

console.log(response.content); // User sees response immediately

// Meanwhile, in the background:
// - Guardian detected SSN and email (< 100ms)
// - Redacted before sending to OpenAI
// - Async monitoring running in background (2-5s)
// - Compliance signal will appear in dashboard shortly

Why This Matters

For Users

  • Instant responses (no waiting for compliance)
  • Same experience as direct LLM calls
  • No degraded performance

For Developers

  • Drop-in replacement for existing LLM calls (see the sketch after this list)
  • No architecture changes required
  • Keep your API keys on your server
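
"Drop-in" here means only the call site changes. A sketch reusing the `openai` client and `protect` import from the Direct Execution example above:

// The request parameters stay exactly as they were
const params = {
  model: 'gpt-4',
  messages: [{ role: 'user' as const, content: 'Hello' }],
};

// Before: const response = await openai.chat.completions.create(params);

// After: the same call wrapped with protect(); nothing else changes
const response = await protect(
  () => openai.chat.completions.create(params),
  { apiKey: process.env.CONTINUM_API_KEY!, preset: 'customer-support' }
);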

For Compliance Teams

  • 100% coverage of LLM interactions
  • Real-time dashboard monitoring
  • Audit-ready evidence for regulations

Trade-offs

What You Get

✅ Zero added latency for users
✅ 100% compliance coverage
✅ Real-time monitoring
✅ Privacy-first architecture

What You Accept

⚠️ Compliance results appear 2-5s after response (not inline)
⚠️ Can’t block response based on post-LLM monitoring (use Guardian for pre-LLM blocking)
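
When blocking is a hard requirement, it has to happen in the Guardian step before the LLM is called. A sketch, assuming Guardian supports a blocking action alongside the REDACT_AND_CONTINUE shown earlier; the 'BLOCK' value is an assumption, not confirmed API:

import { Continum } from '@continum/sdk';

// Sketch: configuring Guardian to block instead of redact.
// 'BLOCK' is an assumed action value; only 'REDACT_AND_CONTINUE'
// appears in this page's examples.
const strictContinum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianConfig: {
    enabled: true,
    action: 'BLOCK',  // hypothetical: reject the request before it reaches the LLM
  },
});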

When to Use Guardian vs Async Monitoring

| Use Case | Solution | Latency | When to Use |
| --- | --- | --- | --- |
| Block PII before LLM | Guardian | +100ms | User input might contain PII |
| Monitor for compliance | Async Monitoring | +0ms | Post-hoc monitoring and reporting |
| Both | Guardian + Monitoring | +100ms | Maximum protection + evidence |

Next Steps

  • Presets: Learn about automatic detection configuration
  • Evidence: Transform monitoring into audit-ready evidence
  • Architecture: Explore the full system architecture
  • SDK Configuration: Configure presets and compliance settings