The Latency Problem
Traditional compliance solutions add 2-5 seconds to every LLM call:
```
User Request → Compliance Check (2-5s) → LLM Call (500ms) → Response

Total: 2.5-5.5 seconds 😞
```
This is unacceptable for production applications where users expect instant responses.
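To make the arithmetic concrete, here is a minimal sketch of the blocking flow; `checkCompliance` and `callLLM` are hypothetical stand-ins (simulated with timers), not real APIs:

```typescript
// Hypothetical stand-ins: a blocking compliance check (2-5s in practice)
// and an LLM call (~500ms), simulated here with timers.
const checkCompliance = (text: string) =>
  new Promise<void>((resolve) => setTimeout(resolve, 2000));
const callLLM = (text: string) =>
  new Promise<string>((resolve) => setTimeout(() => resolve('answer'), 500));

async function blockingFlow(userMessage: string): Promise<string> {
  await checkCompliance(userMessage); // the user waits for this...
  return callLLM(userMessage);        // ...and then for the LLM on top
}
// Total user-facing latency: compliance (2-5s) + LLM (~500ms) = 2.5-5.5s
```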
Continum’s Solution
Continum inverts the flow: compliance monitoring runs after the user already has their answer.
```
User Request → LLM Call (500ms) → Response ⚡
                                      ↓
                    [Async Monitoring] (2-5s, user doesn't wait)
```
How It Works
1. Direct Execution
The SDK calls your LLM provider directly using your API keys:
```typescript
import { protect } from '@continum/sdk';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY // Stays on your server
});

// This calls OpenAI directly, no proxy
const response = await protect(
  () => openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello' }]
  }),
  {
    apiKey: process.env.CONTINUM_API_KEY!,
    preset: 'customer-support'
  }
);

// Response in ~500ms, same as a direct OpenAI call
```
Key insight: Continum never sits between you and the LLM provider.
2. Async Monitoring
After returning the response, the SDK sends interaction details for monitoring:
```typescript
// User already has the response here ✅
// The SDK does this in the background (not awaited):
// it sends interaction details to Continum for compliance monitoring
```
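The fire-and-forget pattern behind this can be sketched as follows; `sendToContinum` and `monitorInBackground` are illustrative names, not the SDK's actual internals:

```typescript
// Illustrative sketch of the fire-and-forget pattern:
// the monitoring call runs in the background and is never awaited.
async function sendToContinum(details: object): Promise<void> {
  // In a real system: POST the interaction details to the
  // monitoring endpoint (takes 2-5s; the user never waits for it).
}

function monitorInBackground(details: object): void {
  // Intentionally not awaited: this returns immediately, so the
  // response can be handed to the user before monitoring finishes.
  void sendToContinum(details).catch((err) => {
    // Monitoring failures must never surface to the user.
    console.error('Continum monitoring failed:', err);
  });
}
```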
3. Compliance Processing
Continum monitors your interaction asynchronously:
```
Receive interaction details
        ↓
Analyze for compliance violations
        ↓
Generate compliance signal
        ↓
Store as evidence
        ↓
Appears in dashboard
```
Total time: 2-5 seconds, but your user doesn’t wait for this.
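A rough sketch of what such a pipeline could look like; the stage functions and the `ComplianceSignal` shape are assumptions for illustration, not Continum's actual implementation:

```typescript
// Illustrative pipeline sketch; stage names and the signal shape
// are assumptions, not Continum's real API.
interface ComplianceSignal {
  violations: string[];
  timestamp: number;
}

function analyze(details: { prompt: string; response: string }): string[] {
  // Placeholder analysis: flag an obvious SSN-like pattern.
  return /\d{3}-\d{2}-\d{4}/.test(details.prompt) ? ['ssn'] : [];
}

async function storeEvidence(signal: ComplianceSignal): Promise<void> {
  // In a real system: persist the signal so it appears in the dashboard.
}

async function processInteraction(
  details: { prompt: string; response: string }
): Promise<ComplianceSignal> {
  const violations = analyze(details);      // analyze for violations
  const signal: ComplianceSignal = {        // generate compliance signal
    violations,
    timestamp: Date.now(),
  };
  await storeEvidence(signal);              // store as evidence
  return signal;                            // then surfaces in the dashboard
}
```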
| Approach | User Latency | Compliance Delay | Production Ready |
|---|---|---|---|
| Blocking compliance | 2.5-5.5s | 0s (inline) | ❌ Too slow |
| No compliance | 500ms | ∞ (never) | ❌ Risky |
| Continum | 500ms | 2-5s (async) | ✅ Best of both |
Guardian: Fast Pre-LLM Protection
For cases where you need pre-LLM protection (e.g., blocking PII before it reaches the LLM), Continum offers Guardian:
```typescript
import { Continum } from '@continum/sdk';

const continum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianEnabled: true // Enable pre-LLM PII detection
});

const response = await continum.llm.openai.gpt_4o.chat({
  messages: [{ role: 'user', content: 'My email is john@example.com' }],
  sandbox: 'your-sandbox-slug'
});

// Guardian detects PII in < 100ms
// Redacts before sending to OpenAI
// Total latency: ~600ms (still acceptable)
```
Guardian uses fast pattern matching and detection models to identify PII in under 100ms.
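The pattern-matching half of that approach can be sketched with plain regular expressions; the `redactPII` helper and the patterns below are illustrative only, not Guardian's actual detectors:

```typescript
// Minimal sketch of regex-based PII detection and redaction.
// Guardian also uses detection models, which are not shown here.
const PII_PATTERNS: Record<string, RegExp> = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
};

function redactPII(text: string): { redacted: string; found: string[] } {
  const found: string[] = [];
  let redacted = text;
  for (const [label, pattern] of Object.entries(PII_PATTERNS)) {
    // match() with a global regex finds all occurrences without
    // the stateful lastIndex pitfalls of test().
    if (redacted.match(pattern)) {
      found.push(label);
      redacted = redacted.replace(pattern, `[${label.toUpperCase()}_REDACTED]`);
    }
  }
  return { redacted, found };
}
```

Because this is pure string matching with no network or model calls, it runs well within a sub-100ms budget.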
Unified Flow
Continum combines Guardian (pre-LLM) and async monitoring (post-LLM) in one seamless flow:
```
1. Guardian Check (< 100ms)
        ↓
2. Direct LLM Call (~500ms)
        ↓
3. Return Response to User ⚡
        ↓
4. Async Monitoring (2-5s, background)
```
Total user-facing latency: ~600ms (vs 2.5-5.5s with blocking compliance)
Real-World Example
```typescript
import { Continum } from '@continum/sdk';

const continum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianConfig: {
    enabled: true,
    action: 'REDACT_AND_CONTINUE'
  }
});

// User sends message with PII
const userMessage = 'My SSN is 123-45-6789 and email is john@example.com';

const start = Date.now();
const response = await continum.llm.openai.gpt_4o.chat({
  messages: [{ role: 'user', content: userMessage }],
  sandbox: 'your-sandbox-slug'
});
const latency = Date.now() - start;

console.log(`User latency: ${latency}ms`); // ~600ms
console.log(response.content); // User sees response immediately

// Meanwhile, in the background:
// - Guardian detected the SSN and email (< 100ms)
// - Redacted them before sending to OpenAI
// - Async monitoring is running in the background (2-5s)
// - A compliance signal will appear in the dashboard shortly
```
Why This Matters
For Users
- Instant responses (no waiting for compliance)
- Same experience as direct LLM calls
- No degraded performance
For Developers
- Drop-in replacement for existing LLM calls
- No architecture changes required
- Keep your API keys on your server
For Compliance Teams
- 100% coverage of LLM interactions
- Real-time dashboard monitoring
- Audit-ready evidence for regulations
Trade-offs
What You Get
- ✅ Zero added latency for users
- ✅ 100% compliance coverage
- ✅ Real-time monitoring
- ✅ Privacy-first architecture
What You Accept
- ⚠️ Compliance results appear 2-5s after the response (not inline)
- ⚠️ Can’t block a response based on post-LLM monitoring (use Guardian for pre-LLM blocking)
When to Use Guardian vs Async Monitoring
| Use Case | Solution | Latency | When to Use |
|---|---|---|---|
| Block PII before LLM | Guardian | +100ms | User input might contain PII |
| Monitor for compliance | Async Monitoring | +0ms | Post-hoc monitoring and reporting |
| Both | Guardian + Monitoring | +100ms | Maximum protection + evidence |
Next Steps
- Presets: Learn about automatic detection configuration
- Evidence: Transform monitoring into audit-ready evidence
- Architecture: Explore the full system architecture
- SDK Configuration: Configure presets and compliance settings