
The Latency Problem

Traditional compliance solutions add 2-5 seconds to every LLM call:
User Request → Compliance Check (2-5s) → LLM Call (500ms) → Response
Total: 2.5-5.5 seconds 😞
This is unacceptable for production applications where users expect instant responses.

Continum’s Solution

Continum inverts the flow: compliance monitoring runs after the user already has their answer.
User Request → LLM Call (500ms) → Response ⚡
                    ↓
            [Async Monitoring] (2-5s, user doesn't wait)

How It Works

1. Direct Execution

The SDK calls your LLM provider directly using your API keys:
import { protect } from '@continum/sdk';
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY  // Stays on your server
});

// This calls OpenAI directly, no proxy
const response = await protect(
  () => openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: 'Hello' }]
  }),
  {
    apiKey: process.env.CONTINUM_API_KEY!,
    preset: 'customer-support'
  }
);
// Response in ~500ms, same as direct OpenAI call
Key insight: Continum never sits between you and the LLM provider.

2. Async Monitoring

After returning the response, the SDK sends interaction details for monitoring:
// User already has response here ✅

// SDK does this in background (not awaited):
// Sends interaction details to Continum for compliance monitoring
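
Conceptually, the background send is ordinary fire-and-forget code. The sketch below illustrates the pattern only; the helper name, endpoint URL, and payload shape are assumptions, not the SDK's actual internals:

// Conceptual sketch of fire-and-forget monitoring.
// `reportInteraction`, the endpoint, and the payload shape are
// hypothetical; this is not the SDK's real implementation.
const MONITORING_ENDPOINT = 'https://api.example.com/interactions'; // placeholder URL

async function reportInteraction(payload: { prompt: string; completion: string }): Promise<void> {
  try {
    // fetch is built into Node.js 18+
    await fetch(MONITORING_ENDPOINT, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
    });
  } catch {
    // Monitoring is best-effort: a failure here must never
    // propagate back into the user-facing request path.
  }
}

// After the LLM response is ready:
// void reportInteraction({ prompt, completion }); // deliberately not awaited
// return response;                                // user gets this immediately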

3. Compliance Processing

Continum monitors your interaction asynchronously:

1. Receive interaction details
2. Analyze for compliance violations
3. Generate a compliance signal
4. Store it as evidence
5. Surface it in the dashboard

Total time: 2-5 seconds, but your user doesn’t wait for this.
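
The signal generated in step 3 might look roughly like the shape below. This is an illustrative guess at the data structure; every field name here is an assumption, not Continum’s documented schema:

// Illustrative shape of a compliance signal.
// All field names are assumptions, not Continum's documented schema.
interface ComplianceSignal {
  interactionId: string;   // links the signal back to the original LLM call
  preset: string;          // e.g. 'customer-support'
  violations: Array<{
    rule: string;                          // which compliance rule fired
    severity: 'low' | 'medium' | 'high';   // how serious the violation is
  }>;
  createdAt: string;       // ISO timestamp, roughly 2-5s after the response
}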

Performance Comparison

| Approach | User Latency | Compliance Delay | Production Ready |
| --- | --- | --- | --- |
| Blocking compliance | 2.5-5.5s | 0s (inline) | ❌ Too slow |
| No compliance | 500ms | ∞ (never) | ❌ Risky |
| Continum | 500ms | 2-5s (async) | ✅ Best of both |

Guardian: Fast Pre-LLM Protection

For cases where you need pre-LLM protection (e.g., blocking PII before it reaches the LLM), Continum offers Guardian:
const continum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianEnabled: true  // Enable pre-LLM PII detection
});

const response = await continum.llm.openai.gpt_4o.chat({
  messages: [{ role: 'user', content: 'My email is john@example.com' }],
  sandbox: 'your-sandbox-slug'
});
// Guardian detects PII in < 100ms
// Redacts before sending to OpenAI
// Total latency: ~600ms (still acceptable)
Guardian uses fast pattern matching and detection models to identify PII in under 100ms.
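
To make "fast pattern matching" concrete, here is a minimal sketch of regex-based redaction. It illustrates the general technique only, not Guardian's actual rules or detection models:

// Minimal sketch of regex-based PII redaction.
// Illustrates the technique only; not Guardian's actual implementation.
const PII_PATTERNS: Array<[RegExp, string]> = [
  [/\b\d{3}-\d{2}-\d{4}\b/g, '[SSN]'],          // US Social Security numbers
  [/\b[\w.+-]+@[\w-]+\.[\w.]+\b/g, '[EMAIL]'],  // email addresses
];

function redactPII(text: string): string {
  return PII_PATTERNS.reduce(
    (redacted, [pattern, replacement]) => redacted.replace(pattern, replacement),
    text
  );
}

console.log(redactPII('My SSN is 123-45-6789 and email is john@example.com'));
// "My SSN is [SSN] and email is [EMAIL]"

A regex pass like this runs in microseconds, which is how a sub-100ms budget leaves room for the heavier detection models mentioned above.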

Unified Flow

Continum combines Guardian (pre-LLM) and async monitoring (post-LLM) in one seamless flow:
1. Guardian Check (< 100ms)
2. Direct LLM Call (~500ms)
3. Return Response to User ⚡
4. Async Monitoring (2-5s, background)

Total user-facing latency: ~600ms (vs 2.5-5.5s with blocking compliance)

Real-World Example

import { Continum } from '@continum/sdk';

const continum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianConfig: {
    enabled: true,
    action: 'REDACT_AND_CONTINUE'
  }
});

// User sends message with PII
const userMessage = 'My SSN is 123-45-6789 and email is john@example.com';

const start = Date.now();

const response = await continum.llm.openai.gpt_4o.chat({
  messages: [{ role: 'user', content: userMessage }],
  sandbox: 'your-sandbox-slug'
});

const latency = Date.now() - start;
console.log(`User latency: ${latency}ms`); // ~600ms

console.log(response.content); // User sees response immediately

// Meanwhile, in the background:
// - Guardian detected SSN and email (< 100ms)
// - Redacted before sending to OpenAI
// - Async monitoring running in background (2-5s)
// - Compliance signal will appear in dashboard shortly

Why This Matters

For Users

  • Instant responses (no waiting for compliance)
  • Same experience as direct LLM calls
  • No degraded performance

For Developers

  • Drop-in replacement for existing LLM calls (see the sketch after this list)
  • No architecture changes required
  • Keep your API keys on your server
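
"Drop-in" here means only the call site changes. A sketch reusing the `openai` client and `protect` import from the Direct Execution example above:

// The request parameters stay exactly as they were
const params = {
  model: 'gpt-4',
  messages: [{ role: 'user' as const, content: 'Hello' }],
};

// Before: const response = await openai.chat.completions.create(params);

// After: the same call wrapped with protect(); nothing else changes
const response = await protect(
  () => openai.chat.completions.create(params),
  { apiKey: process.env.CONTINUM_API_KEY!, preset: 'customer-support' }
);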

For Compliance Teams

  • 100% coverage of LLM interactions
  • Real-time dashboard monitoring
  • Audit-ready evidence for regulations

Trade-offs

What You Get

✅ Zero added latency for users
✅ 100% compliance coverage
✅ Real-time monitoring
✅ Privacy-first architecture

What You Accept

⚠️ Compliance results appear 2-5s after response (not inline)
⚠️ Can’t block response based on post-LLM monitoring (use Guardian for pre-LLM blocking)
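
When blocking is a hard requirement, it has to happen in the Guardian step before the LLM is called. A sketch, assuming Guardian supports a blocking action alongside the REDACT_AND_CONTINUE shown earlier; the 'BLOCK' value is an assumption, not confirmed API:

import { Continum } from '@continum/sdk';

// Sketch: configuring Guardian to block instead of redact.
// 'BLOCK' is an assumed action value; only 'REDACT_AND_CONTINUE'
// appears in this page's examples.
const strictContinum = new Continum({
  continumKey: process.env.CONTINUM_KEY,
  openaiKey: process.env.OPENAI_API_KEY,
  guardianConfig: {
    enabled: true,
    action: 'BLOCK',  // hypothetical: reject the request before it reaches the LLM
  },
});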

When to Use Guardian vs Async Monitoring

| Use Case | Solution | Latency | When to Use |
| --- | --- | --- | --- |
| Block PII before LLM | Guardian | +100ms | User input might contain PII |
| Monitor for compliance | Async Monitoring | +0ms | Post-hoc monitoring and reporting |
| Both | Guardian + Monitoring | +100ms | Maximum protection + evidence |

Next Steps

  • Presets: Learn about automatic detection configuration
  • Evidence: Transform monitoring into audit-ready evidence
  • Architecture: Explore the full system architecture
  • SDK Configuration: Configure presets and compliance settings