Contextual Ad Engine
Upload broadcast footage, let TwelveLabs’ multimodal models identify ideal ad breaks, rank ads by viewer affinity and scene fit, and preview ad-injected video — all from a single dashboard.
Demo Video
The Contextual Ad Gap
Streaming platforms insert ads based on demographic targeting alone — a beer ad plays during a somber funeral scene; a luxury car ad interrupts a comedic moment. The mismatch erodes viewer trust and brand equity.
TwelveLabs’ multimodal models unlock a new primitive: scene-level understanding. By knowing exactly what is happening in every second of video — the emotion, the environment, the objects, the dialogue — we can match ads to moments that amplify rather than interrupt the viewer experience.
- 40+ ads in inventory
- 3 user profiles
- 512 vector embedding dimensions
- 3 safety modes

Core Features
Everything needed to go from raw footage to precision-targeted ad placement.
Real-time profile ad switching
Switch between the three custom demo personas in the player. Ranked ad recommendations and the injected preview update immediately — no reload — so you can compare how scene fit and user affinity reshape the same break in real time.
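Because ranking is a pure function of the active profile and the break's scene data, persona switching only re-runs a computation on the client. A minimal sketch of that idea (type and function names are hypothetical, not the engine's actual API):

```typescript
// Re-ranking as a pure function: switching personas just re-evaluates this
// (e.g. inside a React useMemo) — no network round-trip or page reload.
type Ad = { id: string; category: string; sceneFit: number };
type Profile = { affinity: Record<string, number> };

function rankAdsForProfile(profile: Profile, ads: Ad[]): Ad[] {
  // Sort a copy by affinity × sceneFit, highest first; the input is untouched.
  return [...ads].sort(
    (a, b) =>
      (profile.affinity[b.category] ?? 0) * b.sceneFit -
      (profile.affinity[a.category] ?? 0) * a.sceneFit
  );
}
```

Keeping the scorer side-effect-free is what makes the "no reload" comparison cheap: the same inputs always produce the same ranking.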
Databricks contextual lift
Export Marengo-backed ad metadata and impressions-ready rows to Delta Lake via the Databricks SQL driver. Join TwelveLabs scene IDs with warehouse analytics to measure contextual lift, benchmark campaigns, and operationalize the pipeline beside your existing BI stack.
Semantic Search
Search video inventory by meaning, not keywords. Describe a scene, emotion, or moment — Marengo embeddings surface the exact timestamp across all indexed content.
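Conceptually, the search reduces to a nearest-neighbor lookup over segment embeddings. A self-contained sketch of that step, assuming the query text has already been embedded into the same vector space (the `Segment` shape here is illustrative):

```typescript
// Given a query embedding and per-segment video embeddings, surface the
// timestamp whose embedding is most similar to the query.
type Segment = { startSec: number; vector: number[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, nA = 0, nB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; nA += a[i] ** 2; nB += b[i] ** 2;
  }
  return dot / (Math.sqrt(nA) * Math.sqrt(nB));
}

function bestMatch(queryVec: number[], segments: Segment[]): Segment {
  // Linear scan; a vector index would replace this at scale.
  return segments.reduce((best, s) =>
    cosine(queryVec, s.vector) > cosine(queryVec, best.vector) ? s : best
  );
}
```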
Multiplicative Ad Scoring
Two-stage formula: scene fit (Marengo vector similarity + environment/tone) multiplied by user affinity. Scene context gates viewer preference — not the reverse.
Cross-Break Diversity
Ensures no single ad or category dominates a broadcast. Selected ads are hard-excluded from future breaks; category caps prevent over-representation.
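One way to sketch that selection pass (a simplified stand-in, not the engine's actual `selectAdsWithDiversity`, and assuming the inventory is large enough that every break has an eligible candidate):

```typescript
// Per break: take the highest-ranked ad that hasn't already won and whose
// category hasn't hit the cap (ceil(breaks / 2), per the scoring section).
type Ranked = { id: string; category: string; score: number };

function selectDiverse(breaks: Ranked[][], totalBreaks: number): Ranked[] {
  const cap = Math.ceil(totalBreaks / 2);
  const usedAds = new Set<string>();
  const catCount = new Map<string, number>();
  return breaks.map((ranked) => {
    const pick = ranked.find(
      (ad) => !usedAds.has(ad.id) && (catCount.get(ad.category) ?? 0) < cap
    )!; // assumes inventory always yields an eligible ad
    usedAds.add(pick.id);
    catCount.set(pick.category, (catCount.get(pick.category) ?? 0) + 1);
    return pick;
  });
}
```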
Ad-Injected Preview
Generate a full broadcast preview with ads injected at computed break points. Ad skip logic, playback controls, and downloadable JSON plan included.
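The preview itself is just the content timeline with ad clips spliced in at the computed break points. A minimal sketch (types and names hypothetical):

```typescript
// Splice ad clips into the content timeline to build the preview playlist.
type Clip = { src: string; kind: "content" | "ad" };
type Break = { afterSegment: number; adSrc: string };

function buildPlaylist(segments: string[], breaks: Break[]): Clip[] {
  const playlist: Clip[] = [];
  segments.forEach((src, i) => {
    playlist.push({ src, kind: "content" });
    // Insert any ad whose break point falls right after this segment.
    for (const b of breaks) {
      if (b.afterSegment === i) playlist.push({ src: b.adSrc, kind: "ad" });
    }
  });
  return playlist;
}
```

The downloadable JSON plan mentioned above would essentially serialize this playlist plus the per-break scoring metadata.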

How It Works
Three core steps from raw footage to ranked, diverse ad placement.
Scene Analysis with Pegasus
Every video is analyzed by TwelveLabs' Pegasus model to produce structured, time-stamped scene metadata — sentiment, tone, environment, suitable ad categories, and GARM brand-safety flags.
const res = await tl_client.generate.text(videoId, {
  prompt: `Analyze each scene segment and return JSON with:
- scene_context, sentiment, tone, environment
- ad_suitability: { suitable_categories, unsuitable_categories }
- brand_safety: { garm_flags, risk_level }
- ad_break_fitness: { interruption_risk, break_quality, score }
Return a complete JSON array over the full timeline.`
});
// Cached to Vercel Blob for instant re-use
await put(`analysis_v3_${videoId}.json`, JSON.stringify(parsed));

Marengo Vector Embeddings
TwelveLabs' Marengo model generates 512-dimensional embeddings for both video segments and ad creatives. Cosine similarity between these vectors provides scene-to-ad semantic matching that keyword matching cannot achieve.
// Cosine similarity between ad creative and scene segment vectors
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, nA = 0, nB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; nA += a[i] ** 2; nB += b[i] ** 2;
  }
  return dot / (Math.sqrt(nA) * Math.sqrt(nB));
}

// Stretch tight cosine range [0.35–0.75] → full [0–1] scale
const rawSim = cosineSimilarity(ad.vector, segment.vector);
const SIM_MIN = 0.35, SIM_MAX = 0.75;
let contextMatch = (rawSim - SIM_MIN) / (SIM_MAX - SIM_MIN);
contextMatch = Math.pow(Math.max(0, Math.min(1, contextMatch)), 1.5);

Two-Stage Multiplicative Scoring
Ads are ranked using a two-stage formula: scene fit multiplied by user affinity. Scene fit acts as a gate — even a viewer's favorite ad category scores near zero in the wrong scene.
// Stage 1 — sceneFit: does this ad BELONG in this scene?
const sceneFit =
  suitableMatch  * 0.15 + // Pegasus suitable_categories hit
  environmentFit * 0.15 + // Environment × category affinity table
  toneCompat     * 0.10 + // Emotional tone compatibility
  contextMatch   * 0.60;  // Marengo vector similarity (dominant)

// Stage 2 — adAffinity: pre-computed user → ad category score
const totalScore = adAffinity * sceneFit;

// Diversity: no ad wins twice; category cap = ceil(breaks / 2)
return selectAdsWithDiversity(ranked, plan, config);

Databricks Delta Lake Export
Ad metadata and clip-averaged Marengo vectors are exported to a Databricks Delta table via a single POST request. The marengo_embedding_json column holds a JSON float array ready for Mosaic AI Vector Search indexing.
-- Generated by buildAdMetadataExportSql()
-- Target: main.default.ad_metadata_premium_spirits
-- marengo_embedding_json: JSON float[] — cast with
-- from_json(marengo_embedding_json, 'array<double>') for Vector Search
CREATE OR REPLACE TABLE `main`.`default`.`ad_metadata_premium_spirits` AS
SELECT * FROM VALUES
('grey_goose_30s.mp4', 'Grey Goose — The Art of Enjoyment', 30,
'["Bar scene","Cocktail","Celebration"]',
'["Adults","HHI $100K+"]', '["Underage"]',
'{"highPriority":["Luxury","Premium Spirits"]}',
'["Children programming","Violence"]', '[]',
'[0.021,-0.034,0.119,...]', 512, 'twelvelabs_marengo',
'embedded_marengo_clip_avg')
AS v(creative_id, campaign_name, duration_sec, extracted_visual_contexts,
target_demographics, negative_demographics, target_audience_affinity,
negative_campaign_contexts, brand_safety_garm, marengo_embedding_json,
embedding_dim, embedding_model, vector_sync_status);
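The "single POST request" can target Databricks' SQL Statement Execution API. A sketch of building that request (the host and warehouse values are placeholders, and splitting request construction from sending keeps it easy to test):

```typescript
// Build the POST for Databricks' SQL Statement Execution API
// (POST /api/2.0/sql/statements). Dispatch with: fetch(req.url, req.init)
function buildStatementRequest(host: string, warehouseId: string, sql: string) {
  return {
    url: `https://${host}/api/2.0/sql/statements`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.DATABRICKS_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ statement: sql, warehouse_id: warehouseId }),
    },
  };
}
```

Here `sql` would be the output of `buildAdMetadataExportSql()` shown above.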
Why TwelveLabs?
TwelveLabs provides the foundational models that power every inference in this engine.
Marengo 3.0 — Multimodal Embeddings
State-of-the-art video representation model that encodes visual, audio, and textual content into a unified 512-dimensional vector space. Powers semantic scene matching in this engine.
Pegasus 1.2 — Generative Video Understanding
Generates structured scene metadata (sentiment, tone, environment, GARM flags, suitable ad categories) with frame-level accuracy — the foundation of every ad break decision.
Enterprise-Grade Infrastructure
SOC 2 compliant, built for scale. Process thousands of hours of video via a simple REST API with consistent, predictable pricing and 99.9% uptime.
Research-Backed Innovation
TwelveLabs' research team continuously improves model accuracy across visual, audio, and textual modalities — improving ad relevance with every model release.

Business Impact
Contextual placement drives measurable outcomes for publishers, brands, and viewers.
- Higher CPMs — Context-matched inventory commands premium rates over run-of-network.
- Brand Safety at Scale — Automated GARM classification prevents costly misplacements.
- Viewer Retention — Relevant ads reduce skip rates and improve completion metrics.
- Scene-Fit Verification — Know exactly which scenes your creative appears in before buying.
- Cross-Break Diversity — Frequency controls prevent over-exposure and ad fatigue.
- Affinity-Matched Audiences — Viewer cohort scoring ensures the right person sees the right ad.

Technology Stack
| Layer | Technology | Purpose |
|---|---|---|
| Frontend | Next.js 15 + React 19 | App Router, SSR, and real-time client computation |
| Video AI | TwelveLabs API | Marengo embeddings + Pegasus scene analysis |
| Ad Engine | TypeScript Pure Functions | Deterministic scoring with useMemo for instant re-ranking |
| Styling | Tailwind CSS v4 + Strand DS | TwelveLabs brand design system |
| Storage | Vercel Blob | Video analysis cache + embedding cache |
| Streaming | HLS.js + CloudFront CDN | Adaptive bitrate playback for content + ad videos |
| Data Platform | Databricks Delta Lake | Ad metadata + Marengo embedding export for Mosaic AI Vector Search |
Ready to see contextual ad placement in action?
Upload a video, explore the ad inventory, and watch the engine rank ads in real time as you switch viewer profiles.
