Architecture & Benchmarks

See how ChatAds works, and how it compares with alternatives

This page explains why ChatAds is faster and more reliable than internal POCs or LLM-based extraction, with benchmarks you can compare against real data.

Build vs Buy

How teams try to build AI chat monetization themselves — and where each stack breaks

The quick POC is spaCy text extraction and basic keyword/BM25 matching. Production builds use LLMs and vector retrieval tools. Then there is ChatAds, which does both extraction and resolution.

Input

AI-generated response

Since you've got AirPods, a better workout pick is the Powerbeats Pro. You can usually find them at Best Buy for around $200.

POC stack: spaCy + keyword/BM25
50ms latency, $0.02 per 1k calls

Cheap and fast enough for a demo. Breaks on products the user already owns, store names, bare brand mentions, accessories, and model drift.

Likely output: links AirPods, Best Buy, or another noisy surface term.
DIY production stack: LLM extractor + vector retrieval
1-2s latency, $0.15-$0.75 per 1k calls

Better semantic coverage, but requires another LLM call. Still needs custom validators for wrong brands, accessories, and bad matches.

Likely chooses 'Powerbeats Pro', but the extra call is costly and delays the AI response to the user.
ChatAds: extracted keyword + resolved offer
~100ms latency, $0.02 per 1k calls

Runs extraction and resolution as one commerce-specific pipeline. Returns a tracked offer, or nothing when the match is bad.

Output: chooses 'Powerbeats Pro' with a matching link, fast enough to insert into the AI response.
Time to market

Build vs buy: how fast can this safely ship?

A prototype is quick. A production-safe commerce layer is not. The gap is validators, resolution quality, refusal behavior, tracking, and ongoing evals.

Path | Time to market | What ships | Main risk
POC build | 1-2 weeks | Prompt, parser, or keyword/vector lookup against one catalog. | Looks convincing on curated demos, but breaks on ownership, stores, accessories, comparisons, and ambiguous product mentions.
Production-ready internal build | 3-6 months | Extraction logic, catalog resolution, validators, revenue ranking, tracking, rate limits, observability, and evals. | The LLM call slows down the inline response, and you spend countless hours tackling linguistic edge cases while users complain about bad offers.
Robust commercial product | 6+ months | Dedicated ML pipeline, large edge-case corpus, catalog quality controls, customer controls, billing, dashboards, docs, SDKs, and ongoing eval ops. | Internal and customized, but at a cost of 6+ months of engineering opportunity.
Or, ChatAds

Time to market: 1-2 days

Integrate the API and get the production commerce layer without building extraction, resolution, validation, and tracking from scratch.

  • Validated product extraction from generated AI text
  • Catalog resolution with rule-based refusal for irrelevant matches
  • Revenue-aware offer selection and tracked URLs
  • No extra LLM call in the response path
  • API keys, usage tracking, rate limits, and billing controls
Architecture

How ChatAds actually works

End-to-end live request path: two binary monetizable classifiers, intent & entity extraction, catalog resolution with quality filters, rule-based validators, and revenue-optimized selection — all under 100ms, no LLM in the hot path.

Your platform (AI application / chatbot): the AI generates a response to the user.

1. Call ChatAds

{
  "response_id": "abc123",
  "conversation_id": "xyz789",
  "response_text": "Here are some great noise-cancelling headphones for travel..."
}

API response (< 100ms): the response with an eCommerce link inserted, or the original text if nothing fits.
"Here are some great noise-cancelling headphones for travel: [Sony WH-1000XM5](eCommerce link) ..."
End-to-end latency: < 100ms p50
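The step 1 call can be wrapped on the client side so that an ad never blocks the user's answer. Below is a minimal sketch. It assumes a hypothetical `call_chatads` function that performs the real HTTP request and returns the monetized text or `None`; the endpoint, authentication, and full response schema are not documented on this page. The request fields mirror the example above.

```python
import json
from typing import Callable, Optional

def build_request(response_id: str, conversation_id: str, response_text: str) -> str:
    """Serialize the request body using the field names from the example above."""
    return json.dumps({
        "response_id": response_id,
        "conversation_id": conversation_id,
        "response_text": response_text,
    })

def render_response(
    response_text: str,
    response_id: str,
    conversation_id: str,
    call_chatads: Callable[[str], Optional[str]],  # hypothetical transport wrapper
) -> str:
    """Return the text with a link inserted if ChatAds supplies one, else the original."""
    payload = build_request(response_id, conversation_id, response_text)
    try:
        monetized = call_chatads(payload)
    except Exception:
        # Timeout or transport error: fall back so the user still gets an answer.
        return response_text
    return monetized if monetized else response_text
```

In practice, `call_chatads` would set a client-side timeout that fits your latency budget, so slow calls degrade to the original text.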
2. Monetizable binary classifiers

Two independent models decide whether to continue, failing fast when the response is not monetizable.
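Here is one way to express the fail-fast gate. This is a sketch with two assumptions: both models must agree before the pipeline continues, and each returns a probability that is compared against a hypothetical 0.5 threshold. The classifiers themselves are stand-in callables, not the ChatAds models.

```python
from typing import Callable

# Each classifier maps response text to P(monetizable); both are hypothetical stand-ins.
Classifier = Callable[[str], float]

def should_continue(text: str, first: Classifier, second: Classifier,
                    threshold: float = 0.5) -> bool:
    """Continue only if both independent classifiers judge the text monetizable."""
    if first(text) < threshold:
        return False  # fail fast: the second model never runs
    return second(text) >= threshold
```

Ordering the cheaper or more selective model first keeps the common "nothing to sell" path short.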

3. Intent & entity extraction

spaCy pipeline with contextual enrichment, intent identification, blocklists, brand matching, and span resolution.
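The snippet below covers the blocklist and brand-matching part of this step. To stay self-contained, it starts from plain candidate strings rather than a spaCy `Doc`. The retailer blocklist and the token-to-brand table are hypothetical and far smaller than a real deployment would need.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lists; a real deployment would maintain much larger ones.
RETAILER_BLOCKLIST = {"best buy", "amazon", "walmart", "target"}
BRAND_BY_TOKEN = {"airpods": "Apple", "powerbeats": "Beats", "sony": "Sony", "anker": "Anker"}

@dataclass
class Span:
    text: str
    brand: Optional[str]

def filter_spans(spans: list[str]) -> list[Span]:
    """Drop blocklisted spans and attach a brand when a known token identifies one."""
    kept = []
    for span in spans:
        norm = span.lower().strip()
        if not norm or norm in RETAILER_BLOCKLIST:
            continue  # retailers and empty spans are never products
        brand = next((BRAND_BY_TOKEN[t] for t in norm.split() if t in BRAND_BY_TOKEN), None)
        kept.append(Span(span, brand))
    return kept
```

On the Build vs Buy input, this drops "Best Buy" and tags "Powerbeats Pro" as Beats. It does not yet handle the fact that the user already owns the AirPods.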

4. Catalog resolution & quality filters

Local CPU database search, LRU cache, and semantic similarity matching, followed by filters for star rating, review count, stock status, and price.
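A minimal sketch of a cached lookup followed by quality filters. Several pieces are assumptions: a tiny in-memory `CATALOG` stands in for the local database, token overlap stands in for semantic similarity, and the thresholds (4.0 stars, 50 reviews) are illustrative. Python's `functools.lru_cache` supplies the LRU cache.

```python
from dataclasses import dataclass
from functools import lru_cache
from typing import Optional

@dataclass(frozen=True)
class Product:
    title: str
    stars: float
    reviews: int
    in_stock: bool
    price: float

# Hypothetical in-memory catalog standing in for the local database.
CATALOG = (
    Product("Nonstick Skillet 10-inch", 4.6, 1200, True, 29.99),
    Product("Nonstick Skillet Budget", 3.5, 40, True, 9.99),
    Product("Nonstick Skillet Out of Stock", 4.8, 900, False, 39.99),
    Product("Cast Iron Skillet 12-inch", 4.7, 5000, True, 34.99),
)

@lru_cache(maxsize=10_000)
def search(term: str) -> tuple[Product, ...]:
    """Rank catalog items by token overlap (a stand-in for semantic similarity)."""
    words = set(term.split())
    scored = [(len(words & set(p.title.lower().split())), p) for p in CATALOG]
    hits = sorted((sp for sp in scored if sp[0] > 0), key=lambda sp: sp[0], reverse=True)
    return tuple(p for _, p in hits)

def passes_quality(p: Product, min_stars: float = 4.0, min_reviews: int = 50,
                   max_price: Optional[float] = None) -> bool:
    return (p.stars >= min_stars and p.reviews >= min_reviews and p.in_stock
            and (max_price is None or p.price <= max_price))

def resolve(term: str) -> list[Product]:
    # Normalizing before the cached call lets "Nonstick Skillet" and "nonstick skillet" share an entry.
    return [p for p in search(term.lower().strip()) if passes_quality(p)]
```

Caching the raw search and applying filters afterwards lets repeated terms skip retrieval while thresholds stay adjustable per request.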

5. Rule-based product result validators

Title similarity, accessory catches, vertical mismatch, brand mismatch, demographic mismatch, and brand-vs-generic comparison.
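The validators can be chained so that the first rejection wins. The sketch below shows that structure with one validator. Query-token coverage is a simple stand-in for title similarity, and the 0.8 threshold is an assumption. Further checks, such as accessory or brand rules, would plug in as extra functions with the same signature.

```python
from typing import Callable, Optional, Sequence

# A validator returns a rejection reason, or None if the candidate passes.
Validator = Callable[[str, str], Optional[str]]

def title_coverage(query: str, title: str, min_coverage: float = 0.8) -> Optional[str]:
    """Reject titles that contain too few of the query's tokens."""
    q = set(query.lower().split())
    t = set(title.lower().split())
    coverage = len(q & t) / len(q) if q else 0.0
    if coverage < min_coverage:
        return f"title coverage {coverage:.2f} < {min_coverage}"
    return None

def run_validators(query: str, title: str, validators: Sequence[Validator]) -> Optional[str]:
    """Apply validators in order; the first rejection wins."""
    for validate in validators:
        reason = validate(query, title)
        if reason is not None:
            return reason
    return None
```

Returning a reason string instead of a bare boolean keeps each refusal debuggable.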

6. Revenue optimization

Expected value per click using commission rate, conversion rate, price, brand strength, CTR, stock, ratings, and review volume.
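At its core, expected value per click is price × commission rate × conversion rate. For example, a $400 item at a 4% commission with a 5% conversion rate earns $400 × 0.04 × 0.05 = $0.80 per click. The sketch folds the remaining signals into a ranking score. The specific weighting (CTR, rating, review saturation at 1,000, brand strength) is a hypothetical choice, not the ChatAds formula.

```python
from dataclasses import dataclass

@dataclass
class Offer:
    price: float            # USD
    commission_rate: float  # 0.04 means 4%
    conversion_rate: float  # P(purchase | click)
    ctr: float              # P(click | impression)
    rating: float           # 0 to 5 stars
    reviews: int
    brand_strength: float   # 0 to 1, hypothetical score
    in_stock: bool

def ev_per_click(o: Offer) -> float:
    """Expected commission earned per click."""
    return o.price * o.commission_rate * o.conversion_rate

def rank_score(o: Offer) -> float:
    """Hypothetical ranking score: EV per impression, discounted by trust signals."""
    if not o.in_stock:
        return 0.0  # an out-of-stock offer cannot convert
    review_confidence = min(1.0, o.reviews / 1000)  # saturates at 1,000 reviews
    trust = (o.rating / 5.0) * (0.5 + 0.5 * review_confidence)
    brand = 0.5 + 0.5 * o.brand_strength
    return o.ctr * ev_per_click(o) * trust * brand
```

Multiplying by CTR converts value per click into value per impression, which is the fairer basis for comparing offers that will be shown the same number of times.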

7. Select best keyword & resolve URL

Return the highest expected-value result with the best anchor text and resolved eCommerce URL, or correctly refuse.
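Selection with refusal fits in a few lines. In this sketch, each candidate carries anchor text, a URL, and a score such as the expected-value score from step 6. A hypothetical `min_score` floor makes "no offer" the default outcome when nothing is good enough.

```python
from typing import NamedTuple, Optional, Sequence

class ScoredOffer(NamedTuple):
    anchor_text: str
    url: str
    score: float  # e.g. an expected-value score

def select_offer(candidates: Sequence[ScoredOffer],
                 min_score: float = 0.01) -> Optional[ScoredOffer]:
    """Return the highest-scoring offer, or None to refuse when nothing clears the bar."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.score)
    return best if best.score >= min_score else None
```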

Our approach

Why an LLM is the wrong tool for monetizing AI conversations

Calling another LLM to extract products from AI text is the obvious first instinct — and the wrong one. Here's how a deterministic ML pipeline compares to an LLM extraction call across the dimensions that matter for production commerce.

Dimension | ChatAds (ML pipeline) | LLM extraction
Latency | <100ms total. Stable p99. | 800ms-2s typical. p99 spikes to 5s+ during peak load on shared APIs. Variance kills inline use.
Cost* | Fractions of a cent per call. Predictable. | The best models are expensive, older ones hallucinate, and prices are rising.
Accuracy | Pulls directly from the text. Catalog-grounded. Extensive linguistic validation. | LLMs hallucinate, and semantic search struggles with intent.
Determinism | Same input → same output. Testable, A/B-testable, debuggable. | Outputs drift from run to run, and LLM updates can break workflows.
Uptime* | Your infrastructure, with self-hosted ChatAds. | OpenAI and Anthropic can have outages and latency issues.
Data privacy* | No data sharing with an LLM vendor. AI conversations don't leave your stack. | Every call ships your users' AI conversations to a third-party model vendor.

* The uptime, cost, and data-privacy advantages assume a self-hosted or VPC deployment of ChatAds. With the hosted ChatAds API, the same vendor-boundary concerns still apply, with ChatAds as the vendor. Self-hosting removes that boundary entirely.

Extraction benchmarks (9 cases): who extracts well and fast enough to run inline?

Modern LLMs extract well; that's no longer the question. The question is whether you can get that quality without a second model call in your response path. spaCy is fast (~13ms) but returns junk chunks. A current LLM (gpt-5.4-nano) usually picks the right product, but it takes ~0.6–1.3s and a separate API call to do it. ChatAds matches the LLM's pick in ~20ms, inline, with no extra call. Each case below shows all three methods side by side.

Messages without products

Pure advice with nothing to sell — and the LLM still takes ~0.8s to say so

AI reply

Strength training comes down to consistency more than equipment. Three sessions a week with progressive overload will outperform an expensive home gym used twice a month.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | Strength training, consistency, equipment, Three sessions, a week, progressive overload, an expensive home gym, twice a month | Just extracts phrases; doesn't pick a winner | 11.8ms
gpt-5.4-nano | none | none (correct) | 837.2ms
ChatAds | none | none (correct) | 18.4ms
Takeaway: A modern LLM correctly returns nothing here, but spends ~0.8s and a full model call to do it. ChatAds reaches the same "no offer" in ~18ms with no extra call. — Correct, but ~45× slower to say no.
Hallucinated products

Top models stop hallucinating here — but cheaper tiers don't, and it still costs ~1.3s

AI reply

For someone just getting into espresso without spending too much, the standard recommendation has held up for years — small footprint, easy to use, surprisingly capable for the price.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | someone, espresso, the standard recommendation, years, small footprint, the price | Just extracts phrases; doesn't pick a winner | 13.2ms
gpt-5.4-nano | none | none (correct) | 1300.2ms
ChatAds | none | none (correct) | 11.0ms
Takeaway: Today's top models decline correctly, but take ~1.3s to get there — and cheaper or older LLM tiers (4.1-nano, mini) still invent a specific espresso machine the reply never named. — Slow, and fragile on budget models.
Multiple products → one pick

Three options, one highlighted — the LLM gets the pick, ~1.1s later

AI reply

You've got three solid blender options at this price: the Ninja Foodi is durable, the NutriBullet Pro is compact, and the Vitamix E310 is the long-haul investment — that's the one I'd actually pick if you can stretch the budget.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | three solid blender options, this price, the Ninja Foodi, the NutriBullet Pro, the Vitamix E310, the long-haul investment, the one, the budget | Just extracts phrases; doesn't pick a winner | 18.4ms
gpt-5.4-nano | Ninja Foodi, NutriBullet Pro, Vitamix E310 | Vitamix E310 | 1117.6ms
ChatAds | Vitamix E310, Ninja Foodi, NutriBullet Pro | Vitamix E310 | 21.7ms
Takeaway: A modern LLM ranks the intent and picks the Vitamix correctly — but at ~1.1s and a second model call in your response path. ChatAds returns the same pick in ~22ms. — Right pick, wrong latency budget.
Owned / in-use suppression

The LLM skips the owned charger and picks the right one — just not inline-fast

AI reply

Since you're already running an Anker MagSafe charger, the Apple 70W USB-C Power Adapter is the wall charger I'd pair with it — fast enough for your phone and a MacBook without buying anything else.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | an Anker MagSafe charger, the Apple 70W USB-C Power Adapter, your phone, a MacBook, anything | Just extracts phrases; doesn't pick a winner | 9.7ms
gpt-5.4-nano | Anker MagSafe charger, Apple 70W USB-C Power Adapter | Apple 70W USB-C Power Adapter | 899.5ms
ChatAds | Apple 70W USB-C Power Adapter | Apple 70W USB-C Power Adapter | 18.9ms
Takeaway: A modern LLM suppresses the owned Anker charger and picks the Apple adapter correctly, but it needs ~0.9s and a second API call to do what ChatAds does inline in ~19ms. Verdict: Correct, but not inline-fast.
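Owned-product suppression can be approximated by a clause-level rule. In the sketch below, any candidate that shares a clause with an ownership cue is dropped. The cue list and the comma/semicolon/period clause split are hypothetical simplifications, not the ChatAds rules.

```python
import re

# Hypothetical cue phrases signalling that the user already owns something.
OWNERSHIP_CUES = ("you've got", "you already", "you're already", "since you have", "you own")

def suppress_owned(text: str, candidates: list[str]) -> list[str]:
    """Drop candidates that appear in the same clause as an ownership cue."""
    owned = set()
    for clause in re.split(r"[,;.]", text):
        lowered = clause.lower()
        if any(cue in lowered for cue in OWNERSHIP_CUES):
            owned.update(c for c in candidates if c.lower() in lowered)
    return [c for c in candidates if c not in owned]
```

The same rule handles the Build vs Buy input: "Since you've got AirPods" suppresses AirPods and leaves Powerbeats Pro.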
Bare brand mentions

Brands appear in non-shopping contexts — ecosystem comparisons, news, opinion. Naive extractors monetize the brand name with no actual product attached.

AI reply

Apple's tight ecosystem is great if you're already on Mac and iPhone, but it locks you in. Sony and Bose offer better cross-platform pairing.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | Apple's tight ecosystem, Mac, iPhone, Sony, Bose, better cross-platform pairing | Just extracts phrases; doesn't pick a winner | 12.4ms
gpt-5.4-nano | Apple, Sony, Bose | Sony (bare brand monetized) | 641.4ms
ChatAds | none | none (correct) | 17.9ms
Takeaway: gpt-5.4-nano returns Apple, Sony, and Bose as products and monetizes Sony, yet there's no actual recommendation here, only a comparison of ecosystems. ChatAds correctly returns no offer. Verdict: Brand-as-topic monetized.
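A bare-brand check can start from a simple rule: reject a candidate when every token is a brand name, meaning no product or model is named. The brand set below is hypothetical; in practice it would come from the catalog.

```python
# Hypothetical brand list; in practice this would come from the catalog.
BRANDS = {"apple", "sony", "bose", "anker", "dyson"}

def is_bare_brand(candidate: str) -> bool:
    """True when every token (possessives stripped) is a brand, i.e. no product is named."""
    tokens = [t.removesuffix("'s") for t in candidate.lower().split()]
    return bool(tokens) and all(t in BRANDS for t in tokens)
```

Filtering `["Apple", "Sony", "Bose"]` through `not is_bare_brand(c)` leaves nothing to monetize, while "Sony WH-1000XM5" passes.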
Brand & generic in same span

Branded product described generically — the LLM returns it cleanly, ~0.8s later

AI reply

The Anker PowerCore 10000 is the standard answer here — a compact 10,000mAh power bank that fits in a pocket and charges most phones twice over.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | The Anker PowerCore, the standard answer, a compact 10,000mAh power bank, a pocket, most phones | Just extracts phrases; doesn't pick a winner | 14.1ms
gpt-5.4-nano | Anker PowerCore 10000 | Anker PowerCore 10000 | 753.6ms
ChatAds | Anker PowerCore 10000 | Anker PowerCore 10000 | 20.3ms
Takeaway: A modern LLM collapses the variants and returns the single branded product correctly — at ~0.8s and a second model call, versus ChatAds inline at ~20ms. — Correct, but slow.
Comparison direction

"Upgrading from X to Y" — the LLM links Y correctly, ~0.7s later

AI reply

If you're upgrading from your old MacBook Air to a more powerful machine for video editing, the Lenovo ThinkPad P14s with the Ryzen 7 chip is a strong pick.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | your old MacBook Air, a more powerful machine, video editing, the Lenovo ThinkPad P14s, the Ryzen 7 chip, a strong pick | Just extracts phrases; doesn't pick a winner | 10.4ms
gpt-5.4-nano | Lenovo ThinkPad P14s with the Ryzen 7 chip | Lenovo ThinkPad P14s with the Ryzen 7 chip | 714.0ms
ChatAds | Lenovo ThinkPad P14s | Lenovo ThinkPad P14s | 22.1ms
Takeaway: A modern LLM follows the upgrade direction and links the Lenovo, not the MacBook Air being replaced — correct, but ~0.7s and a second model call. ChatAds does it inline in ~22ms. — Correct, but not inline-fast.
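Comparison direction can be handled by dropping products on the "from" side of a transition phrase. This is a regex sketch that assumes a hypothetical "<verb> from <source> to <target>" pattern. Real phrasing varies far more than this pattern covers.

```python
import re

# Hypothetical pattern: "<upgrading|switching|moving> from <source> to <target>".
TRANSITION = re.compile(
    r"\b(?:upgrading|switching|moving)\s+from\s+(?P<source>.+?)\s+to\s+",
    re.IGNORECASE,
)

def drop_replaced_products(text: str, candidates: list[str]) -> list[str]:
    """Remove candidates that appear on the 'from' side of an upgrade phrase."""
    sources = [m.group("source").lower() for m in TRANSITION.finditer(text)]
    return [c for c in candidates if not any(c.lower() in s for s in sources)]
```

The non-greedy `source` group stops at the first " to ", which isolates "your old MacBook Air" in the reply above.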
Not in catalog

AI replies often name real products that aren't in your affiliate catalog. Naive extractors return the name and dump the resolution failure on the caller — a downstream search returns no result, or worse, drifts to a no-name fallback. ChatAds checks the catalog inline and returns no offer when no high-confidence match exists.

AI reply

If you're getting into mechanical keyboards, the Topre Realforce R3 is the gold standard — heavy electrostatic-capacitive switches and a tactile feel you can't get from MX-style boards.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | mechanical keyboards, the Topre Realforce R3, the gold standard, heavy electrostatic-capacitive switches, a tactile feel, MX-style boards | Just extracts phrases; doesn't pick a winner | 12.3ms
gpt-5.4-nano | Topre Realforce R3 | Topre Realforce R3 (no catalog check: the caller gets a name, not a SKU) | 618.6ms
ChatAds | Topre Realforce R3 | none (correct) | 19.8ms
Takeaway: gpt-5.4-nano extracts the brand and model correctly but leaves the caller to discover that the SKU isn't in the catalog. The downstream search then returns nothing, or drifts to a no-name keyboard. Verdict: Resolution problem dumped on the caller.
Generic-adjective bloat

Marketing adjectives ("high-quality", "premium", "professional-grade") aren't part of a product's identity; they pad the phrase but match nothing in a real catalog. Naive extractors keep them; ChatAds strips them.

AI reply

For everyday cooking, a high-quality nonstick skillet handles most stovetop tasks — eggs, pancakes, sautéed veggies, and quick pan sauces.

Method | Extracted products | Pick / offer | Latency
spaCy noun-chunks | everyday cooking, a high-quality nonstick skillet, most stovetop tasks, eggs, pancakes, sautéed veggies, quick pan sauces | Just extracts phrases; doesn't pick a winner | 12.0ms
gpt-5.4-nano | high-quality nonstick skillet | high-quality nonstick skillet (marketing adjective retained) | 729.0ms
ChatAds | nonstick skillet | nonstick skillet | 18.7ms
Takeaway: gpt-5.4-nano returns "high-quality nonstick skillet"; the marketing adjective inflates the phrase but means nothing to a real catalog. ChatAds returns "nonstick skillet". Verdict: Adjective bloat retained.
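Adjective stripping can begin as a leading-token filter. The sketch removes determiners and marketing adjectives from the front of a phrase and always keeps at least one token. Both word lists are hypothetical and much shorter than a curated production list.

```python
# Hypothetical stop lists; real lists would be larger and curated.
MARKETING_ADJECTIVES = {"high-quality", "premium", "professional-grade", "top-rated", "quality", "good", "great"}
DETERMINERS = {"a", "an", "the"}
STRIPPABLE = MARKETING_ADJECTIVES | DETERMINERS

def strip_marketing(phrase: str) -> str:
    """Remove leading determiners and marketing adjectives, keeping at least one token."""
    tokens = phrase.split()
    i = 0
    while i < len(tokens) - 1 and tokens[i].lower() in STRIPPABLE:
        i += 1
    return " ".join(tokens[i:])
```

Only leading tokens are stripped, so words such as "nonstick" or "cast iron" that describe the product itself survive.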
Resolution benchmarks (7 cases): who resolves the best offer?

Each failure mode below compares all three methods. Even when extraction is correct, the wrong resolver produces unsafe links. ChatAds rows are real API output; keyword/BM25 and plain-vector rows illustrate the dominant failure mode of each approach.

Demographic drift

Extracted phrase: digital watch

Source AI reply

If you just want something reliable for everyday wear, go with a basic digital watch — they're affordable, have great battery life, and the backlight makes them easy to read at night.

Method | Returned product | Verdict
Keyword / BM25 | Kids Cartoon Digital Watch with Light-Up Face | Wrong demographic
  BM25 ranks by token overlap × review count. Kids' watches dominate review counts in this category.
Plain vector top-1 | Kids Cartoon Digital Watch with Light-Up Face | Wrong demographic
  The same review-count bias surfaces in the embedding manifold: high-review SKUs cluster nearby and outrank adult alternatives.
ChatAds | digital watch | Adult digital watch (kids' SKU rejected)
Why this matters: Generic adult-watch queries land on kids' watches in most consumer catalogs because kids' SKUs accumulate higher review counts. ChatAds runs a demographic-mismatch validator that rejects kids/men's/women's matches when no demographic was specified.
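A demographic-mismatch check can be written as a set difference. The title is rejected if it targets any demographic the query did not name. The vocabularies below are hypothetical.

```python
import re

# Hypothetical demographic vocabularies.
DEMOGRAPHIC_TERMS = {
    "kids": {"kids", "kid's", "children", "children's", "boys", "girls", "toddler"},
    "men": {"men's", "mens"},
    "women": {"women's", "womens", "ladies"},
}

def demographics(text: str) -> set[str]:
    words = set(re.findall(r"[a-z']+", text.lower()))
    return {group for group, terms in DEMOGRAPHIC_TERMS.items() if words & terms}

def demographic_mismatch(query: str, title: str) -> bool:
    """Reject titles aimed at a demographic the query never asked for."""
    return bool(demographics(title) - demographics(query))
```

An explicit "kids digital watch" query still matches kids' SKUs, because the demographic then appears on both sides.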
Accessory not the device

Extracted phrase: Lenovo Yoga Slim 7

Source AI reply

If you're shopping for a new ultrabook for college, the Lenovo Yoga Slim 7 is hard to beat for the price — long battery life and a solid screen.

Method | Returned product | Verdict
Keyword / BM25 | Yoga Slim 7 Sleeve Protective Case | Wrong product type
  The query tokens "Yoga", "Slim", and "7" all appear in the title, and review count breaks the tie toward the case.
Plain vector top-1 | Yoga Slim 7 Sleeve Protective Case | Wrong product type
  Sleeve and laptop sit close in the embedding manifold; review-count bias pushes the sleeve to top-1.
ChatAds | no offer | No offer
  The accessory validator rejects the sleeve. No device SKU is available, so ChatAds returns no offer rather than a wrong link.
Why this matters: Cases, sleeves, replacement keyboards, and chargers outnumber the actual device SKU in most catalogs. Both lexical and semantic retrieval drift to whichever accessory has the most reviews. ChatAds validates that the resolved product is the device itself, not an accessory.
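Here is a minimal accessory validator. It flags a title that names an accessory when the query asked for the device itself. If the query explicitly names an accessory, the check is skipped. The vocabulary is hypothetical; word-boundary matching keeps "case" from firing inside longer words.

```python
import re

# Hypothetical accessory vocabulary.
ACCESSORY_TERMS = ("case", "sleeve", "cover", "skin", "screen protector",
                   "charger", "replacement keyboard", "stand")

def _mentions(term: str, text: str) -> bool:
    return re.search(rf"\b{re.escape(term)}\b", text) is not None

def accessory_mismatch(query: str, title: str) -> bool:
    """True when the title is an accessory but the query asked for the device itself."""
    q, t = query.lower(), title.lower()
    if any(_mentions(term, q) for term in ACCESSORY_TERMS):
        return False  # the user explicitly asked for an accessory
    return any(_mentions(term, t) for term in ACCESSORY_TERMS)
```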
Brand drift

Extracted phrase: Dyson V8

Source AI reply

For a reliable cordless vacuum on a tight budget, the Dyson V8 holds up well even years in and the battery is plenty for most apartments.

Method | Returned product | Verdict
Keyword / BM25 | INSE Cordless Stick Vacuum 6-in-1 | Wrong brand
  Token "vacuum" matches; "Dyson" is outranked by review count. BM25 has no concept of brand identity.
Plain vector top-1 | INSE Cordless Stick Vacuum 6-in-1 | Wrong brand
  Embedding similarity collapses the brand signal. A high-review no-name vacuum outranks the Dyson SKU.
ChatAds | Dyson V8 Animal Cordless Vacuum | Brand held
Why this matters: Plain retrieval ignores brand identity. BM25 returns whatever matches "Dyson" or "vacuum" by review count — often a different generation. Vector drifts further, surfacing high-review no-name vacuums that cluster near the Dyson SKU. ChatAds enforces brand fidelity: if the search term carries a brand, the resolved product must too — or it falls back to a sibling within the brand line.
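Brand fidelity can be enforced as a filter over ranked retrieval results. When the query carries a brand, the best-ranked candidate from that brand wins, whether it is the exact model or a sibling. If no candidate matches the brand, the result is `None` instead of another brand. The `Candidate` shape is hypothetical.

```python
from typing import NamedTuple, Optional, Sequence

class Candidate(NamedTuple):
    title: str
    brand: str

def enforce_brand(query_brand: Optional[str], ranked: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the best-ranked candidate of the query's brand, or None.

    Unbranded queries accept the top-ranked candidate as-is."""
    if query_brand is None:
        return ranked[0] if ranked else None
    same_brand = [c for c in ranked if c.brand.lower() == query_brand.lower()]
    return same_brand[0] if same_brand else None
```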
Generic category collapse

Extracted phrase: cast iron skillet

Source AI reply

For most home cooks, a good cast iron skillet is the single most versatile pan you can own — it goes from stovetop to oven without missing a beat.

Method | Returned product | Verdict
Keyword / BM25 | 12-Piece Nonstick Cookware Pots and Pans Set | Bundle, not a single skillet
  Token "skillet" appears in the bundle title. Review count promotes the multi-piece set over single SKUs.
Plain vector top-1 | Carbon Steel Wok with Flat Bottom | Wrong pan type
  Embedding clusters all "pan" SKUs together. High-review woks and frying-pan sets often outrank a single cast iron skillet.
ChatAds | cast iron skillet | Single quality default
Why this matters: Unbranded category extractions are common ("a good cast iron skillet", "a basic tripod"). Naive retrieval picks the highest-ranked listing — often a multi-piece cookware set or a different pan type, both of which match "skillet" by token. ChatAds runs a generic-prefix-mismatch validator that rejects titles where the query is a prefix of a longer phrase that names a different product.
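The generic-prefix-mismatch rule can be read in one plausible way, sketched below. Find the query phrase inside the title, then reject when the next token is a head noun that turns it into a different product, as in "cast iron skillet lid". The head-noun list is hypothetical. Titles that don't contain the phrase at all are left to other validators.

```python
import re

# Hypothetical head nouns that turn "<query> <noun>" into a different product.
PRODUCT_CHANGING_NOUNS = {"set", "lid", "bundle", "kit", "cover", "handle", "rack"}

def _tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())

def generic_prefix_mismatch(query: str, title: str) -> bool:
    """True when the query phrase is immediately extended into a different product."""
    q, t = _tokens(query), _tokens(title)
    for i in range(len(t) - len(q) + 1):
        if q and t[i:i + len(q)] == q:
            following = t[i + len(q)] if i + len(q) < len(t) else None
            return following in PRODUCT_CHANGING_NOUNS
    return False
```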
Model number identity

Extracted phrase: Sony A7 IV

Source AI reply

For wildlife photography I'd recommend the Sony A7 IV paired with a 200-600mm telephoto — the autofocus tracking is exceptional and the burst rate handles fast-moving subjects.

Method | Returned product | Verdict
Keyword / BM25 | Sony Alpha a6400 Mirrorless Camera | Wrong model
  Tokens "Sony" + "IV" (Roman numeral) are weak; review count surfaces the more popular a6400.
Plain vector top-1 | Sony Alpha a7C Full-Frame Camera | Wrong generation
  Embedding collapses A7 variants. The closest cluster member by similarity isn't the IV.
ChatAds | Sony a7 IV Mirrorless Camera | Exact model
Why this matters: Model numbers (A7 IV, RT-AX86U, S24 Ultra) carry product identity. Lexical search tokenizes them as noise ("A7", "IV") and ranks by review count, often surfacing a different generation. Vector search treats alphanumeric tokens as low-signal and collapses across model variants. ChatAds preserves model-number tokens through embedding and matches them to the exact catalog SKU.
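Model-number identity can be checked by treating alphanumeric tokens and Roman numerals as must-match tokens. In this sketch, every model token in the query has to appear verbatim in the title. The Roman-numeral subset is hypothetical and deliberately omits "v" and "x", which collide with ordinary words.

```python
import re

# Hypothetical subset; "v" and "x" are omitted because they collide with ordinary words.
ROMAN_NUMERALS = {"ii", "iii", "iv", "vi", "vii", "viii", "ix", "xi", "xii"}

def model_tokens(text: str) -> set[str]:
    """Tokens that carry model identity: alphanumerics containing a digit, or Roman numerals."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return {t for t in tokens if any(ch.isdigit() for ch in t) or t in ROMAN_NUMERALS}

def model_matches(query: str, title: str) -> bool:
    """Every model token in the query must appear verbatim in the title."""
    return model_tokens(query) <= model_tokens(title)
```

Because "a7c" is a different token from "a7", the a7C is rejected even though it sits next to the A7 IV in embedding space.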
Context vertical mismatch

Extracted phrase: nursery night light

Source AI reply

For a newborn's nursery, a nursery night light with a warm amber glow is gentle enough not to disrupt sleep.

Method | Returned product | Verdict
Keyword / BM25 | VEKKIA Industrial LED Shop Light with Amber Mode | Wrong vertical
  Tokens "night" + "light" match. Review count promotes the industrial fixture far above niche nursery lights.
Plain vector top-1 | BLACK+DECKER Workshop LED Floodlight | Wrong vertical
  Embedding clusters all light SKUs together. Higher-reviewed industrial fixtures outrank baby-vertical alternatives.
ChatAds | nursery night light | Baby-context night light
Why this matters: ChatAds emits per-keyword vertical tags from the surrounding context (baby, pet, automotive, gardening, professional) using a ±15-token window around the extracted phrase. When a candidate carries a conflicting vertical tag, the resolution gate hard-rejects it. BM25 and plain vector retrieval have no concept of context vertical — they pick whatever matches the tokens or the embedding.
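Context vertical tagging can be illustrated with a ±15-token window. The sketch tags the words around the extracted phrase and tags the candidate title with the same vocabularies. It hard-rejects when both sides are tagged and share no vertical. The cue vocabularies and the "both tagged, disjoint" conflict rule are assumptions.

```python
import re

# Hypothetical cue vocabularies per vertical.
VERTICAL_CUES = {
    "baby": {"newborn", "nursery", "infant", "baby", "toddler"},
    "pet": {"dog", "cat", "puppy", "kitten", "pet"},
    "automotive": {"car", "truck", "vehicle", "garage"},
    "gardening": {"garden", "lawn", "yard", "plants"},
    "professional": {"industrial", "workshop", "jobsite", "contractor"},
}

def _words(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())

def _tags(words: list[str]) -> set[str]:
    present = set(words)
    return {v for v, cues in VERTICAL_CUES.items() if present & cues}

def context_verticals(text: str, phrase: str, window: int = 15) -> set[str]:
    """Vertical tags from a ±window-token span around the first occurrence of phrase."""
    words, target = _words(text), _words(phrase)
    for i in range(len(words) - len(target) + 1):
        if words[i:i + len(target)] == target:
            return _tags(words[max(0, i - window): i + len(target) + window])
    return set()

def vertical_conflict(context: set[str], title: str) -> bool:
    """Hard-reject when both sides carry vertical tags and share none."""
    candidate = _tags(_words(title))
    return bool(context) and bool(candidate) and not (context & candidate)
```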
Line fidelity within a brand

Extracted phrase: MacBook Air

Source AI reply

For college, the MacBook Air is plenty — battery life is great and it handles writing, browsing, and Zoom without a fan kicking on.

Method | Returned product | Verdict
Keyword / BM25 | MacBook Pro 14-inch with M3 Chip | Wrong line
  Token "MacBook" matches both Air and Pro. Review count promotes Pro variants over Air.
Plain vector top-1 | MacBook Pro 14-inch with M3 Chip | Wrong line
  Embedding similarity treats Air and Pro as the same MacBook cluster. The higher-reviewed Pro outranks the Air.
ChatAds | MacBook Air M4 | Air line preserved
Why this matters: Within a brand line, the differentiating token (Air vs Pro, Mini vs Max, SE vs Ultra) carries product identity. Plain retrieval ignores it: vector clustering treats Air and Pro as semantic neighbors, and BM25 with review-count secondary ranking surfaces the more popular Pro variant. ChatAds runs a line-fidelity gate (CHA-5486) that blocks candidates lacking the differentiating token.
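A line-fidelity gate reduces to comparing line tokens. In this sketch, the title must carry exactly the query's differentiating tokens. That both blocks titles lacking the token, as described above, and, as an extra assumption, blocks titles that add a competing one. The token set is hypothetical.

```python
import re

# Hypothetical line-differentiating tokens.
LINE_TOKENS = {"air", "pro", "mini", "max", "se", "ultra", "plus"}

def line_tokens(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w in LINE_TOKENS}

def passes_line_gate(query: str, title: str) -> bool:
    """Require the title to carry exactly the query's line tokens."""
    q = line_tokens(query)
    if not q:
        return True  # the query names no line, so there is nothing to enforce
    return line_tokens(title) == q
```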
Live demo

Test ChatAds using a demo fitness assistant.

Our AI assistant is fine-tuned on fitness responses and uses the Amazon catalog for product resolution.

Bring commerce to AI-generated text

Use ChatAds to detect product recommendations, resolve safe offers, and return tracked links before the response renders.