Beta
Stack · Chatbot

Chatbot stack

The complete recipe for a customer-facing chatbot. Quality, latency, and cost at 100K queries per month across three tiers.

Tiers3
TypeStack recipe
Updated2026-04
What this page is
Customer chatbots live or die on latency and cost per query. Users will tolerate a 50 tok/s answer; anything slower feels broken. Frontier quality matters most when the conversation is complex (insurance, legal, healthcare). For general ecommerce or SaaS support, mid-tier is the sweet spot. Our cost estimates assume 100K queries per month at ~1500 input + 500 output tokens each.

Frontier, mainstream, and budget recipes. Pick the row that matches your workload.

Frontier
Enterprise · max quality
Model
Claude Opus 4
in $15/M · out $75/M
Estimate · 100K queries
~$5,000/mo
For regulated industries and high-stakes conversations. Azure routing adds enterprise compliance (SOC2 + HIPAA + EU residency). Latency is the main tradeoff · Opus runs ~55 tok/s.
Mainstream
Mainstream · default production
Model
GPT-5 Chat
in $2.50/M · out $10/M
Estimate · 100K queries
~$875/mo
The default. Strong conversational quality, ~90 tok/s, prompt caching on repeated system prompts drops effective input cost to roughly $0.50/M. Covers 90% of chatbot workloads.
Budget
Free tier · startup MVP
Model
Gemini 2.5 Flash
in $0.30/M · out $2.50/M
Estimate · 100K queries
~$170/mo (or free below quota)
Google AI Studio offers a generous free tier on Flash. For MVPs and side projects, you may not pay at all. Quality is surprisingly high for simple chat. 260 tok/s means fastest perceived UX.

If the defaults do not fit, try these.

Alternative

Fastest Claude. Good for high-volume chat with brand voice needs.

Alternative

Ultra-cheap open-source default. Quality gap vs GPT-5 is small for general support.

Alternative

If speed matters most (voice, real-time agent). Groq hits 1400 tok/s on this model.

Llama 3.3 70B on Cerebras or Groq will stream at 1000+ tok/s. For closed models, Gemini 2.5 Flash is the fastest frontier-class option at ~260 tok/s.