Chatbot stack
The complete recipe for a customer-facing chatbot. Quality, latency, and cost at 100K queries per month across three tiers.
Tiers: 3
Type: Stack recipe
Updated: 2026-04
What this page is
Customer chatbots live or die on latency and cost per query. Users will tolerate a 50 tok/s answer; anything slower feels broken. Frontier quality matters most when the conversation is complex (insurance, legal, healthcare). For general ecommerce or SaaS support, mid-tier is the sweet spot. Our cost estimates assume 100K queries per month at ~1500 input + 500 output tokens each.
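The cost assumption above works out to 150M input and 50M output tokens per month. A minimal sketch of that arithmetic, using placeholder per-million-token prices (substitute your provider's actual rates):

```python
# Monthly cost model for the workload above: 100K queries/month at
# ~1500 input + 500 output tokens each. Prices are illustrative
# placeholders, not any provider's real rates.

QUERIES_PER_MONTH = 100_000
INPUT_TOKENS = 1_500
OUTPUT_TOKENS = 500

def monthly_cost(input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated monthly spend in dollars, given $/M token prices."""
    input_m = QUERIES_PER_MONTH * INPUT_TOKENS / 1_000_000    # 150M input tokens
    output_m = QUERIES_PER_MONTH * OUTPUT_TOKENS / 1_000_000  # 50M output tokens
    return input_m * input_price_per_m + output_m * output_price_per_m

# Example with hypothetical prices of $1.00/M input, $2.00/M output:
print(monthly_cost(1.00, 2.00))  # 150 + 100 = $250/mo
```

Run it against each tier's real pricing to reproduce the per-tier estimates on this page.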
Tier-by-tier breakdown
Frontier, mainstream, and budget recipes. Pick the row that matches your workload.
Frontier
Enterprise · max quality
For regulated industries and high-stakes conversations. Azure routing adds enterprise compliance (SOC2 + HIPAA + EU residency). Latency is the main tradeoff · Opus runs ~55 tok/s.
Mainstream
Mainstream · default production
The default. Strong conversational quality, ~90 tok/s, prompt caching on repeated system prompts drops effective input cost to roughly $0.50/M. Covers 90% of chatbot workloads.
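The caching math is a blend of cached and uncached input prices. A sketch, assuming (as is common but not universal) that cached reads cost ~10% of the base input rate and that a long shared system prompt gives a high cache hit rate; the specific numbers below are illustrative:

```python
# Effective input price with prompt caching: a weighted average of the
# cached-read rate and the base rate. All numbers here are assumptions
# for illustration -- check your provider's cache pricing.

def effective_input_price(base_per_m: float,
                          cached_per_m: float,
                          cache_hit_rate: float) -> float:
    """Blended $/M input price given the fraction of tokens served from cache."""
    return cache_hit_rate * cached_per_m + (1 - cache_hit_rate) * base_per_m

# e.g. $1.25/M base, $0.125/M cached reads, 85% of input tokens cached:
print(effective_input_price(1.25, 0.125, 0.85))
```

The higher the share of each request taken up by the repeated system prompt, the closer the effective rate falls toward the cached price.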
Budget
Free tier · startup MVP
Provider: Google AI Studio
Estimate (100K queries): ~$170/mo (or free below quota)
Google AI Studio offers a generous free tier on Flash, so MVPs and side projects may not pay at all. Quality is surprisingly high for simple chat, and at 260 tok/s it delivers the fastest perceived UX of the three tiers.
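The throughput figures quoted across the tiers translate directly into how long a full answer takes to stream. A back-of-envelope sketch (time-to-first-token not modeled):

```python
# How long a ~500-token answer takes to stream at the throughputs
# quoted on this page. Perceived latency is this plus time-to-first-token.

OUTPUT_TOKENS = 500

def stream_seconds(tokens_per_second: float, tokens: int = OUTPUT_TOKENS) -> float:
    """Seconds to stream `tokens` at a given decode throughput."""
    return tokens / tokens_per_second

for label, tps in [("~55 tok/s (frontier)", 55),
                   ("~90 tok/s (mainstream)", 90),
                   ("~260 tok/s (budget)", 260)]:
    print(f"{label}: {stream_seconds(tps):.1f}s")
```

At 55 tok/s a full answer takes roughly 9 seconds to finish streaming, versus under 2 seconds at 260 tok/s, which is why latency is called out as the frontier tier's main tradeoff.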
Alternative picks
If the defaults do not fit, try these.
Alternative
Fastest Claude. Good for high-volume chat with brand voice needs.
Alternative
Ultra-cheap open-source default. Quality gap vs GPT-5 is small for general support.
Alternative
If speed matters most (voice, real-time agent). Groq hits 1400 tok/s on this model.
Frequently asked questions
What is the fastest model for chat?
Llama 3.3 70B on Cerebras or Groq will stream at 1000+ tok/s. Among closed models, Gemini 2.5 Flash is the fastest frontier-class option at ~260 tok/s.