
Fastest LLM inference

Every major LLM ranked by tokens per second. Specialized inference chips (Groq, Cerebras, SambaNova) dominate the leaderboard.

Models tracked: 20
Fastest: 2100 tok/s
Scope: Groq · Cerebras · SambaNova · Major labs
What this page is
This page ranks models by inference speed (tokens per second) across providers. Specialized chips (Groq LPU, Cerebras WSE-3, SambaNova RDU) dominate the top. Major labs (OpenAI, Anthropic, Google) cluster in the mid-tier because their focus is quality, not raw throughput. Use this page when latency is a product feature: real-time chat, voice, and interactive agents.
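
Tokens per second here means decode throughput: completion tokens divided by generation time, measured from the first streamed token. Below is a minimal sketch of how you could measure it yourself against any OpenAI-compatible streaming endpoint; the base URL, model name, environment variable, and the one-token-per-chunk approximation are all assumptions, not this page's actual methodology.

```python
import os
import time

from openai import OpenAI  # pip install openai

# Any OpenAI-compatible streaming endpoint works here; base_url,
# model, and the env var are placeholders, not this page's setup.
client = OpenAI(
    base_url="https://api.example.com/v1",
    api_key=os.environ["EXAMPLE_API_KEY"],
)

start = time.perf_counter()
first = None
tokens = 0

stream = client.chat.completions.create(
    model="example-model",
    messages=[{"role": "user", "content": "Explain TCP in one paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first is None:
            first = time.perf_counter()  # marks time to first token
        tokens += 1  # most providers send roughly one token per chunk

if first is None:
    raise RuntimeError("stream produced no content")

gen_time = time.perf_counter() - first
print(f"time to first token: {first - start:.2f} s")
print(f"decode speed: ~{tokens / gen_time:.0f} tok/s")
```

Counting chunks only approximates token count; for a precise figure, check the usage stats your provider returns at the end of the stream.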
#    Model · Provider    tok/s
1    …                    2100
2    …                    1800
3    …                    1400
4    …                     950
5    …                     780
6    …                     570
7    …                     520
8    …                     480
9    …                     400
10   …                     320
11   …                     260
12   …                     210
13   …                     180
14   …                      95
15   …                      85
16   …                      80
17   …                      55
18   …                      45
19   …                      40
20   …                      35
Ultra-fast (1000+ tok/s)
Groq · Cerebras · SambaNova

Specialized inference chips serving a curated set of open-source models. Use them for real-time voice, live code completion, and latency-sensitive agents.

Fast (100 to 500 tok/s)
Together · Fireworks · DeepInfra

GPU-based but tuned for throughput. Broad model selection. A solid choice for production chat and RAG.

Standard (30 to 100 tok/s)
OpenAI · Anthropic · Google

Frontier labs. Speed is not the design goal; quality and reasoning depth are. Use them when the best answer matters more than the fastest one.

Human reading speed is about 5 tok/s. Most chat UIs feel instant above 30 tok/s. Real-time voice and interactive agents benefit from 100+ tok/s. Groq and Cerebras push 1000+ for small and mid-size models.
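
To make those thresholds concrete, here is a back-of-the-envelope calculation of how long a complete answer takes to stream at each tier's speed. The 300-token response length is an assumed example, not data from this page.

```python
# Rough streaming time for one complete answer at each tier's speed.
# 300 tokens (a few paragraphs) is an assumption, not measured data.
response_tokens = 300
for tier, tps in [("standard", 30), ("fast", 100), ("ultra-fast", 1000)]:
    print(f"{tier:>10}: {response_tokens / tps:5.1f} s to finish streaming")
```

At 30 tok/s the stream already outpaces reading, which is why chat feels instant; the ultra-fast tiers matter when something other than a human, such as a voice pipeline or an agent loop, is waiting on the complete response.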