Cheapest reasoning LLMs
The cheapest models that hold up on GPQA, AIME, MATH, MMLU, HLE. Ranked by price per 1M input tokens.
Models: 30
Cheapest: $0.00
Scope: GPQA · AIME · MATH
What this page is
This page ranks every model with credible reasoning scores (GPQA, AIME, MATH, MMLU, HLE, DROP, BBH) by input price. Reasoning models generate many thinking tokens, which are typically billed as output, so the headline input price is only part of the bill. The cheap end is dominated by open-source reasoners like DeepSeek, Qwen3, and GLM. The premium o-series and Claude Opus sit at the top of the price scale. Pair this page with our cost calculator to model real workloads.
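To make the point about thinking tokens concrete, here is a minimal sketch of a per-request cost estimate. It assumes thinking tokens are billed at the output rate, which is common but not universal. The function name, token counts, and the $0.40/M output price are hypothetical and chosen only for illustration.

```python
def request_cost(prompt_tokens, answer_tokens, thinking_tokens,
                 input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost) in dollars for one request.

    Assumption: thinking tokens are billed at the output rate.
    Prices are quoted in dollars per 1M tokens.
    """
    input_cost = prompt_tokens * input_price_per_m / 1_000_000
    output_cost = (answer_tokens + thinking_tokens) * output_price_per_m / 1_000_000
    return input_cost, output_cost


# Hypothetical workload and prices, for illustration only.
input_cost, output_cost = request_cost(
    prompt_tokens=2_000,
    answer_tokens=500,
    thinking_tokens=8_000,
    input_price_per_m=0.10,
    output_price_per_m=0.40,
)
total = input_cost + output_cost
print(f"input ${input_cost:.4f}, output ${output_cost:.4f}, "
      f"input share {input_cost / total:.0%}")
```

Under these assumed numbers the input side is about 6 percent of the bill, which is why a ranking by input price alone should be read alongside output pricing and typical thinking length.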
Ranked by input price
Models with credible reasoning scores, cheapest first.
Top 3 cheapest reasoning LLMs
Best value reasoner
Gemma 3 27B (free)
Input: $0.00/M
Output: $0.00/M
Gemma 3 27B (free) clears our reasoning filter (GPQA, AIME, MATH) at the lowest input price. Strong choice for bulk analytical work.
Runner up
gpt-oss-120b (free)
Input: $0.00/M
Output: $0.00/M
gpt-oss-120b (free) clears the same reasoning filter as Gemma 3 27B (free) at the same $0.00/M input price. Good vendor diversification pick.
Third choice
gpt-oss-20b (free)
Input: $0.00/M
Output: $0.00/M
gpt-oss-20b (free) clears the same reasoning filter as Gemma 3 27B (free) at the same $0.00/M input price. Good vendor diversification pick.
The price gap · cheapest vs most expensive
Cheapest
Gemma 3 27B (free)
$0.00 per 1M input tokens
Why the gap
Premium reasoners charge for longer thinking budgets, better tool use, and vendor reliability. For many tasks, Gemma 3 27B (free) closes 70 to 90 percent of the GPQA gap at a fraction of the cost.
Most expensive
Gemini 2.0 Flash
$0.10 per 1M input tokens
Frequently asked questions
A model counts as a reasoning LLM on this page if it has explicit reasoning scores on GPQA Diamond, AIME 2024/2025, MATH-500, MMLU-Pro, HLE, DROP, BBH, or ARC-AGI. Reasoning models typically use extended chain-of-thought and spend more tokens on hard problems.
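The filter and ranking described above can be sketched as follows. This is an illustrative sketch, not the page's actual pipeline: the record layout and the `rank_reasoners` helper are hypothetical, the benchmark lists attached to each model are placeholders rather than reported results, and `example-chat-model` is an invented entry that exists only to show a model being filtered out. Names and input prices for the other entries are taken from this page.

```python
# Benchmarks that qualify a model as a reasoner, per the definition above.
REASONING_BENCHMARKS = {
    "GPQA Diamond", "AIME 2024", "AIME 2025", "MATH-500",
    "MMLU-Pro", "HLE", "DROP", "BBH", "ARC-AGI",
}


def rank_reasoners(models):
    """Keep models with at least one reasoning benchmark, cheapest input first.

    sorted() is stable, so models tied on price keep their original order.
    """
    qualified = [m for m in models
                 if REASONING_BENCHMARKS & set(m["benchmarks"])]
    return sorted(qualified, key=lambda m: m["input_price_per_m"])


# Benchmark lists below are placeholders, not reported scores.
models = [
    {"name": "Gemini 2.0 Flash", "input_price_per_m": 0.10,
     "benchmarks": ["GPQA Diamond"]},
    {"name": "Gemma 3 27B (free)", "input_price_per_m": 0.00,
     "benchmarks": ["GPQA Diamond"]},
    {"name": "example-chat-model", "input_price_per_m": 0.00,
     "benchmarks": []},  # hypothetical model with no reasoning scores
    {"name": "gpt-oss-120b (free)", "input_price_per_m": 0.00,
     "benchmarks": ["AIME 2025"]},
]

ranked = rank_reasoners(models)
for m in ranked:
    print(f'{m["name"]}: ${m["input_price_per_m"]:.2f}/M input')
```

Ties at $0.00 are common on this page, so a real ranking would need a secondary sort key such as output price or benchmark score; the sketch simply preserves input order for ties.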