Gemini 2.5 Flash vs Qwen2.5-Max
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Gemini 2.5 Flash wins 5 of 6 shared benchmarks, leading in coding, arena, and math. Qwen2.5-Max leads in knowledge.
Category leads
coding · Gemini 2.5 Flash
arena · Gemini 2.5 Flash
knowledge · Qwen2.5-Max
math · Gemini 2.5 Flash
Hype vs Reality
Attention vs performance
Gemini 2.5 Flash
#144 by performance · #14 by attention
Qwen2.5-Max
#141 by performance · no attention signal
Vendor risk
Who is behind the model
Google DeepMind
$4.00T · Tier 1
Alibaba (Qwen)
$293.0B · Tier 1
Head to head
6 benchmarks · 2 models
Aider polyglot
Gemini 2.5 Flash leads by +25.3
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Gemini 2.5 Flash
47.1
Qwen2.5-Max
21.8
Chatbot Arena Elo · Overall
Gemini 2.5 Flash leads by +36.9
Gemini 2.5 Flash
1411.0
Qwen2.5-Max
1374.2
Fiction.LiveBench
Qwen2.5-Max leads by +19.5
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
Gemini 2.5 Flash
47.2
Qwen2.5-Max
66.7
FrontierMath-2025-02-28-Private
Gemini 2.5 Flash leads by +3.8
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
Gemini 2.5 Flash
4.8
Qwen2.5-Max
1.0
Lech Mazur Writing
Gemini 2.5 Flash leads by +3.6
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
Gemini 2.5 Flash
76.5
Qwen2.5-Max
72.9
OTIS Mock AIME 2024-2025
Gemini 2.5 Flash leads by +57.0
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Gemini 2.5 Flash
73.0
Qwen2.5-Max
16.0
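To give the Chatbot Arena gap a more intuitive reading, an Elo difference can be converted into an expected head-to-head preference rate. The sketch below assumes the conventional Elo logistic formula (base 10, scale 400). The function name `expected_win_rate` is illustrative and does not come from this page, and the leaderboard's own rating procedure may differ in detail.

```python
def expected_win_rate(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected probability that A is preferred over B under the standard Elo model.

    Assumption: base-10 logistic curve with a 400-point scale, as in classic Elo.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Ratings as listed in the Chatbot Arena Elo entry above.
gemini_elo = 1411.0
qwen_elo = 1374.2

p = expected_win_rate(gemini_elo, qwen_elo)
print(f"Gemini 2.5 Flash expected preference rate: {p:.1%}")
```

Under that assumption, a gap of roughly 37 points corresponds to a preference rate of about 55%, which is a modest edge rather than a decisive one.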
Full benchmark table
| Benchmark | Gemini 2.5 Flash | Qwen2.5-Max |
|---|---|---|
| Aider Polyglot | 47.1 | 21.8 |
| Chatbot Arena Elo · Overall | 1411.0 | 1374.2 |
| Fiction.LiveBench | 47.2 | 66.7 |
| FrontierMath-2025-02-28-Private | 4.8 | 1.0 |
| Lech Mazur Writing | 76.5 | 72.9 |
| OTIS Mock AIME 2024-2025 | 73.0 | 16.0 |
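The "leads by" margins and the 5-of-6 tally can be recomputed directly from the table. The following is a minimal sketch, assuming higher is better on every benchmark listed (which matches how the page assigns leads). The `summarize` helper is a hypothetical name used only for illustration.

```python
# Scores copied from the benchmark table: (Gemini 2.5 Flash, Qwen2.5-Max).
scores = {
    "Aider Polyglot": (47.1, 21.8),
    "Chatbot Arena Elo · Overall": (1411.0, 1374.2),
    "Fiction.LiveBench": (47.2, 66.7),
    "FrontierMath-2025-02-28-Private": (4.8, 1.0),
    "Lech Mazur Writing": (76.5, 72.9),
    "OTIS Mock AIME 2024-2025": (73.0, 16.0),
}


def summarize(table):
    """Return per-benchmark margins (Gemini minus Qwen) and Gemini's win count.

    Assumption: a higher score is better on every benchmark.
    """
    margins = {name: round(a - b, 1) for name, (a, b) in table.items()}
    gemini_wins = sum(1 for m in margins.values() if m > 0)
    return margins, gemini_wins


margins, gemini_wins = summarize(scores)
for name, m in margins.items():
    leader = "Gemini 2.5 Flash" if m > 0 else "Qwen2.5-Max"
    print(f"{name}: {leader} leads by {abs(m):.1f}")
print(f"Gemini 2.5 Flash wins {gemini_wins} of {len(scores)}")
```

Recomputed from the displayed scores, the Chatbot Arena margin comes out at 36.8 rather than the +36.9 shown on the page. The likeliest explanation is that the page derives its margin from unrounded ratings, though that is not stated.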
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Gemini 2.5 Flash | $0.30 | $2.50 | 1.0M tokens (~524 books) | $8.50 |
| Qwen2.5-Max | — | — | — | — |
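The page does not say how the 10M monthly tokens are divided between input and output. The listed prices ($0.30 input, $2.50 output) reproduce the $8.50 projection if 75% of tokens are input and 25% are output, so the sketch below uses that split as an inferred assumption. `projected_monthly_cost` and its parameters are illustrative names, not part of the page.

```python
def projected_monthly_cost(
    input_price: float,
    output_price: float,
    total_tokens_m: float = 10.0,
    input_share: float = 0.75,  # assumption: inferred from the listed figures
) -> float:
    """Projected monthly cost in dollars, with prices given per 1M tokens."""
    input_tokens_m = total_tokens_m * input_share
    output_tokens_m = total_tokens_m - input_tokens_m
    return input_tokens_m * input_price + output_tokens_m * output_price


cost = projected_monthly_cost(input_price=0.30, output_price=2.50)
print(f"Projected monthly cost: ${cost:.2f}")
```

Because output tokens cost more than eight times as much as input tokens here, the projection is sensitive to the split: an output-heavy workload would land well above $8.50 for the same 10M tokens.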