
Gemini 2.5 Flash vs Qwen2.5-Max

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Gemini 2.5 Flash wins 5 of 6 shared benchmarks. Leads in coding · arena · math.

Category leads
coding · Gemini 2.5 Flash
arena · Gemini 2.5 Flash
knowledge · Qwen2.5-Max
math · Gemini 2.5 Flash
Hype vs Reality
Gemini 2.5 Flash
#144 by performance · #14 by attention
OVERHYPED
Qwen2.5-Max
#141 by performance · no signal
QUIET
Best value
Gemini 2.5 Flash
28.6 pts/$
$1.40/M
Qwen2.5-Max
no price
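The page does not say how "pts/$" is computed. One reading that reproduces the displayed figures, offered only as an assumption: the $1.40/M price is the simple mean of Gemini 2.5 Flash's input and output prices ($0.30 and $2.50 per 1M tokens, from the pricing table), and value is some performance score divided by that price. Under that assumption, 28.6 pts/$ implies a score of roughly 40 points; which score that is, and the helper names below, are hypothetical.

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Hypothetical blend: simple mean of input and output prices (USD per 1M tokens)."""
    return (input_per_m + output_per_m) / 2


def points_per_dollar(score: float, price_per_m: float) -> float:
    """Hypothetical value metric: score points per dollar of blended price."""
    return score / price_per_m


price = blended_price(0.30, 2.50)
print(f"blended price: ${price:.2f}/M")  # blended price: $1.40/M

# Back out the score implied by the displayed 28.6 pts/$ (assumed formula, not stated on the page).
implied_score = 28.6 * price
print(f"implied score: {implied_score:.1f}")  # implied score: 40.0
```

If the site blends prices differently (for example, weighted toward input tokens), the implied score changes accordingly.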
Vendor risk
Google DeepMind
$4.00T·Tier 1
Low risk
Alibaba (Qwen)
$293.0B·Tier 1
Low risk
Head to head
Gemini 2.5 Flash vs Qwen2.5-Max
Aider polyglot
Gemini 2.5 Flash leads by +25.3
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Gemini 2.5 Flash
47.1
Qwen2.5-Max
21.8
Chatbot Arena Elo · Overall
Gemini 2.5 Flash leads by +36.9
Gemini 2.5 Flash
1411.0
Qwen2.5-Max
1374.2
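Arena ratings are reported on the Elo scale, where a rating gap d corresponds to an expected head-to-head win rate of 1 / (1 + 10^(-d/400)). Applying that standard formula to the ratings above gives Gemini 2.5 Flash an expected preference rate of roughly 55% against Qwen2.5-Max: a real lead, but a modest one. A minimal sketch, ignoring ties; the function name is illustrative:

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Displayed Chatbot Arena ratings: Gemini 2.5 Flash vs Qwen2.5-Max.
p = elo_expected_score(1411.0, 1374.2)
print(f"Gemini 2.5 Flash expected win rate: {p:.1%}")
```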
Fiction.LiveBench
Qwen2.5-Max leads by +19.5
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
Gemini 2.5 Flash
47.2
Qwen2.5-Max
66.7
FrontierMath-2025-02-28-Private
Gemini 2.5 Flash leads by +3.8
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
Gemini 2.5 Flash
4.8
Qwen2.5-Max
1.0
Lech Mazur Writing
Gemini 2.5 Flash leads by +3.6
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
Gemini 2.5 Flash
76.5
Qwen2.5-Max
72.9
OTIS Mock AIME 2024-2025
Gemini 2.5 Flash leads by +57.0
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Gemini 2.5 Flash
73.0
Qwen2.5-Max
16.0
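The winner summary and the lead margins can be rechecked from the six scores listed above. The sketch hard-codes those displayed scores, assumes higher is better on all six (true for these benchmarks), and recomputes each margin and the win count. Because the displayed scores are rounded, a recomputed margin can differ from the page's by 0.1: the displayed Arena ratings give 36.8 rather than the listed +36.9.

```python
# Displayed scores as (Gemini 2.5 Flash, Qwen2.5-Max); higher is better on all six.
scores = {
    "Aider polyglot": (47.1, 21.8),
    "Chatbot Arena Elo (Overall)": (1411.0, 1374.2),
    "Fiction.LiveBench": (47.2, 66.7),
    "FrontierMath-2025-02-28-Private": (4.8, 1.0),
    "Lech Mazur Writing": (76.5, 72.9),
    "OTIS Mock AIME 2024-2025": (73.0, 16.0),
}


def head_to_head(table):
    """Return (gemini_wins, margins), where margin = Gemini minus Qwen, rounded to 0.1."""
    margins = {name: round(g - q, 1) for name, (g, q) in table.items()}
    gemini_wins = sum(1 for m in margins.values() if m > 0)
    return gemini_wins, margins


wins, margins = head_to_head(scores)
print(f"Gemini 2.5 Flash wins {wins} of {len(scores)}")  # Gemini 2.5 Flash wins 5 of 6
for name, m in margins.items():
    print(f"{name}: {m:+.1f}")
```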
Full benchmark table
Benchmark · Gemini 2.5 Flash · Qwen2.5-Max
Aider polyglot · 47.1 · 21.8
Chatbot Arena Elo (Overall) · 1411.0 · 1374.2
Fiction.LiveBench · 47.2 · 66.7
FrontierMath-2025-02-28-Private · 4.8 · 1.0
Lech Mazur Writing · 76.5 · 72.9
OTIS Mock AIME 2024-2025 · 73.0 · 16.0
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model · Input · Output · Context · Projected $/mo
Gemini 2.5 Flash · $0.30 · $2.50 · 1.0M tokens (~524 books) · $8.50
Qwen2.5-Max
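The page does not state the input/output split behind the $8.50 projection, but a 3:1 split reproduces it: 7.5M input tokens × $0.30 + 2.5M output tokens × $2.50 = $2.25 + $6.25 = $8.50. The sketch below assumes that 75% input share as a default; the function and parameter names are illustrative, and a real bill depends on your own prompt-to-response ratio.

```python
def projected_monthly_cost(total_tokens_m: float, input_price: float,
                           output_price: float, input_share: float = 0.75) -> float:
    """USD cost for total_tokens_m million tokens per month.

    Prices are USD per 1M tokens; input_share is an assumed fraction of
    tokens that are input (prompt) tokens, the rest being output tokens.
    """
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m - input_m
    return input_m * input_price + output_m * output_price


# Gemini 2.5 Flash list prices from the table: $0.30 input, $2.50 output per 1M tokens.
print(f"${projected_monthly_cost(10, 0.30, 2.50):.2f}")  # $8.50

# Sensitivity: an output-heavier workload costs more at the same volume.
print(f"${projected_monthly_cost(10, 0.30, 2.50, input_share=0.5):.2f}")  # $14.00
```

Because output tokens cost about eight times as much as input tokens here, the output share dominates the projection.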