
R1 vs Gemini 2.5 Flash

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Gemini 2.5 Flash wins 6 of 11 shared benchmarks; R1 wins the other 5. Gemini 2.5 Flash leads in reasoning, arena, and math.

Category leads
coding · R1
reasoning · Gemini 2.5 Flash
arena · Gemini 2.5 Flash
knowledge · R1
math · Gemini 2.5 Flash
Hype vs Reality
R1
#116 by perf·no signal
QUIET
Gemini 2.5 Flash
#144 by perf·#14 by attention
OVERHYPED
Best value
Gemini 2.5 Flash · about 1.01x the value of R1 (28.6 vs 28.2 pts/$)
R1
28.2 pts/$
$1.60/M
Gemini 2.5 Flash
28.6 pts/$
$1.40/M
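The value comparison above is simple arithmetic on the displayed figures. The sketch below recomputes the ratio, and also checks one observation: the $/M prices shown here match a plain average of the input and output prices in the pricing table. That averaging rule is an assumption inferred from the numbers, not a formula the page states, and `blended_price` is a hypothetical helper name.

```python
# Figures copied from the page (points per dollar).
R1_VALUE = 28.2
FLASH_VALUE = 28.6


def value_ratio(a: float, b: float) -> float:
    """Return how many times better value a is than b."""
    return a / b


def blended_price(input_price: float, output_price: float) -> float:
    """ASSUMPTION: blended $/M is the unweighted mean of input and output prices."""
    return (input_price + output_price) / 2


ratio = value_ratio(FLASH_VALUE, R1_VALUE)
print(f"Gemini 2.5 Flash vs R1: {ratio:.3f}x")  # 1.014x

# Input/output prices per 1M tokens, from the pricing table.
print(f"R1 blended: ${blended_price(0.70, 2.50):.2f}/M")
print(f"Gemini 2.5 Flash blended: ${blended_price(0.30, 2.50):.2f}/M")
```

A ratio this close to 1 means the two models are nearly tied on value, so the price and benchmark mix for a given workload matter more than the headline multiplier.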
Vendor risk
One or more vendors flagged
DeepSeek
$3.4B·Tier 1
Higher risk
Google DeepMind
$4.00T·Tier 1
Low risk
Head to head
R1 · Gemini 2.5 Flash
Aider Polyglot
R1 leads by +9.8
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
R1
56.9
Gemini 2.5 Flash
47.1
ARC-AGI
Gemini 2.5 Flash leads by +16.5
ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.
R1
15.8
Gemini 2.5 Flash
32.3
ARC-AGI-2
Gemini 2.5 Flash leads by +1.2
ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing novel pattern recognition and abstract reasoning without prior training data.
R1
1.3
Gemini 2.5 Flash
2.5
Chatbot Arena Elo · Overall
Gemini 2.5 Flash leads by +13.5
R1
1397.5
Gemini 2.5 Flash
1411.0
Balrog
R1 leads by +1.4
Balrog · benchmarks AI agents on text-based adventure games, testing language understanding, strategic planning, and long-horizon reasoning.
R1
34.9
Gemini 2.5 Flash
33.5
DeepResearch Bench
R1 leads by +5.9
DeepResearch Bench · evaluates AI on complex multi-step research tasks requiring information gathering, synthesis, and producing comprehensive analyses.
R1
35.1
Gemini 2.5 Flash
29.2
Fiction.LiveBench
R1 leads by +22.2
Fiction.LiveBench · a continuously updated benchmark using recently published fiction to test reading comprehension and reasoning, preventing data contamination.
R1
69.4
Gemini 2.5 Flash
47.2
Lech Mazur Writing
R1 leads by +6.5
Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.
R1
83.0
Gemini 2.5 Flash
76.5
OTIS Mock AIME 2024-2025
Gemini 2.5 Flash leads by +19.7
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
R1
53.3
Gemini 2.5 Flash
73.0
SimpleBench
Gemini 2.5 Flash leads by +12.4
SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.
R1
17.1
Gemini 2.5 Flash
29.4
WeirdML
Gemini 2.5 Flash leads by +4.5
WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.
R1
36.5
Gemini 2.5 Flash
41.0
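The per-benchmark leads listed above can be tallied to check the winner summary. This is a minimal sketch: the scores are copied from the head-to-head list, and it assumes that a higher score is better on every benchmark, which holds for the leads as the page reports them.

```python
# Scores copied from the head-to-head list: (R1, Gemini 2.5 Flash).
scores = {
    "Aider Polyglot": (56.9, 47.1),
    "ARC-AGI": (15.8, 32.3),
    "ARC-AGI-2": (1.3, 2.5),
    "Chatbot Arena Elo (Overall)": (1397.5, 1411.0),
    "Balrog": (34.9, 33.5),
    "DeepResearch Bench": (35.1, 29.2),
    "Fiction.LiveBench": (69.4, 47.2),
    "Lech Mazur Writing": (83.0, 76.5),
    "OTIS Mock AIME 2024-2025": (53.3, 73.0),
    "SimpleBench": (17.1, 29.4),
    "WeirdML": (36.5, 41.0),
}


def tally(results: dict[str, tuple[float, float]]) -> dict[str, int]:
    """Count benchmarks led by each model; equal scores count as ties."""
    wins = {"R1": 0, "Gemini 2.5 Flash": 0, "tie": 0}
    for r1, flash in results.values():
        if r1 > flash:
            wins["R1"] += 1
        elif flash > r1:
            wins["Gemini 2.5 Flash"] += 1
        else:
            wins["tie"] += 1
    return wins


wins = tally(scores)
print(wins)  # {'R1': 5, 'Gemini 2.5 Flash': 6, 'tie': 0}
```

A raw win count treats a 1.2-point lead on ARC-AGI-2 the same as a 22.2-point lead on Fiction.LiveBench, so it is best read alongside the individual margins.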
Full benchmark table
| Benchmark | R1 | Gemini 2.5 Flash |
|---|---|---|
| Aider Polyglot | 56.9 | 47.1 |
| ARC-AGI | 15.8 | 32.3 |
| ARC-AGI-2 | 1.3 | 2.5 |
| Chatbot Arena Elo · Overall | 1397.5 | 1411.0 |
| Balrog | 34.9 | 33.5 |
| DeepResearch Bench | 35.1 | 29.2 |
| Fiction.LiveBench | 69.4 | 47.2 |
| Lech Mazur Writing | 83.0 | 76.5 |
| OTIS Mock AIME 2024-2025 | 53.3 | 73.0 |
| SimpleBench | 17.1 | 29.4 |
| WeirdML | 36.5 | 41.0 |
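To put the Chatbot Arena gap (1411.0 vs 1397.5) in perspective, a rating difference can be converted into an expected head-to-head preference rate. The sketch below uses the standard Elo expected-score formula (base 10, scale 400). It assumes the Arena ratings shown here follow that scale, and it ignores ties and the confidence intervals around each rating.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# Ratings copied from the benchmark table.
FLASH_ELO = 1411.0
R1_ELO = 1397.5

p_flash = expected_score(FLASH_ELO, R1_ELO)
print(f"Expected preference for Gemini 2.5 Flash: {p_flash:.3f}")  # 0.519
```

Under these assumptions a 13.5-point lead corresponds to being preferred in roughly 52% of direct comparisons, which is a narrow edge.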
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| R1 | $0.70 | $2.50 | 64K tokens (~32 books) | $11.50 |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1.0M tokens (~524 books) | $8.50 |
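The page does not say how the 10M tokens are divided between input and output. Both projected figures are reproduced exactly by a 75% input / 25% output split, so the sketch below uses that split as an inferred assumption. `monthly_cost` and its parameters are hypothetical names, not part of any published API.

```python
def monthly_cost(
    input_price: float,
    output_price: float,
    total_tokens_m: float = 10.0,
    input_share: float = 0.75,  # ASSUMPTION: inferred from the page's figures
) -> float:
    """Projected monthly cost in dollars, with prices per 1M tokens."""
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m - input_m
    return input_m * input_price + output_m * output_price


r1_cost = monthly_cost(0.70, 2.50)
flash_cost = monthly_cost(0.30, 2.50)
print(f"R1: ${r1_cost:.2f}/mo")                   # $11.50/mo
print(f"Gemini 2.5 Flash: ${flash_cost:.2f}/mo")  # $8.50/mo
```

Because both models charge the same output price, the gap between them narrows as a workload shifts toward output: with `input_share=0.25` the difference drops from $3.00 to $1.00 per month at this volume.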