R1 vs Gemini 2.5 Flash

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Gemini 2.5 Flash wins on 6/11 benchmarks

Gemini 2.5 Flash wins 6 of 11 shared benchmarks. Leads in reasoning · arena · math.

R1

5 / 11

Gemini 2.5 Flash

6 / 11
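The win count is a per-benchmark tally of which model posts the higher score. Below is a minimal sketch. It assumes higher is better on all eleven benchmarks and that a tie would count for neither model, because the page does not say how ties are handled. The scores are copied from the head-to-head section, and `tally_wins` is a hypothetical helper.

```python
# (R1, Gemini 2.5 Flash) scores for the 11 shared benchmarks, in page order.
PAIRS = [
    (56.9, 47.1),      # Aider polyglot
    (15.8, 32.3),      # ARC-AGI
    (1.3, 2.5),        # ARC-AGI-2
    (1397.5, 1411.0),  # Chatbot Arena Elo
    (34.9, 33.5),      # Balrog
    (35.1, 29.2),      # DeepResearch Bench
    (69.4, 47.2),      # Fiction.LiveBench
    (83.0, 76.5),      # Lech Mazur Writing
    (53.3, 73.0),      # OTIS Mock AIME 2024-2025
    (17.1, 29.4),      # SimpleBench
    (36.5, 41.0),      # WeirdML
]


def tally_wins(pairs):
    """Count benchmarks each model wins outright; ties (assumed) count for neither."""
    r1_wins = sum(1 for r1, flash in pairs if r1 > flash)
    flash_wins = sum(1 for r1, flash in pairs if flash > r1)
    return r1_wins, flash_wins


r1_wins, flash_wins = tally_wins(PAIRS)
print(f"R1 {r1_wins} / {len(PAIRS)}, Gemini 2.5 Flash {flash_wins} / {len(PAIRS)}")
```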

Category leads

coding · R1
reasoning · Gemini 2.5 Flash
arena · Gemini 2.5 Flash
knowledge · R1
math · Gemini 2.5 Flash

Hype vs Reality

Attention vs performance

R1

#116 by performance · no attention signal

QUIET

Gemini 2.5 Flash

#144 by performance · #14 by attention

OVERHYPED

Best value

Gemini 2.5 Flash

Marginally better value than R1 (28.6 vs 28.2 pts/$, shown as 1.0x)

R1

28.2 pts/$

$1.60/M

Gemini 2.5 Flash

28.6 pts/$

$1.40/M
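The per-million figures match an unweighted average of each model's input and output prices from the pricing table: $0.70 and $2.50 for R1, and $0.30 and $2.50 for Gemini 2.5 Flash. The value multiple looks like the ratio of the two pts/$ values. The page does not document either formula, so the sketch below treats both as assumptions. The helper names are hypothetical.

```python
def blended_price(input_per_m: float, output_per_m: float) -> float:
    """Assumed blend: unweighted mean of input and output price per 1M tokens."""
    return (input_per_m + output_per_m) / 2


def value_ratio(pts_per_dollar_a: float, pts_per_dollar_b: float) -> float:
    """How many times more benchmark points per dollar model A delivers than model B."""
    return pts_per_dollar_a / pts_per_dollar_b


r1_blend = blended_price(0.70, 2.50)     # R1
flash_blend = blended_price(0.30, 2.50)  # Gemini 2.5 Flash
ratio = value_ratio(28.6, 28.2)          # Gemini 2.5 Flash relative to R1

print(f"R1 ${r1_blend:.2f}/M, Gemini 2.5 Flash ${flash_blend:.2f}/M")
print(f"Value ratio: {ratio:.1f}x ({ratio:.3f})")
```

A ratio of about 1.014 rounds to the 1.0x shown, so on this metric the two models are effectively tied.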

Vendor risk

Mixed exposure

One or more vendors flagged

DeepSeek

$3.4B·Tier 1

Higher risk

Google DeepMind

$4.00T·Tier 1

Low risk

Head to head

11 benchmarks · 2 models

R1 · Gemini 2.5 Flash

Aider polyglot

R1 leads by +9.8

Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.

R1

56.9

Gemini 2.5 Flash

47.1

ARC-AGI

Gemini 2.5 Flash leads by +16.5

ARC-AGI · the original Abstraction and Reasoning Corpus, testing whether AI can solve novel visual pattern recognition tasks without memorization.

R1

15.8

Gemini 2.5 Flash

32.3

ARC-AGI-2

Gemini 2.5 Flash leads by +1.2

ARC-AGI-2 · the second iteration of the Abstraction and Reasoning Corpus, testing pattern recognition and abstract reasoning on tasks the model has not seen before.

R1

1.3

Gemini 2.5 Flash

2.5

Chatbot Arena Elo · Overall

Gemini 2.5 Flash leads by +13.5

R1

1397.5

Gemini 2.5 Flash

1411.0
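A 13.5-point gap is small in practice. If the Arena score follows the standard Elo logistic (base 10, 400-point scale), the gap can be converted into an expected head-to-head preference rate. The sketch below uses the textbook Elo formula under that assumption. It is not Chatbot Arena's own rating code.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B on a base-10, 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


p_flash = expected_win_rate(1411.0, 1397.5)  # Gemini 2.5 Flash vs R1
print(f"Gemini 2.5 Flash expected win rate vs R1: {p_flash:.1%}")
```

Under that assumption, the 13.5-point lead corresponds to an expected preference rate of roughly 52% for Gemini 2.5 Flash.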

Balrog

R1 leads by +1.4

Balrog · benchmarks AI agents on a suite of games, from text adventures to grid-world and roguelike environments such as NetHack, testing language understanding, strategic planning, and long-horizon reasoning.

R1

34.9

Gemini 2.5 Flash

33.5

DeepResearch Bench

R1 leads by +5.9

DeepResearch Bench · evaluates AI on complex multi-step research tasks requiring information gathering, synthesis, and producing comprehensive analyses.

R1

35.1

Gemini 2.5 Flash

29.2

Fiction.LiveBench

R1 leads by +22.2

Fiction.LiveBench · a continuously updated benchmark that uses recently published fiction to test reading comprehension and reasoning while limiting data contamination.

R1

69.4

Gemini 2.5 Flash

47.2

Lech Mazur Writing

R1 leads by +6.5

Lech Mazur Writing · evaluates creative writing ability, assessing prose quality, narrative coherence, and stylistic sophistication.

R1

83.0

Gemini 2.5 Flash

76.5

OTIS Mock AIME 2024-2025

Gemini 2.5 Flash leads by +19.7

OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.

R1

53.3

Gemini 2.5 Flash

73.0

SimpleBench

Gemini 2.5 Flash leads by +12.4

SimpleBench · tests fundamental reasoning capabilities with straightforward problems designed to expose gaps in basic logical and spatial thinking.

R1

17.1

Gemini 2.5 Flash

29.4

WeirdML

Gemini 2.5 Flash leads by +4.5

WeirdML · tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.

R1

36.5

Gemini 2.5 Flash

41.0
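Each "leads by" figure is the absolute difference between the two scores, credited to the higher scorer. The sketch below applies that rule to the one-decimal scores shown on this page. `lead_line` is a hypothetical helper and is not part of any published tool.

```python
# One-decimal scores as displayed on this page: (benchmark, R1, Gemini 2.5 Flash).
ROWS = [
    ("Aider polyglot", 56.9, 47.1),
    ("ARC-AGI", 15.8, 32.3),
    ("ARC-AGI-2", 1.3, 2.5),
    ("Chatbot Arena Elo", 1397.5, 1411.0),
    ("Balrog", 34.9, 33.5),
    ("DeepResearch Bench", 35.1, 29.2),
    ("Fiction.LiveBench", 69.4, 47.2),
    ("Lech Mazur Writing", 83.0, 76.5),
    ("OTIS Mock AIME 2024-2025", 53.3, 73.0),
    ("SimpleBench", 17.1, 29.4),
    ("WeirdML", 36.5, 41.0),
]


def lead_line(benchmark: str, r1: float, flash: float) -> str:
    """Format a 'leads by' line; the margin is the absolute score gap to one decimal."""
    if r1 == flash:
        return f"{benchmark}: tie"
    leader = "R1" if r1 > flash else "Gemini 2.5 Flash"
    return f"{benchmark}: {leader} leads by +{abs(r1 - flash):.1f}"


for row in ROWS:
    print(lead_line(*row))
```

When the margins are recomputed from these rounded values, SimpleBench comes out at +12.3 rather than the +12.4 shown above. The page may therefore compute its margins from unrounded scores.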

Full benchmark table

Benchmark	R1	Gemini 2.5 Flash
Aider polyglot	56.9	47.1
ARC-AGI	15.8	32.3
ARC-AGI-2	1.3	2.5
Chatbot Arena Elo · Overall	1397.5	1411.0
Balrog	34.9	33.5
DeepResearch Bench	35.1	29.2
Fiction.LiveBench	69.4	47.2
Lech Mazur Writing	83.0	76.5
OTIS Mock AIME 2024-2025	53.3	73.0
SimpleBench	17.1	29.4
WeirdML	36.5	41.0

Pricing · per 1M tokens · projected $/mo at 10M tokens

Model	Input	Output	Context	Projected $/mo
R1	$0.70	$2.50	64K tokens	$11.50
Gemini 2.5 Flash	$0.30	$2.50	1.0M tokens	$8.50
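The projected monthly figures are consistent with a blended rate that weights input tokens three to one over output tokens. The page does not state its input/output mix. The 75/25 split below is an assumption that reproduces both projections ($11.50 and $8.50). `projected_monthly_cost` is a hypothetical helper.

```python
def projected_monthly_cost(
    input_per_m: float,
    output_per_m: float,
    monthly_tokens_m: float = 10.0,  # 10M tokens per month, as in the table header
    input_share: float = 0.75,       # assumed share of tokens that are input
) -> float:
    """Projected monthly spend in dollars at a fixed input/output token mix."""
    blended = input_share * input_per_m + (1 - input_share) * output_per_m
    return monthly_tokens_m * blended


print(f"R1: ${projected_monthly_cost(0.70, 2.50):.2f}/mo")
print(f"Gemini 2.5 Flash: ${projected_monthly_cost(0.30, 2.50):.2f}/mo")
```

Output tokens cost the same for both models, so the monthly gap comes entirely from the input price. An output-heavy workload would narrow it.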
