Mistral Large 2411 vs Gemini 2.5 Flash
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Gemini 2.5 Flash wins 7 of 8 shared benchmarks. It leads in arena, math, language, and reasoning; Mistral Large 2411 leads in knowledge.
Category leads
arena · Gemini 2.5 Flash
math · Gemini 2.5 Flash
knowledge · Mistral Large 2411
language · Gemini 2.5 Flash
reasoning · Gemini 2.5 Flash
Hype vs Reality
Attention vs performance
Mistral Large 2411
#112 by performance · no attention signal
Gemini 2.5 Flash
#144 by performance · #14 by attention
Best value
Gemini 2.5 Flash
2.5x better value than Mistral Large 2411
Mistral Large 2411
11.4 pts/$
$4.00/M
Gemini 2.5 Flash
28.6 pts/$
$1.40/M
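The value figures above can be checked with a little arithmetic. The sketch below is a minimal illustration, not the page's own method: it assumes the per-model $/M price is the simple average of the input and output prices listed in the pricing section (an inference that happens to match the displayed numbers), and the function names are hypothetical.

```python
# Figures copied from the page. The blended-price formula is an assumption
# that reproduces the displayed $4.00/M and $1.40/M values.
PRICES = {
    "Mistral Large 2411": {"input": 2.00, "output": 6.00},
    "Gemini 2.5 Flash": {"input": 0.30, "output": 2.50},
}
POINTS_PER_DOLLAR = {
    "Mistral Large 2411": 11.4,
    "Gemini 2.5 Flash": 28.6,
}


def blended_price(input_price: float, output_price: float) -> float:
    """Average of input and output price per 1M tokens (assumed formula)."""
    return (input_price + output_price) / 2


def value_ratio(model_a: str, model_b: str) -> float:
    """How many times more points per dollar model_a delivers than model_b."""
    return POINTS_PER_DOLLAR[model_a] / POINTS_PER_DOLLAR[model_b]


if __name__ == "__main__":
    for model, price in PRICES.items():
        print(model, round(blended_price(price["input"], price["output"]), 2))
    print(round(value_ratio("Gemini 2.5 Flash", "Mistral Large 2411"), 1))
```

Under that assumption, the ratio 28.6 / 11.4 rounds to the 2.5x shown above.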
Vendor risk
Who is behind the model
Mistral AI
$14.0B·Tier 1
Google DeepMind
$4.00T·Tier 1
Head to head
8 benchmarks · 2 models
Mistral Large 2411 · Gemini 2.5 Flash
Chatbot Arena Elo · Overall
Gemini 2.5 Flash leads by +106.4
Mistral Large 2411
1304.7
Gemini 2.5 Flash
1411.0
FrontierMath-2025-02-28-Private
Gemini 2.5 Flash leads by +4.5
FrontierMath (Feb 2025) · original research-level math problems created by mathematicians, testing capabilities at the boundary of current AI mathematical reasoning.
Mistral Large 2411
0.3
Gemini 2.5 Flash
4.8
HELM · GPQA
Mistral Large 2411 leads by +4.5
Mistral Large 2411
43.5
Gemini 2.5 Flash
39.0
HELM · IFEval
Gemini 2.5 Flash leads by +2.2
Mistral Large 2411
87.6
Gemini 2.5 Flash
89.8
HELM · MMLU-Pro
Gemini 2.5 Flash leads by +4.0
Mistral Large 2411
59.9
Gemini 2.5 Flash
63.9
HELM · Omni-MATH
Gemini 2.5 Flash leads by +10.3
Mistral Large 2411
28.1
Gemini 2.5 Flash
38.4
HELM · WildBench
Gemini 2.5 Flash leads by +1.6
Mistral Large 2411
80.1
Gemini 2.5 Flash
81.7
OTIS Mock AIME 2024-2025
Gemini 2.5 Flash leads by +65.3
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Mistral Large 2411
7.7
Gemini 2.5 Flash
73.0
Full benchmark table
| Benchmark | Mistral Large 2411 | Gemini 2.5 Flash |
|---|---|---|
| Chatbot Arena Elo · Overall | 1304.7 | 1411.0 |
| FrontierMath-2025-02-28-Private | 0.3 | 4.8 |
| HELM · GPQA | 43.5 | 39.0 |
| HELM · IFEval | 87.6 | 89.8 |
| HELM · MMLU-Pro | 59.9 | 63.9 |
| HELM · Omni-MATH | 28.1 | 38.4 |
| HELM · WildBench | 80.1 | 81.7 |
| OTIS Mock AIME 2024-2025 | 7.7 | 73.0 |
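The win count and per-benchmark margins follow directly from the table. The sketch below recomputes them from the displayed scores, assuming higher is better on every benchmark; the shortened benchmark keys and the `tally` helper are illustrative names, not part of the page.

```python
# (Mistral Large 2411, Gemini 2.5 Flash) scores as displayed in the table.
SCORES = {
    "Chatbot Arena Elo": (1304.7, 1411.0),
    "FrontierMath": (0.3, 4.8),
    "GPQA": (43.5, 39.0),
    "IFEval": (87.6, 89.8),
    "MMLU-Pro": (59.9, 63.9),
    "Omni-MATH": (28.1, 38.4),
    "WildBench": (80.1, 81.7),
    "OTIS Mock AIME": (7.7, 73.0),
}


def tally(scores):
    """Count wins per model and compute the absolute margin per benchmark."""
    wins = {"Mistral Large 2411": 0, "Gemini 2.5 Flash": 0}
    margins = {}
    for benchmark, (mistral, gemini) in scores.items():
        # Assumption: a higher score is better on every benchmark listed.
        leader = "Gemini 2.5 Flash" if gemini > mistral else "Mistral Large 2411"
        wins[leader] += 1
        margins[benchmark] = round(abs(gemini - mistral), 1)
    return wins, margins


if __name__ == "__main__":
    wins, margins = tally(SCORES)
    print(wins)
    for benchmark, margin in margins.items():
        print(f"{benchmark}: {margin}")
```

Recomputing from the rounded scores gives an Arena margin of 106.3 rather than the 106.4 shown on the page; the difference is presumably due to the page computing margins from unrounded values.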
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Mistral Large 2411 | $2.00 | $6.00 | 131K tokens (~66 books) | $30.00 |
| Gemini 2.5 Flash | $0.30 | $2.50 | 1.0M tokens (~524 books) | $8.50 |
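The page does not say how the 10M monthly tokens are split between input and output. A 75% input / 25% output split reproduces both projected figures, so the sketch below uses it as an explicit assumption; the function name, the default split, and the attribution of each price row to a model are inferred, not stated by the source.

```python
def projected_monthly_cost(
    input_price: float,
    output_price: float,
    total_tokens_m: float = 10.0,
    input_share: float = 0.75,  # assumed split; not stated on the page
) -> float:
    """Monthly cost in dollars, given prices per 1M tokens."""
    input_tokens_m = total_tokens_m * input_share
    output_tokens_m = total_tokens_m * (1 - input_share)
    return input_tokens_m * input_price + output_tokens_m * output_price


if __name__ == "__main__":
    # Row-to-model mapping inferred from the per-model prices under "Best value".
    print(round(projected_monthly_cost(2.00, 6.00), 2))  # Mistral Large 2411
    print(round(projected_monthly_cost(0.30, 2.50), 2))  # Gemini 2.5 Flash
```

Changing `input_share` shows how sensitive the comparison is to workload shape: output-heavy workloads widen the absolute dollar gap, since output tokens cost more for both models.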