Gemini 1.5 Pro (Feb 2024) vs Qwen2.5 72B Instruct
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen2.5 72B Instruct wins 6 of 10 shared benchmarks, leading in coding, knowledge, and math.
Category leads
coding · Qwen2.5 72B Instruct
arena · Gemini 1.5 Pro (Feb 2024)
knowledge · Qwen2.5 72B Instruct
reasoning · Gemini 1.5 Pro (Feb 2024)
math · Qwen2.5 72B Instruct
agentic · Qwen2.5 72B Instruct
multimodal · Gemini 1.5 Pro (Feb 2024)
Hype vs Reality
Attention vs performance
Gemini 1.5 Pro (Feb 2024) · #138 by performance · no attention signal
Qwen2.5 72B Instruct · #82 by performance · no attention signal
Best value: Qwen2.5 72B Instruct
Gemini 1.5 Pro (Feb 2024) · no price listed
Qwen2.5 72B Instruct · 140.0 pts/$ · $0.38/M
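The pts/$ figure reads as an aggregate benchmark score divided by blended price per 1M tokens. A minimal sketch of that ratio, assuming this definition; the site's actual aggregation formula is not shown, and the 53.2-point aggregate below is a hypothetical value back-derived from the displayed 140.0 pts/$ and $0.38/M.

```python
# Hypothetical "pts/$" value metric: aggregate benchmark points divided by
# blended price per 1M tokens. The aggregate score is back-derived from the
# displayed figures (140.0 pts/$ at $0.38/M), not a real input from the site.
def points_per_dollar(aggregate_score: float, price_per_1m_usd: float) -> float:
    """Benchmark points bought per dollar of tokens."""
    return aggregate_score / price_per_1m_usd

print(round(points_per_dollar(53.2, 0.38), 1))  # 140.0
```

Gemini 1.5 Pro (Feb 2024) gets no value score here because the page lists no price for it.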
Vendor risk
Who is behind each model
Google DeepMind · $4.00T market cap · Tier 1
Alibaba (Qwen) · $293.0B market cap · Tier 1
Head to head
10 benchmarks · 2 models
Aider · Code Editing
Qwen2.5 72B Instruct leads by +8.3
Gemini 1.5 Pro (Feb 2024): 57.1 · Qwen2.5 72B Instruct: 65.4
Chatbot Arena Elo · Overall
Gemini 1.5 Pro (Feb 2024) leads by +20.2
Gemini 1.5 Pro (Feb 2024): 1322.5 · Qwen2.5 72B Instruct: 1302.3
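An Elo gap maps to an expected head-to-head preference rate via the standard logistic formula; a sketch assuming the conventional 400-point scale:

```python
# Standard Elo expected-score formula (assuming the conventional 400-point
# logistic scale): probability that model A is preferred over model B.
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(f"{elo_win_probability(1322.5, 1302.3):.1%}")  # 52.9%
```

Under that assumption, the +20.2 lead corresponds to roughly a 52.9% expected preference rate, a modest edge over a coin flip.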
Balrog
Gemini 1.5 Pro (Feb 2024) leads by +4.8
Balrog · benchmarks AI agents on text-based adventure games, testing language understanding, strategic planning, and long-horizon reasoning.
Gemini 1.5 Pro (Feb 2024): 21.0 · Qwen2.5 72B Instruct: 16.2
BBH
Gemini 1.5 Pro (Feb 2024) leads by +5.6
BIG-Bench Hard · a curated subset of 23 challenging tasks from BIG-Bench where language models previously failed to outperform average humans.
Gemini 1.5 Pro (Feb 2024): 78.7 · Qwen2.5 72B Instruct: 73.1
GPQA diamond
Qwen2.5 72B Instruct leads by +4.4
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Gemini 1.5 Pro (Feb 2024): 27.8 · Qwen2.5 72B Instruct: 32.2
MATH level 5
Qwen2.5 72B Instruct leads by +22.4
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
Gemini 1.5 Pro (Feb 2024): 40.8 · Qwen2.5 72B Instruct: 63.2
MMLU
Qwen2.5 72B Instruct leads by +3.5
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Gemini 1.5 Pro (Feb 2024): 76.9 · Qwen2.5 72B Instruct: 80.4
OTIS Mock AIME 2024-2025
Qwen2.5 72B Instruct leads by +1.3
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Gemini 1.5 Pro (Feb 2024): 6.7 · Qwen2.5 72B Instruct: 8.0
The Agent Company
Qwen2.5 72B Instruct leads by +2.3
The Agent Company · tests AI agents on realistic corporate tasks like email management, code review, data analysis, and cross-tool workflows.
Gemini 1.5 Pro (Feb 2024): 3.4 · Qwen2.5 72B Instruct: 5.7
VideoMME
Gemini 1.5 Pro (Feb 2024) leads by +2.0
VideoMME · multimodal benchmark testing video understanding across diverse domains, requiring temporal reasoning and cross-frame comprehension.
Gemini 1.5 Pro (Feb 2024): 66.7 · Qwen2.5 72B Instruct: 64.7
Full benchmark table
| Benchmark | Gemini 1.5 Pro (Feb 2024) | Qwen2.5 72B Instruct |
|---|---|---|
| Aider · Code Editing | 57.1 | 65.4 |
| Chatbot Arena Elo · Overall | 1322.5 | 1302.3 |
| Balrog | 21.0 | 16.2 |
| BBH | 78.7 | 73.1 |
| GPQA diamond | 27.8 | 32.2 |
| MATH level 5 | 40.8 | 63.2 |
| MMLU | 76.9 | 80.4 |
| OTIS Mock AIME 2024-2025 | 6.7 | 8.0 |
| The Agent Company | 3.4 | 5.7 |
| VideoMME | 66.7 | 64.7 |
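The 6-of-10 headline can be reproduced by tallying the higher score on each shared benchmark; a minimal sketch using the table's values (higher is better on all ten rows):

```python
# Tally shared-benchmark wins from the full table above (higher is better).
# Tuples are (Gemini 1.5 Pro, Qwen2.5 72B Instruct) scores.
scores = {
    "Aider · Code Editing": (57.1, 65.4),
    "Chatbot Arena Elo · Overall": (1322.5, 1302.3),
    "Balrog": (21.0, 16.2),
    "BBH": (78.7, 73.1),
    "GPQA diamond": (27.8, 32.2),
    "MATH level 5": (40.8, 63.2),
    "MMLU": (76.9, 80.4),
    "OTIS Mock AIME 2024-2025": (6.7, 8.0),
    "The Agent Company": (3.4, 5.7),
    "VideoMME": (66.7, 64.7),
}
qwen_wins = sum(qwen > gemini for gemini, qwen in scores.values())
print(f"Qwen2.5 72B Instruct wins {qwen_wins}/{len(scores)}")  # 6/10
```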
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Gemini 1.5 Pro (Feb 2024) | — | — | — | — |
| Qwen2.5 72B Instruct | $0.36 | $0.40 | 33K tokens | $3.70 |
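The projection column follows from the per-token prices once a token mix is fixed. The page does not state its input/output split; the 3:1 input-heavy mix below is an assumption that happens to reproduce the listed $3.70.

```python
# Hedged sketch of the "Projected $/mo at 10M tokens" column. The input/output
# token split is an assumption (75% input), not a figure from the page.
def projected_monthly_cost(input_price: float, output_price: float,
                           total_tokens_m: float = 10.0,
                           input_share: float = 0.75) -> float:
    """Monthly USD cost for total_tokens_m million tokens at the given prices."""
    return (total_tokens_m * input_share * input_price
            + total_tokens_m * (1 - input_share) * output_price)

print(f"${projected_monthly_cost(0.36, 0.40):.2f}")  # $3.70
```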