
Gemini 1.5 Pro (Feb 2024) vs Qwen2.5 72B Instruct

Side by side · benchmarks, pricing, and signals you can act on.

Winner summary

Qwen2.5 72B Instruct wins 6 of 10 shared benchmarks, leading the coding, knowledge, math, and agentic categories; Gemini 1.5 Pro (Feb 2024) leads in arena, reasoning, and multimodal.

Category leads
coding · Qwen2.5 72B Instruct
arena · Gemini 1.5 Pro (Feb 2024)
knowledge · Qwen2.5 72B Instruct
reasoning · Gemini 1.5 Pro (Feb 2024)
math · Qwen2.5 72B Instruct
agentic · Qwen2.5 72B Instruct
multimodal · Gemini 1.5 Pro (Feb 2024)
Hype vs Reality
Gemini 1.5 Pro (Feb 2024) · #138 by perf · no hype signal (quiet)
Qwen2.5 72B Instruct · #82 by perf · no hype signal (quiet)
Best value
Gemini 1.5 Pro (Feb 2024) · no price
Qwen2.5 72B Instruct · 140.0 pts/$ · $0.38/M
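The $0.38/M figure for Qwen2.5 72B Instruct is consistent with a simple 50/50 blend of its listed input and output prices; a minimal sketch, assuming that blend is how the page derives its per-token figure:

```python
# Assumption: the page's $/M figure is a 50/50 blend of input/output prices.
input_price = 0.36   # $ per 1M input tokens (from the Pricing section)
output_price = 0.40  # $ per 1M output tokens

blended = (input_price + output_price) / 2
print(f"${blended:.2f}/M")  # $0.38/M
```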
Vendor risk
Google DeepMind · $4.00T · Tier 1 · Low risk
Alibaba (Qwen) · $293.0B · Tier 1 · Low risk
Head to head
Gemini 1.5 Pro (Feb 2024) vs Qwen2.5 72B Instruct

Aider · Code Editing
Qwen2.5 72B Instruct leads by +8.3
Gemini 1.5 Pro (Feb 2024): 57.1 · Qwen2.5 72B Instruct: 65.4

Chatbot Arena Elo · Overall
Gemini 1.5 Pro (Feb 2024) leads by +20.2
Gemini 1.5 Pro (Feb 2024): 1322.5 · Qwen2.5 72B Instruct: 1302.3

Balrog
Gemini 1.5 Pro (Feb 2024) leads by +4.8
Balrog · benchmarks AI agents on text-based adventure games, testing language understanding, strategic planning, and long-horizon reasoning.
Gemini 1.5 Pro (Feb 2024): 21.0 · Qwen2.5 72B Instruct: 16.2

BBH
Gemini 1.5 Pro (Feb 2024) leads by +5.6
BIG-Bench Hard · a curated subset of 23 challenging tasks from BIG-Bench where language models previously failed to outperform average humans.
Gemini 1.5 Pro (Feb 2024): 78.7 · Qwen2.5 72B Instruct: 73.1

GPQA diamond
Qwen2.5 72B Instruct leads by +4.4
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Gemini 1.5 Pro (Feb 2024): 27.8 · Qwen2.5 72B Instruct: 32.2

MATH level 5
Qwen2.5 72B Instruct leads by +22.4
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
Gemini 1.5 Pro (Feb 2024): 40.8 · Qwen2.5 72B Instruct: 63.2

MMLU
Qwen2.5 72B Instruct leads by +3.5
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Gemini 1.5 Pro (Feb 2024): 76.9 · Qwen2.5 72B Instruct: 80.4

OTIS Mock AIME 2024-2025
Qwen2.5 72B Instruct leads by +1.3
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Gemini 1.5 Pro (Feb 2024): 6.7 · Qwen2.5 72B Instruct: 8.0

The Agent Company
Qwen2.5 72B Instruct leads by +2.3
The Agent Company · tests AI agents on realistic corporate tasks like email management, code review, data analysis, and cross-tool workflows.
Gemini 1.5 Pro (Feb 2024): 3.4 · Qwen2.5 72B Instruct: 5.7

VideoMME
Gemini 1.5 Pro (Feb 2024) leads by +2.0
VideoMME · multimodal benchmark testing video understanding across diverse domains, requiring temporal reasoning and cross-frame comprehension.
Gemini 1.5 Pro (Feb 2024): 66.7 · Qwen2.5 72B Instruct: 64.7
Full benchmark table
Benchmark | Gemini 1.5 Pro (Feb 2024) | Qwen2.5 72B Instruct
Aider · Code Editing | 57.1 | 65.4
Chatbot Arena Elo · Overall | 1322.5 | 1302.3
Balrog | 21.0 | 16.2
BBH | 78.7 | 73.1
GPQA diamond | 27.8 | 32.2
MATH level 5 | 40.8 | 63.2
MMLU | 76.9 | 80.4
OTIS Mock AIME 2024-2025 | 6.7 | 8.0
The Agent Company | 3.4 | 5.7
VideoMME | 66.7 | 64.7
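The 6-of-10 winner summary can be reproduced from the table above; a quick sketch, with benchmark names and scores copied verbatim from this page:

```python
# Scores as (Gemini 1.5 Pro Feb 2024, Qwen2.5 72B Instruct) pairs.
scores = {
    "Aider · Code Editing":        (57.1, 65.4),
    "Chatbot Arena Elo · Overall": (1322.5, 1302.3),
    "Balrog":                      (21.0, 16.2),
    "BBH":                         (78.7, 73.1),
    "GPQA diamond":                (27.8, 32.2),
    "MATH level 5":                (40.8, 63.2),
    "MMLU":                        (76.9, 80.4),
    "OTIS Mock AIME 2024-2025":    (6.7, 8.0),
    "The Agent Company":           (3.4, 5.7),
    "VideoMME":                    (66.7, 64.7),
}
qwen_wins = sum(q > g for g, q in scores.values())
print(f"Qwen2.5 72B Instruct wins {qwen_wins} of {len(scores)}")  # 6 of 10
```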
Pricing · per 1M tokens · projected $/mo at 10M tokens
Model | Input | Output | Context | Projected $/mo
Gemini 1.5 Pro (Feb 2024) | no price | no price | n/a | n/a
Qwen2.5 72B Instruct | $0.36 | $0.40 | 33K tokens (~16 books) | $3.70
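The $3.70/mo projection for Qwen2.5 72B Instruct is consistent with a 75/25 input/output token split over 10M total tokens; that split is an assumption, not something the page states:

```python
def projected_monthly_cost(input_price, output_price,
                           total_tokens_m=10.0, input_share=0.75):
    """Project monthly spend from per-1M-token prices.

    input_share=0.75 is an assumption: a 75/25 input/output mix
    reproduces the $3.70 figure shown for Qwen2.5 72B Instruct.
    """
    input_m = total_tokens_m * input_share        # millions of input tokens
    output_m = total_tokens_m * (1 - input_share) # millions of output tokens
    return input_m * input_price + output_m * output_price

cost = projected_monthly_cost(0.36, 0.40)
print(f"${cost:.2f}/mo")  # $3.70/mo
```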