Gemini 1.5 Pro (Feb 2024) vs Qwen2.5 72B Instruct
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Qwen2.5 72B Instruct wins 6 of 10 shared benchmarks, leading in coding, knowledge, and math.
Category leads
coding · Qwen2.5 72B Instruct
arena · Gemini 1.5 Pro (Feb 2024)
knowledge · Qwen2.5 72B Instruct
reasoning · Gemini 1.5 Pro (Feb 2024)
math · Qwen2.5 72B Instruct
agentic · Qwen2.5 72B Instruct
multimodal · Gemini 1.5 Pro (Feb 2024)
Hype vs Reality
Attention vs performance
Gemini 1.5 Pro (Feb 2024) · #138 by performance · no attention signal
Qwen2.5 72B Instruct · #82 by performance · no attention signal
Best value: Qwen2.5 72B Instruct
Gemini 1.5 Pro (Feb 2024) · no price listed
Qwen2.5 72B Instruct · 140.0 pts/$ · $0.38/M
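The pts/$ figure reads as an aggregate benchmark score divided by blended price per 1M tokens. A minimal sketch of that ratio, assuming this definition; the site's actual aggregation formula is not shown, and the 53.2-point aggregate below is a hypothetical value back-derived from the displayed 140.0 pts/$ and $0.38/M.

```python
# Hypothetical "pts/$" value metric: aggregate benchmark points divided by
# blended price per 1M tokens. The aggregate score is back-derived from the
# displayed figures (140.0 pts/$ at $0.38/M), not a real input from the site.
def points_per_dollar(aggregate_score: float, price_per_1m_usd: float) -> float:
    """Benchmark points bought per dollar of tokens."""
    return aggregate_score / price_per_1m_usd

print(round(points_per_dollar(53.2, 0.38), 1))  # 140.0
```

Gemini 1.5 Pro (Feb 2024) gets no value score here because the page lists no price for it.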
Vendor risk
Who is behind each model
Google DeepMind · $4.00T market cap · Tier 1
Alibaba (Qwen) · $293.0B market cap · Tier 1
Head to head
10 benchmarks · 2 models
Aider · Code Editing
Qwen2.5 72B Instruct leads by +8.3
Gemini 1.5 Pro (Feb 2024): 57.1 · Qwen2.5 72B Instruct: 65.4
Chatbot Arena Elo · Overall
Gemini 1.5 Pro (Feb 2024) leads by +20.2
Gemini 1.5 Pro (Feb 2024): 1322.5 · Qwen2.5 72B Instruct: 1302.3
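An Elo gap maps to an expected head-to-head preference rate via the standard logistic formula; a sketch assuming the conventional 400-point scale:

```python
# Standard Elo expected-score formula (assuming the conventional 400-point
# logistic scale): probability that model A is preferred over model B.
def elo_win_probability(rating_a: float, rating_b: float) -> float:
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

print(f"{elo_win_probability(1322.5, 1302.3):.1%}")  # 52.9%
```

Under that assumption, the +20.2 lead corresponds to roughly a 52.9% expected preference rate, a modest edge over a coin flip.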
Balrog
Gemini 1.5 Pro (Feb 2024) leads by +4.8
Balrog · benchmarks AI agents on text-based adventure games, testing language understanding, strategic planning, and long-horizon reasoning.
Gemini 1.5 Pro (Feb 2024): 21.0 · Qwen2.5 72B Instruct: 16.2
BBH
Gemini 1.5 Pro (Feb 2024) leads by +5.6
BIG-Bench Hard · a curated subset of 23 challenging tasks from BIG-Bench where language models previously failed to outperform average humans.
Gemini 1.5 Pro (Feb 2024): 78.7 · Qwen2.5 72B Instruct: 73.1
GPQA diamond
Qwen2.5 72B Instruct leads by +4.4
Graduate-Level Google-Proof QA (Diamond set) · expert-crafted questions in physics, biology, and chemistry that are difficult even for domain PhDs.
Gemini 1.5 Pro (Feb 2024): 27.8 · Qwen2.5 72B Instruct: 32.2
MATH level 5
Qwen2.5 72B Instruct leads by +22.4
MATH Level 5 · the hardest tier of the MATH benchmark, featuring competition-level problems from AMC, AIME, and Olympiad-style mathematics.
Gemini 1.5 Pro (Feb 2024): 40.8 · Qwen2.5 72B Instruct: 63.2
MMLU
Qwen2.5 72B Instruct leads by +3.5
Massive Multitask Language Understanding · 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Gemini 1.5 Pro (Feb 2024): 76.9 · Qwen2.5 72B Instruct: 80.4
OTIS Mock AIME 2024-2025
Qwen2.5 72B Instruct leads by +1.3
OTIS Mock AIME 2024-2025 · simulated American Invitational Mathematics Examination problems testing advanced problem-solving skills.
Gemini 1.5 Pro (Feb 2024): 6.7 · Qwen2.5 72B Instruct: 8.0
The Agent Company
Qwen2.5 72B Instruct leads by +2.3
The Agent Company · tests AI agents on realistic corporate tasks like email management, code review, data analysis, and cross-tool workflows.
Gemini 1.5 Pro (Feb 2024): 3.4 · Qwen2.5 72B Instruct: 5.7
VideoMME
Gemini 1.5 Pro (Feb 2024) leads by +2.0
VideoMME · multimodal benchmark testing video understanding across diverse domains, requiring temporal reasoning and cross-frame comprehension.
Gemini 1.5 Pro (Feb 2024): 66.7 · Qwen2.5 72B Instruct: 64.7
Full benchmark table
| Benchmark | Gemini 1.5 Pro (Feb 2024) | Qwen2.5 72B Instruct |
|---|---|---|
| Aider · Code Editing | 57.1 | 65.4 |
| Chatbot Arena Elo · Overall | 1322.5 | 1302.3 |
| Balrog | 21.0 | 16.2 |
| BBH | 78.7 | 73.1 |
| GPQA diamond | 27.8 | 32.2 |
| MATH level 5 | 40.8 | 63.2 |
| MMLU | 76.9 | 80.4 |
| OTIS Mock AIME 2024-2025 | 6.7 | 8.0 |
| The Agent Company | 3.4 | 5.7 |
| VideoMME | 66.7 | 64.7 |
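The 6-of-10 headline can be reproduced by tallying the higher score on each shared benchmark; a minimal sketch using the table's values (higher is better on all ten rows):

```python
# Tally shared-benchmark wins from the full table above (higher is better).
# Tuples are (Gemini 1.5 Pro, Qwen2.5 72B Instruct) scores.
scores = {
    "Aider · Code Editing": (57.1, 65.4),
    "Chatbot Arena Elo · Overall": (1322.5, 1302.3),
    "Balrog": (21.0, 16.2),
    "BBH": (78.7, 73.1),
    "GPQA diamond": (27.8, 32.2),
    "MATH level 5": (40.8, 63.2),
    "MMLU": (76.9, 80.4),
    "OTIS Mock AIME 2024-2025": (6.7, 8.0),
    "The Agent Company": (3.4, 5.7),
    "VideoMME": (66.7, 64.7),
}
qwen_wins = sum(qwen > gemini for gemini, qwen in scores.values())
print(f"Qwen2.5 72B Instruct wins {qwen_wins}/{len(scores)}")  # 6/10
```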
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Gemini 1.5 Pro (Feb 2024) | — | — | — | — |
| Qwen2.5 72B Instruct | $0.36 | $0.40 | 33K tokens | $3.70 |
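The projection column follows from the per-token prices once a token mix is fixed. The page does not state its input/output split; the 3:1 input-heavy mix below is an assumption that happens to reproduce the listed $3.70.

```python
# Hedged sketch of the "Projected $/mo at 10M tokens" column. The input/output
# token split is an assumption (75% input), not a figure from the page.
def projected_monthly_cost(input_price: float, output_price: float,
                           total_tokens_m: float = 10.0,
                           input_share: float = 0.75) -> float:
    """Monthly USD cost for total_tokens_m million tokens at the given prices."""
    return (total_tokens_m * input_share * input_price
            + total_tokens_m * (1 - input_share) * output_price)

print(f"${projected_monthly_cost(0.36, 0.40):.2f}")  # $3.70
```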