Qwen2.5 72B Instruct

by Alibaba Qwen · Released Sep 2024

Open Source
Average score: 64.2
Rank: #65 (better than 76% of all models)
Context: 131K tokens (~66 books)
Input price: $0.36 per 1M tokens
Output price: $0.40 per 1M tokens
Type: text
License: Open Source
Benchmarks: 25 tested
Data updated today
About

Qwen2.5 72B Instruct belongs to Qwen2.5, the latest series of Qwen large language models. Qwen2.5 brings the following improvements over Qwen2:
- Significantly more knowledge and greatly improved capabilities in coding and...

Tested on 25 benchmarks with a 51.6% average. Top scores: Chatbot Arena Elo, Overall (1302.8, an Elo rating rather than a percentage), ARC AI2 (92.7%), IFEval (86.4%).

Looking for similar performance at lower cost?
Gemma 4 31B scores 63.9 (about 99.5% of this model's 64.2) at $0.12/1M input tokens, 67% cheaper than this model's $0.36.
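
The comparison figures above can be checked with simple arithmetic. The following is a minimal sketch, assuming the score comparison is the ratio of the two average scores and "cheaper" is the relative saving on input price. The page does not state its formulas, and the function names are hypothetical, not part of any BenchGecko tooling.

```python
def relative_score(candidate: float, baseline: float) -> float:
    """Candidate score as a percentage of the baseline score."""
    return candidate / baseline * 100


def percent_cheaper(candidate_price: float, baseline_price: float) -> float:
    """Relative saving of the candidate price versus the baseline, in percent."""
    return (1 - candidate_price / baseline_price) * 100


# Values taken from this page: Gemma 4 31B versus Qwen2.5 72B Instruct.
score_ratio = relative_score(63.9, 64.2)   # average scores
saving = percent_cheaper(0.12, 0.36)       # input price per 1M tokens, USD

print(f"{score_ratio:.1f}% of the score, {saving:.0f}% cheaper on input")
```

Under these assumptions the score ratio comes out near 99.5%, which rounds to 100%, and the saving comes out near 67%.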
Capabilities
coding: 40.7 (#112 globally)
reasoning: 42.4 (#78 globally)
math: 43.6 (#115 globally)
knowledge: 59.9 (#61 globally)
agentic: 5.3 (#42 globally)
general: 61.9 (#1 globally)
multimodal: 64.7 (#2 globally)
language: 86.4 (#29 globally)
Benchmark Scores
Tested on 25 benchmarks · Ranked across 9 categories
Score Distribution (all 274 models): chart on a 0 to 100 score axis, with this model's position marked.
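
The "better than 76% of all models" figure is consistent with the rank (#65) and the 274 models in the score distribution. The following is a rough sketch, assuming the percentage is simply the share of models ranked below this one. The page does not document how it computes the value, and `percent_outranked` is a hypothetical helper.

```python
def percent_outranked(rank: int, total_models: int) -> float:
    """Share of models ranked below the given rank, in percent."""
    if not 1 <= rank <= total_models:
        raise ValueError("rank must be between 1 and total_models")
    return (total_models - rank) / total_models * 100


# Rank and model count as listed on this page.
share = percent_outranked(rank=65, total_models=274)
print(f"Better than {share:.0f}% of all models")
```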
Aider — Code Editing

Code editing benchmark from the Aider project. Measures ability to apply targeted code changes while maintaining correctness and style.

65.4
WeirdML

Unusual and adversarial machine learning challenges. Tests robustness of reasoning about edge cases in ML systems.

16.0
BBH

BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.

73.1
MuSR

HuggingFace MuSR (Multistep Soft Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.

11.7
MATH Level 5

Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

63.2
MATH Level 5

HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.

59.8
OTIS Mock AIME 2024-2025

Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.

8.0
Score legend: Excellent (85+), Good (70-85), Average (50-70), Below (<50)
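
The legend bands can be expressed as a small lookup. This is a sketch only: the legend does not say which band a boundary score such as 70 or 85 falls into, so the code assumes each lower bound is inclusive. `score_tier` is a hypothetical name.

```python
def score_tier(score: float) -> str:
    """Map a 0-100 benchmark score to the legend band.

    Assumption: lower bounds are inclusive (85 is Excellent, 70 is Good).
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# A few of the benchmark scores listed on this page.
for name, score in [("Aider", 65.4), ("WeirdML", 16.0), ("BBH", 73.1)]:
    print(f"{name}: {score} -> {score_tier(score)}")
```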
BenchGecko API: qwen-2-5-72b-instruct
Specifications
  • Type: text
  • Context: 131K tokens (~66 books)
  • Released: Sep 2024
  • License: Open Source
  • Status: Active
  • Cost / Message: ~$0.001
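
The per-message cost can be roughly reproduced from the listed prices ($0.36 per 1M input tokens, $0.40 per 1M output tokens). The page does not say what message size it assumes, so the token counts below (1,000 input, 1,500 output) are purely illustrative assumptions chosen to land near the listed figure. `message_cost` is a hypothetical helper.

```python
INPUT_PRICE_PER_M = 0.36   # USD per 1M input tokens, from this page
OUTPUT_PRICE_PER_M = 0.40  # USD per 1M output tokens, from this page


def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request/response pair."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Assumed message size; not stated anywhere in the source.
cost = message_cost(input_tokens=1_000, output_tokens=1_500)
print(f"${cost:.4f} per message")
```

With these assumed token counts the estimate is about $0.00096, which rounds to the listed ~$0.001. Longer prompts or responses scale the cost linearly.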
Available On
Alibaba Qwen: $0.36
Qwen2.5 72B Instruct is an open-source text AI model by Alibaba Qwen, released in September 2024. It has an average benchmark score of 64.2. Context window: 131K tokens.