How much does Qwen2.5 72B Instruct cost?

Qwen2.5 72B Instruct costs $0.36 per million input tokens and $0.40 per million output tokens. For a typical message of about 2,000 tokens, that works out to roughly $0.0007 to $0.0008, or about $0.001 when rounded up.
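The per-message figure follows directly from the listed per-million-token prices. A minimal Python sketch of that arithmetic, assuming a hypothetical split of 1,500 input and 500 output tokens (the split and the `message_cost` helper are illustrative, not something the listing specifies):

```python
# Prices per million tokens, as listed on this page.
INPUT_USD_PER_M = 0.36
OUTPUT_USD_PER_M = 0.40


def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one message at the listed prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical 2,000-token message: 1,500 tokens in, 500 tokens out.
cost = message_cost(1500, 500)
print(f"${cost:.5f}")
```

Because output tokens cost slightly more than input tokens, the result moves between $0.00072 (all input) and $0.00080 (all output) for the same 2,000 tokens.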

What benchmarks has Qwen2.5 72B Instruct been tested on?

Qwen2.5 72B Instruct has been evaluated on 24 benchmarks. Top scores: Chatbot Arena Elo (Overall): 1302.3 (an Elo rating), ARC AI2: 92.7%, IFEval: 86.4%.

Is Qwen2.5 72B Instruct open source?

Yes, Qwen2.5 72B Instruct is open source.

How does Qwen2.5 72B Instruct compare to GPT-5.5?

Qwen2.5 72B Instruct and GPT-5.5 are both listed with an average score of 65.8. The listing places GPT-5.5 slightly ahead overall, so any gap is smaller than the displayed rounding. Qwen2.5 72B Instruct costs $0.36/1M input tokens vs $5.00/1M input tokens for GPT-5.5.


Qwen2.5 72B Instruct

Name: Qwen2.5 72B Instruct
Price: 0.36 USD per 1M input tokens
Author: Alibaba Qwen

by Alibaba Qwen · Released Sep 2024

Open Source

Avg score: 65.8
Rank: #51
Better than 78% of all models
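The 78% figure is consistent with rank #51 out of the 233 models mentioned in the score distribution. A short Python sketch of that relationship, assuming the page counts the share of models ranked strictly below this one (the formula and the `percent_better_than` name are assumptions, not taken from the site):

```python
def percent_better_than(rank: int, total_models: int) -> float:
    """Share of models ranked strictly below the given rank, in percent."""
    return 100 * (total_models - rank) / total_models


# Rank #51 among 233 models, both figures as listed on this page.
print(round(percent_better_than(51, 233)))  # 78
```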

Context: 33K tokens (roughly 25,000 English words by the common 0.75 words-per-token rule of thumb)
Input $/1M: $0.36
Output $/1M: $0.40
Type: text
License: Open Source
Benchmarks: 24 tested


About

Qwen2.5 72B is the latest series of Qwen large language models. Qwen2.5 brings the following improvements upon Qwen2: - Significantly more knowledge and has greatly improved capabilities in coding and...

Tested on 24 benchmarks with a 53.2% average. Top scores: Chatbot Arena Elo (Overall): 1302.3 (an Elo rating, not a percentage), ARC AI2: 92.7%, IFEval: 86.4%.

Capabilities

coding: 65.4 (#27 globally)
reasoning: 42.4 (#61 globally)
math: 43.6 (#98 globally)
knowledge: 59.9 (#48 globally)
agentic: 5.3 (#32 globally)
general: 61.9 (#1 globally)
multimodal: 64.7 (#2 globally)
language: 86.4 (#27 globally)

Benchmark Scores


Tested on 24 benchmarks · Ranked across 9 categories

[Figure: score distribution across all 233 models on a 0 to 100 axis, with this model's position marked]

coding

Aider (Code Editing): 65.4
Code editing benchmark from the Aider project. Measures the ability to apply targeted code changes while maintaining correctness and style.

reasoning

BBH: 73.1
BIG-Bench Hard. 23 challenging tasks from BIG-Bench where prior language models fell below average human performance.

MUSR: 11.7
HuggingFace MuSR (Multi-Step Reasoning). Tests multi-hop reasoning that requires chaining multiple facts together.

math

MATH level 5: 63.2
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.

MATH Level 5 (HuggingFace evaluation): 59.8
HuggingFace evaluation of MATH Level 5 problems. Competition math requiring advanced reasoning and proof construction.

OTIS Mock AIME 2024-2025: 8.0
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.


Score bands: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
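The score bands can be read as simple thresholds. A minimal Python sketch, assuming each lower bound is inclusive (the legend does not say which band a score of exactly 70 or 85 falls into, and `score_band` is a hypothetical helper name):

```python
def score_band(score: float) -> str:
    """Map a 0-100 benchmark score to the legend's band names.

    Assumption: lower bounds are inclusive, so 85.0 is "Excellent"
    and 70.0 is "Good". The page does not state this explicitly.
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores taken from this page.
listed = [("ARC AI2", 92.7), ("BBH", 73.1), ("Aider", 65.4), ("MUSR", 11.7)]
for name, score in listed:
    print(f"{name}: {score} -> {score_band(score)}")
```

The bands only make sense for 0 to 100 scores, so the Chatbot Arena Elo rating of 1302.3 is left out.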

Model Family · Alibaba Qwen Qwen 2.5

Qwen2.5 72B Instruct · Sep 2024 · 53.2 avg · $0.36/M in · 33K ctx · 24 benchmarks
Qwen2.5 7B Instruct · Oct 2024 · 35.2 avg (-18.0) · $0.04/M in (-0.32) · 33K ctx · 6 benchmarks
Qwen2.5 Coder 32B Instruct · Nov 2024 · 53.1 avg (+17.9) · $0.66/M in (+0.62) · 33K ctx · 14 benchmarks
Qwen2.5 Coder 7B Instruct · Apr 2025 · 44.4 avg (-8.7) · $0.03/M in (-0.63) · 33K ctx · 12 benchmarks
Qwen2.5 VL 32B Instruct · Mar 2025 · $0.20/M in (+0.17) · 128K ctx (+95K)
Qwen2.5 VL 72B Instruct · Feb 2025 · $0.25/M in (+0.05) · 32K ctx (-96K)
Qwen2.5-Max · Jan 2024 · 41.0 avg (+41.0) · price N/A · N/A ctx · 8 benchmarks
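The signed changes shown next to each family member are consistent with differences from the preceding row (for example, 35.2 - 53.2 = -18.0). A minimal Python sketch that reproduces them for the first four rows under that assumption; the tuple layout and the `deltas` helper are illustrative only:

```python
# (model, average score, input price in USD per 1M tokens), as listed above.
rows = [
    ("Qwen2.5 72B Instruct", 53.2, 0.36),
    ("Qwen2.5 7B Instruct", 35.2, 0.04),
    ("Qwen2.5 Coder 32B Instruct", 53.1, 0.66),
    ("Qwen2.5 Coder 7B Instruct", 44.4, 0.03),
]


def deltas(rows):
    """Return (model, score change, price change) relative to the previous row."""
    out = []
    for (_, prev_score, prev_price), (name, score, price) in zip(rows, rows[1:]):
        out.append((name, round(score - prev_score, 1), round(price - prev_price, 2)))
    return out


for name, d_score, d_price in deltas(rows):
    print(f"{name}: score {d_score:+.1f}, input price {d_price:+.2f}")
```

Read this way, the changes compare neighbouring rows rather than each model against Qwen2.5 72B Instruct.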


Frequently Asked Questions

Qwen2.5 72B Instruct is an open-source text AI model by Alibaba Qwen, released in September 2024. It has an average benchmark score of 65.8. Context window: 33K tokens.

Benchmarks

Chatbot Arena Elo (Overall) · ARC AI2 · IFEval · CMMLU · MMLU
