Tested on 8 benchmarks with a 41.0% average. Top scores: Chatbot Arena Elo — Overall (1374.2), Lech Mazur Writing (72.9%), MATH Level 5 (67.2%).
Multi-language code-editing benchmark from Aider. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Exam) problems from OTIS. Tests mathematical competition performance.
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
Writing quality evaluation by Lech Mazur. Tests prose quality, coherence, and stylistic ability.
LiveBench fiction analysis. Tests literary comprehension and creative text understanding.
Graduate-level science questions written by PhD experts. The Diamond subset contains the hardest questions — those that experts answer correctly but skilled non-experts do not — testing deep understanding.
- Type: text
- Context: N/A
- Released: Jan 2024
- License: Open Source
- Status: benchmark-only