
Qwen3 Next 80B A3B Thinking

by Alibaba Qwen · Released Sep 2025

Open Source
57.5
avg score
Rank #78
Better than 67% of all models
Context
131K tokens (~66 books)
Input $/1M
$0.10
Output $/1M
$0.78
Type
text
License
Open Source
Benchmarks
20 tested
Data updated today
About

Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured "thinking" traces by default. It is designed for hard multi-step problems: math proofs, code synthesis and debugging, logic, and agentic...

Tested on 20 benchmarks with a 61.6% average. Top scores: Chatbot Arena Elo — Overall (1369.0 Elo), OpenCompass — IFEval (89.5%), OpenCompass — AIME2025 (89.0%).
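Qwen thinking-mode checkpoints conventionally wrap the reasoning trace in `<think>...</think>` tags ahead of the final answer; whether a given provider exposes the raw tags varies, so treat the tag format here as an assumption. A minimal sketch for separating the trace from the answer:

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a model response into (thinking_trace, final_answer).

    Assumes the reasoning is wrapped in <think>...</think> tags, the
    convention Qwen thinking-mode models typically follow.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    trace = match.group(1).strip()
    answer = text[match.end():].strip()
    return trace, answer

raw = "<think>2+2 is basic arithmetic.</think>The answer is 4."
trace, answer = split_thinking(raw)
print(answer)  # The answer is 4.
```

If the serving stack already strips or hides the trace, the function simply returns the full text as the answer.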

Looking for similar performance at lower cost?
Qwen2.5 7B Instruct scores 57.4 (100% as good) at $0.04/1M input · 59% cheaper
Capabilities
coding
45.1
#83 globally
reasoning
64.1
#30 globally
math
70.0
#32 globally
knowledge
60.8
#45 globally
language
67.1
#75 globally
Benchmark Scores
Tested on 20 benchmarks · Ranked across 6 categories
Score Distribution (all 233 models)
OpenCompass — LiveCodeBenchV6

OpenCompass Live Code Bench v6. Fresh competitive programming problems to evaluate code generation without memorization.

66.3
LiveBench — Coding

Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.

60.7
LiveBench — Agentic Coding

LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.

8.3
HELM — WildBench

Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.

80.7
LiveBench — Reasoning

Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.

58.2
LiveBench — Data Analysis

Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.

53.6
OpenCompass — AIME2025

OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.

89.0
LiveBench — Mathematics

Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.

74.3
HELM — Omni-MATH

Stanford HELM evaluation of mathematical reasoning across diverse problem types.

46.7
Excellent (85+) · Good (70–85) · Average (50–70) · Below (<50)
Model Family · Alibaba Qwen · Qwen 3
See the full Qwen 3 family →
Links
Documentation
Community
BenchGecko API
qwen3-next-80b-a3b-thinking
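The slug above is the identifier you would pass as the `model` field when the model is served behind an OpenAI-compatible chat endpoint — a common convention for open-weight Qwen deployments, though the base URL, auth scheme, and exact parameters depend on the provider and are assumptions here. A sketch of building the request payload:

```python
import json

# Assumed: an OpenAI-compatible /chat/completions endpoint. Only the
# model slug comes from the page above; everything else is illustrative.
payload = {
    "model": "qwen3-next-80b-a3b-thinking",
    "messages": [
        {"role": "user", "content": "Prove that the sum of two even numbers is even."}
    ],
    "max_tokens": 4096,  # thinking traces are long; leave generous headroom
}
body = json.dumps(payload)
print(body[:50])
```

Send `body` as the POST body with your provider's base URL and API key.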
Specifications
  • Type: text
  • Context: 131K tokens (~66 books)
  • Released: Sep 2025
  • License: Open Source
  • Status: Active
  • Cost / Message: ~$0.001
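The ~$0.001/message figure follows from the per-token rates above under an assumed message size; the token counts below (500 input, 1,000 output) are illustrative assumptions, not measured values:

```python
INPUT_PER_M = 0.10   # $/1M input tokens, from the pricing above
OUTPUT_PER_M = 0.78  # $/1M output tokens

def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one exchange at this model's listed rates."""
    return input_tokens * INPUT_PER_M / 1e6 + output_tokens * OUTPUT_PER_M / 1e6

# Assumed message size: 500 input + 1,000 output tokens.
print(f"${message_cost(500, 1000):.5f}")  # $0.00083, i.e. ~$0.001
```

Thinking traces count as output tokens, so real per-message cost scales mainly with how long the model reasons.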
Available On
Alibaba Qwen — $0.10/1M input
Qwen3 Next 80B A3B Thinking is an open-source text AI model by Alibaba Qwen, released in September 2025. It has an average benchmark score of 57.5. Context window: 131K tokens.