Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic...
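Because the model emits its reasoning inline with the answer, callers typically need to split the trace from the final reply. A minimal sketch, assuming the trace is delimited by `<think>...</think>` tags as in other Qwen3 thinking models (the tag convention and the helper name are illustrative assumptions, not taken from this page):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a model response into (thinking_trace, final_answer).

    Assumes the reasoning is wrapped in <think>...</think>; if no
    such block is present, the trace comes back empty.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match:
        trace = match.group(1).strip()
        answer = text[match.end():].strip()
        return trace, answer
    return "", text.strip()
```

In practice you would run this over each assistant message before display, showing the answer and optionally exposing the trace in a collapsible view.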
Tested on 20 benchmarks with a 61.6% average. Top scores: Chatbot Arena Elo, Overall (1369.0); OpenCompass IFEval (89.5%); OpenCompass AIME 2025 (89.0%).
Qwen2.5 7B Instruct scores 57.4 on the same benchmarks (~93% as good) at $0.04/1M input tokens · 59% cheaper
OpenCompass Live Code Bench v6. Fresh competitive programming problems to evaluate code generation without memorization.
Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.
LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.
Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
Stanford HELM evaluation of mathematical reasoning across diverse problem types.
- Type: text
- Context: 131K tokens (~98K words, about one novel)
- Released: Sep 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001