Qwen3-Next-80B-A3B-Thinking is a reasoning-first chat model in the Qwen3-Next line that outputs structured “thinking” traces by default. It’s designed for hard multi-step problems: math proofs, code synthesis/debugging, logic, and agentic...
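Because the model emits its reasoning by default, client code usually needs to separate the trace from the final answer. The listing does not say how the trace is delimited, so the sketch below assumes the `<think>` ... `</think>` tags used elsewhere in the Qwen3 family, and it assumes the opening tag may be missing from the completion (for example, when the chat template has already emitted it). `split_thinking` is a hypothetical helper name, not part of any library.

```python
from typing import Tuple


def split_thinking(raw: str,
                   open_tag: str = "<think>",
                   close_tag: str = "</think>") -> Tuple[str, str]:
    """Split a raw completion into (thinking, answer).

    Assumption: the trace ends at the first `close_tag`; the tag names
    are assumed, not taken from the listing.
    """
    head, sep, tail = raw.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole completion as the answer.
        return "", raw.strip()
    thinking = head.strip()
    # Drop the opening tag only if the completion actually contains it.
    if thinking.startswith(open_tag):
        thinking = thinking[len(open_tag):].strip()
    return thinking, tail.strip()


if __name__ == "__main__":
    raw = "First add 2 and 2.\n</think>\n\nThe answer is 4."
    thinking, answer = split_thinking(raw)
    print("thinking:", thinking)
    print("answer:", answer)
```

Splitting on the first closing tag keeps the helper simple; if an answer could itself contain a literal `</think>`, a stricter parser would be needed.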
Tested on 20 benchmarks, with a 61.6% average score. Top scores: Chatbot Arena Elo - Overall (1369.5 rating points), OpenCompass - IFEval (89.5%), OpenCompass - AIME2025 (89.0%).
Qwen2.5 Coder 7B Instruct scores 56.6 (100% as good) at $0.03 per 1M input tokens, which is 69% cheaper.
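The "69% cheaper" figure can be read as a relative price difference. The sketch below shows that arithmetic, assuming the percentage refers to the per-1M-input-token price (the listing does not say which price it compares). The function names are hypothetical, and the implied baseline price is a derived estimate, not a quoted rate.

```python
def percent_cheaper(baseline_price: float, alternative_price: float) -> float:
    """How much cheaper the alternative is, as a percentage of the baseline."""
    return (1.0 - alternative_price / baseline_price) * 100.0


def implied_baseline_price(alternative_price: float, pct_cheaper: float) -> float:
    """Invert percent_cheaper: the baseline price implied by a quoted discount."""
    return alternative_price / (1.0 - pct_cheaper / 100.0)


if __name__ == "__main__":
    # Figures from the listing: $0.03 per 1M input tokens, 69% cheaper.
    alt = 0.03
    baseline = implied_baseline_price(alt, 69.0)
    # Assumed interpretation: baseline is the input price being compared against.
    print(f"implied baseline: ${baseline:.4f} per 1M input tokens")  # about $0.0968
    print(f"check: {percent_cheaper(baseline, alt):.1f}% cheaper")
```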
OpenCompass Live Code Bench v6. Fresh competitive programming problems to evaluate code generation without memorization.
Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.
LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.
Stanford HELM WildBench evaluation. Tests reasoning on challenging real-world tasks.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
Stanford HELM evaluation of mathematical reasoning across diverse problem types.
- Type: text
- Context: 262K tokens (~131 books)
- Released: Sep 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001