Qwen3-Max is an updated release built on the Qwen3 series, offering major improvements in reasoning, instruction following, multilingual support, and long-tail knowledge coverage compared to the January 2025 version. It...
Tested on 8 benchmarks, with an average score of 58.3%. Top scores: MATH level 5 (97.1%), Lech Mazur Writing (87.1%), OTIS Mock AIME 2024-2025 (73.3%).
gpt-oss-20b (free) scores 61.0 (100% as good) at $0.00/1M input · 100% cheaper
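The two percentages in the comparison line can be reproduced with simple arithmetic. The sketch below is illustrative only. It assumes the 61.0 score is on the same scale as the 58.3% average, and that the "as good" figure is capped at 100%, since the raw ratio would exceed it. The function names and the baseline price of $1.20 per 1M input tokens are hypothetical placeholders, not values from this page.

```python
def relative_score(candidate: float, baseline: float, cap: float = 100.0) -> float:
    """Candidate score as a percentage of the baseline score.

    The cap is an assumption made to match the "100% as good" display.
    """
    return min(candidate / baseline * 100.0, cap)


def percent_cheaper(candidate_price: float, baseline_price: float) -> float:
    """Percentage saved by paying candidate_price instead of baseline_price."""
    if baseline_price <= 0:
        raise ValueError("baseline price must be positive")
    return (1.0 - candidate_price / baseline_price) * 100.0


# Scores taken from the page: alternative model 61.0, this model 58.3.
print(relative_score(61.0, 58.3))

# A free model is 100% cheaper than any positive baseline price.
# 1.20 is a placeholder price in USD per 1M input tokens.
print(percent_cheaper(0.00, 1.20))
```

Under these assumptions, a free alternative comes out as 100% cheaper regardless of the baseline price, which is consistent with the figure shown.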
Competition-level math from AMC, AIME, and olympiad problems. Level 5 is the hardest tier, requiring creative problem-solving.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.
Writing quality evaluation by Lech Mazur. Tests prose quality, coherence, and stylistic ability.
Simple factual questions with verified correct answers. Tests accuracy of basic knowledge retrieval. Low scores indicate hallucination.
LiveBench fiction analysis. Tests literary comprehension and creative text understanding.
- Type: text
- Context: 262K tokens (~131 books)
- Released: Sep 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.005