Qwen3.7-Max is the flagship model in Alibaba's Qwen3.7 series. It supports text input and output and is designed for agent-centric workloads, with particular strengths in coding, office and productivity tasks,...
Tested on 18 benchmarks with 68.1% average. Top scores: OTIS Mock AIME 2024-2025 (95.0%), GPQA diamond (88.8%), LiveBench — Mathematics (85.3%).
MiniMax M3 scores 79.7 (100% as good) at $0.30 per 1M input tokens, 76% cheaper.
Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.
LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests mathematical competition performance.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
- Type: text
- Context: 1.0M tokens (~500 books)
- Released: May 2026
- License: Open Source
- Status: Active
- Cost / Message: ~$0.006
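
The page does not say how the cost-per-message figure is derived. One plausible reading is that it combines per-token prices with a typical message size. The sketch below shows that arithmetic only; the function name `cost_per_message`, the token counts, and both prices are hypothetical placeholders, not values taken from this listing.

```python
def cost_per_message(input_tokens, output_tokens,
                     input_price_per_m, output_price_per_m):
    """Estimate the dollar cost of one message.

    Prices are expressed in dollars per 1M tokens, the unit used in the
    price comparison on this page.
    """
    total = (input_tokens * input_price_per_m
             + output_tokens * output_price_per_m)
    return total / 1_000_000


# Hypothetical numbers for illustration only (not from the page):
# a 1,000-token prompt, a 500-token reply, $1.00 and $4.00 per 1M tokens.
estimate = cost_per_message(
    input_tokens=1_000,
    output_tokens=500,
    input_price_per_m=1.00,
    output_price_per_m=4.00,
)
print(f"~${estimate:.4f} per message")
```

Substituting a provider's published input and output prices and a measured average message length would give a figure comparable to the one listed above.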