Qwen3.6-Max-Preview is a proprietary frontier model from Alibaba Cloud built on a sparse mixture-of-experts architecture with approximately 1 trillion total parameters. It is optimized for agentic coding, tool use, and...
Tested on 10 benchmarks, with a 51.2% average. Top scores: Chatbot Arena Elo (Coding) 1482.8, Chatbot Arena Elo (Overall) 1460.7, OTIS Mock AIME 2024-2025 91.1%. The two Arena figures are Elo ratings, not percentages.
Kimi K2 Thinking scores 59.1 (99% as good) at $0.60 per 1M input tokens, 42% cheaper.
Real-world software engineering tasks from GitHub issues. Models must diagnose bugs and write patches that pass test suites. Human-verified subset of SWE-bench.
Deceptively simple questions that humans find easy but AI models often get wrong. Tests common sense and reasoning gaps.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests performance on competition mathematics.
Original research-level math problems created by professional mathematicians. Problems are unpublished and cannot be memorized.
Hardest tier of FrontierMath. Problems at the frontier of human mathematical ability, many of which most mathematicians could not solve.
- Type: text
- Context: 262K tokens (~131 books)
- Released: Apr 2026
- License: Open Source
- Status: preview
- Cost / Message: ~$0.008