MiniMax-M2.5 is a SOTA large language model designed for real-world productivity. Trained in a diverse range of complex real-world digital working environments, M2.5 builds upon the coding expertise of M2.1...
Tested on 21 benchmarks, with an average score of 55.1%. Top scores: Chatbot Arena Elo, Overall (1404.4); Chatbot Arena Elo, Coding (1396.3); OpenCompass IFEval (91.1%).
gpt-oss-20b (free) scores 61.0 (99% as good) at $0.00 per 1M input tokens, which is 100% cheaper.
OpenCompass LiveCodeBench v6. Fresh competitive programming problems that evaluate code generation without memorization.
Regularly refreshed coding problems that avoid data contamination. New problems added monthly to prevent memorization.
LiveBench coding tasks that require multi-step reasoning and tool use. Tests planning and execution of complex coding workflows.
Abstraction and Reasoning Corpus. Tests fluid intelligence through novel visual pattern recognition puzzles. Designed as a measure of general fluid intelligence.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data analysis tasks testing ability to interpret tables, charts, and statistical data.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
- Type: text
- Context: 197K tokens (~98 books)
- Released: Feb 2026
- License: Open Source
- Status: Active
- Cost / Message: ~$0.001