Kimi K2 Thinking is Moonshot AI’s most advanced open reasoning model to date, extending the K2 series into agentic, long-horizon reasoning. Built on the trillion-parameter Mixture-of-Experts (MoE) architecture introduced in...
Tested on 25 benchmarks, with an average score of 53.3%. Top scores: OpenCompass AIME 2025 (94.1%), OpenCompass IFEval (92.4%), and OpenCompass MMLU-Pro (84.3%).
Cheaper alternative: gpt-oss-20b (free) scores 61.0 (rated 100% as good) at $0.00 per 1M input tokens, making it 100% cheaper.
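The "100% cheaper" figure follows from the usual relative-price formula, (base - alt) / base. A minimal sketch, assuming the page compares input prices in USD per 1M tokens; the Kimi K2 Thinking price below is a hypothetical placeholder, not a quoted rate:

```python
def percent_cheaper(base_price: float, alt_price: float) -> float:
    """Return how much cheaper alt_price is than base_price, in percent.

    Prices are in USD per 1M input tokens; base_price must be positive.
    """
    if base_price <= 0:
        raise ValueError("base_price must be positive")
    return (base_price - alt_price) / base_price * 100.0


# Hypothetical base price: the page does not list one for Kimi K2 Thinking.
kimi_input_price = 0.60   # USD per 1M input tokens (assumed for illustration)
gpt_oss_20b_price = 0.00  # the free tier quoted above

print(f"{percent_cheaper(kimi_input_price, gpt_oss_20b_price):.0f}% cheaper")
```

Because the alternative is free, any positive base price yields exactly 100%, so the figure says nothing about how expensive the base model is.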
OpenCompass LiveCodeBench v6: fresh competitive-programming problems that evaluate code generation without relying on memorization.
Regularly refreshed coding problems that avoid data contamination; new problems are added monthly to prevent memorization.
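Refreshed benchmarks of this kind limit contamination by scoring a model only on problems published after its training data was collected. A minimal sketch of that date filter; the `Problem` record, the dates, and the cutoff are hypothetical, and real benchmarks may apply further safeguards:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Problem:
    problem_id: str
    release_date: date


def uncontaminated(problems: list[Problem], training_cutoff: date) -> list[Problem]:
    """Keep only problems released strictly after the model's training cutoff.

    A problem that did not exist when the training data was collected
    cannot have been memorized, so scoring on this subset limits contamination.
    """
    return [p for p in problems if p.release_date > training_cutoff]


# Hypothetical problem pool and cutoff date, for illustration only.
pool = [
    Problem("p1", date(2024, 11, 3)),
    Problem("p2", date(2025, 2, 14)),
    Problem("p3", date(2025, 6, 1)),
]
fresh = uncontaminated(pool, training_cutoff=date(2025, 1, 1))
print([p.problem_id for p in fresh])  # ['p2', 'p3']
```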
SWE-bench Verified tasks solved using only bash commands, with no specialized frameworks. Tests raw, terminal-based problem solving.
Regularly refreshed reasoning problems testing logical deduction, spatial reasoning, and analytical thinking.
Fresh data-analysis tasks testing the ability to interpret tables, charts, and statistical data.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
Mock AIME (American Invitational Mathematics Examination) problems from OTIS. Tests competition-math performance.
Regularly updated math problems that test numerical reasoning, algebra, calculus, and combinatorics.
- Type: text
- Context: 262K tokens (~131 books)
- Released: Nov 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.004
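A per-message cost is the sum of input and output token charges at per-1M-token rates. A minimal sketch of that arithmetic; the token counts and prices below are hypothetical placeholders, since the page quotes only the ~$0.004 result:

```python
def cost_per_message(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
) -> float:
    """Return the USD cost of one request, given prices per 1M tokens."""
    return (
        input_tokens * input_price_per_m + output_tokens * output_price_per_m
    ) / 1_000_000


# Hypothetical token counts and prices, for illustration only.
cost = cost_per_message(
    input_tokens=1_000,
    output_tokens=1_500,
    input_price_per_m=0.60,
    output_price_per_m=2.00,
)
print(f"~${cost:.4f}")
```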