Kimi K2 Thinking
Open Source · by moonshotai · Published 2025-11-06
Average score: 53.3
Input price: $0.60 per 1M tokens
Output price: $2.50 per 1M tokens
Context window: 262K tokens (~131 books)
Type: text
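The input and output prices listed above translate into a per-request cost with simple arithmetic. The following is a minimal sketch, assuming flat per-token billing with no caching discounts, tiers, or minimum charges (none of which are stated on this page). The function name `estimate_cost` and the example token counts are hypothetical, chosen only for illustration.

```python
# Prices taken from the listing above, in USD per 1 million tokens.
INPUT_PRICE_PER_M = 0.60
OUTPUT_PRICE_PER_M = 2.50


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: billing is linear in token count, with no discounts
    or surcharges. Real provider billing may differ.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 8,000 output tokens.
    print(f"${estimate_cost(200_000, 8_000):.2f}")  # prints $0.14
```

Because output tokens cost roughly four times as much as input tokens, long generated reasoning traces can dominate the bill even when the prompt is large.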
Tested on 25 benchmarks, with an average score of 53.3%. Top scores: OpenCompass — AIME2025 (94.1%), OpenCompass — IFEval (92.4%), OpenCompass — MMLU-Pro (84.3%).
Benchmark Results
| Benchmark | Category | Score |
|---|---|---|
| OpenCompass — AIME2025 | math | 94.1 |
| OpenCompass — IFEval | language | 92.4 |
| OpenCompass — MMLU-Pro | knowledge | 84.3 |
| OTIS Mock AIME 2024-2025 | math | 83.0 |
| OpenCompass — GPQA-Diamond | knowledge | 82.7 |
| LiveBench — Mathematics | math | 81.1 |
| GPQA diamond | knowledge | 79.0 |
| OpenCompass — LiveCodeBenchV6 | coding | 77.1 |
| LiveBench — Coding | coding | 67.4 |
| LiveBench — Language | language | 66.5 |
| LiveBench — Reasoning | reasoning | 63.5 |
| SWE-Bench Verified (Bash Only) | coding | 63.4 |
| LiveBench — If | language | 62.0 |
| LiveBench — Overall | knowledge | 61.6 |
| LiveBench — Data Analysis | reasoning | 52.3 |
| WeirdML | coding | 42.8 |
| LiveBench — Agentic Coding | coding | 38.3 |
| Terminal Bench | coding | 35.7 |
| SimpleQA Verified | knowledge | 31.6 |
| FrontierMath-2025-02-28-Private | math | 21.4 |
| OpenCompass — HLE | knowledge | 21.3 |
| Chess Puzzles | knowledge | 20.0 |
| PostTrainBench | knowledge | 7.3 |
| APEX-Agents | agentic | 4.0 |
| FrontierMath-Tier-4-2025-07-01-Private | math | 0.1 |
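The reported 53.3 average can be checked against the table. The sketch below assumes the figure is an unweighted arithmetic mean over all 25 listed scores. The page does not state how the average is computed, so that is an assumption, although the numbers are consistent with it.

```python
# Scores copied from the benchmark table above, in table order.
scores = [
    94.1, 92.4, 84.3, 83.0, 82.7,
    81.1, 79.0, 77.1, 67.4, 66.5,
    63.5, 63.4, 62.0, 61.6, 52.3,
    42.8, 38.3, 35.7, 31.6, 21.4,
    21.3, 20.0, 7.3, 4.0, 0.1,
]

# Assumption: every benchmark carries equal weight in the average.
average = sum(scores) / len(scores)

print(len(scores))        # prints 25
print(round(average, 1))  # prints 53.3
```

An unweighted mean treats near-saturated benchmarks (such as AIME2025 at 94.1) and very hard ones (such as FrontierMath Tier 4 at 0.1) identically, so the headline number depends heavily on which benchmarks are included.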