AI Capabilities
Which model is best for what? Capability rankings derived from 40 benchmarks across 5 skill areas.
Coding
7 benchmarks · 109 models tested
Code generation, debugging, and software engineering tasks, from writing functions to fixing real-world GitHub issues.
1. Gemini 3.1 Pro Preview: 75.3
2. DeepSeek V3.2: 74.2
3. o3 Pro: 71.6
4. Claude Sonnet 4.6: 66.1
5. GPT-5 Pro: 60.4
Math & Reasoning
9 benchmarks · 165 models tested
Mathematical problem solving, logical reasoning, and multi-step inference, from arithmetic to competition-level mathematics.
1. Qwen2.5 Coder 32B Instruct: 91.1
2. Qwen2.5 Coder 7B Instruct: 86.7
3. Claude Instant: 86.7
4. Qwen3 Max: 85.2
5. Gemini 2.0 Pro: 83.5
Knowledge & QA
20 benchmarks · 170 models tested
Factual knowledge, question answering, and academic reasoning, tested across science, history, medicine, law, and more.
1. o3 Pro: 91.7
2. Grok 4 Fast: 87.8
3. DeepSeek V3.2 Exp: 83.3
4. DeepSeek-V2 (MoE-236B, May 2024): 77.3
5. Kimi K2 0905: 77.0
Agentic Tasks
3 benchmarks · 33 models tested
Autonomous task execution, tool use, and multi-step planning: the frontier of AI agents working independently.
1. Claude Sonnet 4.5: 62.9
2. DeepSeek V3.2 Exp: 42.9
3. Claude Opus 4.5: 42.3
4. Kimi K2.5: 38.9
5. Claude Sonnet 4: 38.5
Multimodal
1 benchmark · 11 models tested
Vision understanding, image analysis, and cross-modal reasoning across both text and visual inputs.
1. Gemini 1.5 Pro (Feb 2024): 66.7
2. Qwen2.5 72B Instruct: 64.7
3. GPT-4o (2024-11-20): 62.5
4. GPT-4o (2024-08-06): 62.5
5. GPT-4o (2024-05-13): 62.5