Trends

Key metrics across the AI model ecosystem.

Top Average Score: 78.0% (Claude Instant leads across 4 benchmarks)
Average Input Price: $-5761.87/M (per million tokens across all tracked models)
Open Source Share: 57% (236 of 414 models are open source)
Max Context Window: 2.0M (largest context window available)
Average Context: 219K (mean context window across all models)
Active Providers: 55 (API providers with at least one active model)
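
The Open Source Share figure follows directly from the two counts given with it. A minimal sketch of that arithmetic, assuming the percentage is rounded to the nearest whole number (the function name `share_percent` is hypothetical and used only for illustration):

```python
def share_percent(part: int, total: int) -> int:
    """Return part/total as a percentage rounded to the nearest whole number."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * part / total)


# Counts taken from the Open Source Share metric: 236 of 414 models.
open_source_share = share_percent(236, 414)
print(f"{open_source_share}%")  # prints "57%"
```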

Benchmark Leaders

Benchmark | Leader | Score
ARC AI2 | DeepSeek V3 | 93.7
BBH | DeepSeek V3 | 83.3
GSM8K | GPT-4o-mini (2024-07-18) | 91.3
HellaSwag | Llama 3.1-405B | 85.6
LAMBADA | Falcon-180B | 79.8
MMLU | GPT-4o (2024-11-20) | 84.1
GPQA diamond | Gemini 3.1 Pro Preview | 92.1
MATH level 5 | GPT-5 Chat | 98.1
OTIS Mock AIME 2024-2025 | GPT-5.2 Chat | 96.1
WeirdML | Claude Opus 4.6 | 77.9
Winogrande | Llama 3.1-405B | 78.4
SimpleBench | Gemini 3.1 Pro Preview | 75.5
Aider polyglot | GPT-5 Chat | 88.0
Lech Mazur Writing | Kimi K2 0905 | 87.3
GSO-Bench | Claude Opus 4.6 | 33.3
Fiction.LiveBench | GPT-5 Chat | 97.2
SWE-Bench Verified (Bash Only) | Claude Opus 4.5 | 74.4
Terminal Bench | Gemini 3.1 Pro Preview | 78.4
FrontierMath-2025-02-28-Private | GPT-5.4 Pro | 50.0
SimpleQA Verified | Gemini 3.1 Pro Preview | 77.3
FrontierMath-Tier-4-2025-07-01-Private | GPT-5.4 Pro | 37.5
Chess Puzzles | Gemini 3.1 Pro Preview | 55.0
APEX-Agents | GPT-5.4 | 35.9
OSWorld | Claude Opus 4.5 | 66.3
ARC-AGI-2 | GPT-5.4 Pro | 83.3
HLE | Gemini 3 Pro | 34.4
TriviaQA | Llama 2-70B | 87.6
ScienceQA | Claude 3 Haiku | 62.7
PIQA | GPT-4o-mini (2024-07-18) | 77.4
OpenBookQA | phi-3-mini 3.8B | 84.0
CadEval | o3 | 74.0
Balrog | Gemini 3 Flash Preview | 48.1
GeoBench | Gemini 3 Flash Preview | 88.0
Cybench | Claude Sonnet 4.5 | 55.0
ANLI | phi-3-small 7.4B | 37.1
The Agent Company | DeepSeek V3.2 Exp | 42.9
VideoMME | Gemini 1.5 Pro (Feb 2024) | 66.7
ARC-AGI | Gemini 3.1 Pro Preview | 98.0
DeepResearch Bench | Claude Sonnet 4.5 | 52.6
VPCT | Gemini 3 Pro | 86.5