
Benchmark · Knowledge

JMMLU

Updated 2025-01-20

Models tested: 11

Top score: 63.4 (DeepSeek R1 Distill Qwen 14B)

Median: 42.3 (min 24.2)

Top-5 spread: σ 6.9 (wide open)

The Frontier

Best score over time · one chart, every benchmark

Frontier on JMMLU rose from 44.7 to 63.4 in 9 months · +18.7 points · latest leader DeepSeek R1 Distill Qwen 14B from DeepSeek.

Pink dots = frontier records · 4 total
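The page does not say exactly how a frontier record is defined. A minimal sketch, assuming a record is any result that strictly beats every earlier score on the benchmark, could look like the following. The function name `frontier_records`, the tuple layout, and the toy data are all hypothetical; the real per-model dates for JMMLU are not given here, so none are used.

```python
from datetime import date


def frontier_records(results):
    """Return the results that set a new best score, in date order.

    `results` is an iterable of (date, model, score) tuples.
    Assumption: a tie with the current best does NOT count as a new record.
    """
    records = []
    best = float("-inf")
    for day, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((day, model, score))
    return records


# Toy data only: these are not JMMLU results.
toy = [
    (date(2024, 1, 1), "toy-a", 10.0),
    (date(2024, 2, 1), "toy-b", 8.0),
    (date(2024, 3, 1), "toy-c", 15.0),
    (date(2024, 4, 1), "toy-d", 15.0),
    (date(2024, 5, 1), "toy-e", 20.0),
]

records = frontier_records(toy)
# Frontier gain = latest record minus first record.
gain = records[-1][2] - records[0][2]
```

Under the same assumption, the gain reported for JMMLU is the latest record minus the first: 63.4 - 44.7 = 18.7 points.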

Distribution

Where models cluster

Correlated benchmarks

Pearson r · original research

Correlation analysis

Benchmarks that track with JMMLU

Pearson correlation, computed across the models scored on both benchmarks. Values closer to 1 mean scores on the two benchmarks rise and fall together more closely.

LLM-JP · Overall · Knowledge

JCommonsenseQA · Knowledge

MMLU-PRO · Knowledge

MATH Level 5 · Knowledge

BBH (HuggingFace) · Knowledge
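As a rough illustration of the correlation described above, the sketch below pairs two benchmarks on the models they share and computes Pearson r with the standard formula. The helper names `pearson_r` and `correlate_shared` and the toy score dictionaries are assumptions made for this example; they are not BenchGecko's code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def correlate_shared(bench_a, bench_b):
    """Correlate two {model: score} dicts over the models present in both.

    Returns (r, number_of_shared_models).
    """
    shared = sorted(bench_a.keys() & bench_b.keys())
    xs = [bench_a[m] for m in shared]
    ys = [bench_b[m] for m in shared]
    return pearson_r(xs, ys), len(shared)


# Toy scores only: "m4" and "m5" are each missing from one benchmark,
# so they are excluded from the correlation.
toy_a = {"m1": 1.0, "m2": 2.0, "m3": 3.0, "m4": 9.0}
toy_b = {"m1": 2.0, "m2": 4.0, "m3": 6.0, "m5": 0.0}

r, n_shared = correlate_shared(toy_a, toy_b)
```

Restricting to shared models matters: a model scored on only one of the two benchmarks contributes nothing to r, so the number of shared models should be reported alongside the coefficient.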

Full rankings

11 models tested · sorted by score

#	Model	Provider	Score	Price
1	DeepSeek R1 Distill Qwen 14B	DeepSeek	63.4	n/a
2	Qwen2 7B Instruct	Alibaba	56.5	n/a
3	Qwen2 VL 7B Instruct	Alibaba	56.3	n/a
4	Meta Llama 3 8B Instruct	Meta	46.7	n/a
5	Meta Llama 3 8B	Meta	44.7	n/a
6	DeepSeek R1 Distill Qwen 7B	DeepSeek	42.3	n/a
7	Gemma 2 2b It	Google DeepMind	38.4	n/a
8	DeepSeek R1 Distill Llama 8B	DeepSeek	37.8	n/a
9	Llama 2 7b Chat Hf	Meta	33.3	n/a
10	Llama 2 7b Hf	Meta	28.6	n/a
11	HF SmolLM2 135M Instruct	Hugging Face TB	24.2	n/a
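The summary figures near the top of the page (median 42.3, min 24.2, top-5 spread σ 6.9) can be reproduced from the scores in this table. The sketch below assumes the spread is the population standard deviation of the five highest scores, which gives 6.9; the sample standard deviation would give about 7.7 instead. The function name `summarize` is hypothetical.

```python
from statistics import median, pstdev

# Scores copied from the rankings table, highest first.
scores = [63.4, 56.5, 56.3, 46.7, 44.7, 42.3, 38.4, 37.8, 33.3, 28.6, 24.2]


def summarize(values, top_n=5):
    """Summary statistics for a list of benchmark scores."""
    ranked = sorted(values, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "median": median(ranked),
        "min": ranked[-1],
        # Assumption: spread = population std dev of the top_n scores.
        "top_spread": round(pstdev(ranked[:top_n]), 1),
    }


summary = summarize(scores)
```

With 11 scores the median is simply the sixth-ranked value, which is why it coincides with DeepSeek R1 Distill Qwen 7B's score of 42.3.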

Frequently asked

Pulled from the JMMLU dataset · updated daily

What does JMMLU measure?

JMMLU (Japanese Massive Multitask Language Understanding) is a knowledge benchmark in the BenchGecko catalog. 11 AI models have been tested on it, with scores ranging from 24.2 to 63.4 out of 100.

Which model leads on JMMLU?

DeepSeek R1 Distill Qwen 14B from DeepSeek leads JMMLU with a score of 63.4. The median score across 11 tested models is 42.3.

Is JMMLU saturated?

No · the top score is 63.4 out of 100 (63%). There is still meaningful room for improvement on JMMLU.

Does JMMLU predict performance on other benchmarks?

Yes · JMMLU scores have a Pearson correlation of 0.89 with LLM-JP · Overall across 11 shared models, so models that do well on JMMLU tend to do well on LLM-JP · Overall. With only 11 models, read this as a tendency rather than a guarantee.

How often is JMMLU data refreshed?

BenchGecko pulls updates daily. New model scores on JMMLU appear in the next daily pull after they are published by Epoch AI or the model provider.

More knowledge benchmarks

Same category · related evaluations

Chatbot Arena Elo · Overall

BBH (HuggingFace)

Artificial Analysis · Quality Index