
Benchmark · Knowledge

JNLI

Updated 2025-01-20

Models tested · 11

Top score · 82.4 · DeepSeek R1 Distill Qwen 14B

Median · 60.9 · min 35.6

Top-5 spread · σ 7.9 · wide open
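The summary figures can be recomputed from the 11 scores in the full rankings. The sketch below is a minimal check, not the site's own code. It assumes the top-5 spread is the population standard deviation of the five highest scores; the page does not state its formula, but that assumption reproduces σ 7.9.

```python
from statistics import median, pstdev

# JNLI scores as listed in the full rankings (11 models).
scores = [82.4, 81.3, 74.4, 69.4, 61.1, 60.9, 57.1, 55.3, 54.6, 36.1, 35.6]

# Assumption: "Top-5 spread" is the population standard deviation
# of the five best scores, rounded to one decimal place.
top5 = sorted(scores, reverse=True)[:5]

summary = {
    "models_tested": len(scores),
    "top_score": max(scores),
    "median": median(scores),
    "min": min(scores),
    "top5_spread": round(pstdev(top5), 1),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Using the sample standard deviation instead would give a larger spread, so the choice of estimator matters when comparing this figure across benchmarks.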

The Frontier

Best score over time · one chart, every benchmark


Frontier on JNLI rose from 60.9 to 82.4 in 9 months · +21.5 points · latest leader DeepSeek R1 Distill Qwen 14B from DeepSeek.

Frontier records · 4 total
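A frontier record is a score that beats every earlier score on the benchmark. The sketch below shows one way to extract such records from a dated list. The scores are taken from the rankings, but the month offsets and the ordering are hypothetical, chosen only to illustrate the running-maximum logic; the page does not publish the dates behind its chart.

```python
def frontier(records):
    """Return the records that set a new best score, in chronological order.

    Each record is a (when, model, score) tuple; `when` is any sortable value.
    """
    best = float("-inf")
    out = []
    for when, model, score in sorted(records, key=lambda r: r[0]):
        if score > best:
            best = score
            out.append((when, model, score))
    return out


# Scores from the rankings; month offsets are made up for illustration.
sample = [
    (0, "Meta Llama 3 8B", 60.9),
    (1, "Meta Llama 3 8B Instruct", 61.1),
    (2, "Qwen2 7B Instruct", 81.3),
    (4, "Qwen2 VL 7B Instruct", 74.4),  # below the running best, not a record
    (9, "DeepSeek R1 Distill Qwen 14B", 82.4),
]

records = frontier(sample)
gain = round(records[-1][2] - records[0][2], 1)
print(len(records), "records, gain", gain)
```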

Distribution

Where models cluster

Correlated benchmarks

Pearson r · original research

Correlation analysis

Benchmarks that track with JNLI

Pearson correlation across models scored on both benchmarks. Closer to 1 = strongly predictive.

MATH Level 5 · Knowledge

JCommonsenseQA · Knowledge

MMLU-PRO · Knowledge

BBH (HuggingFace) · Knowledge

LLM-JP · Overall · Knowledge
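The correlation described above is computed only over models that have a score on both benchmarks. The following is a minimal sketch of that calculation, assuming each benchmark is stored as a model-to-score mapping. The function name `pearson_r` and the toy dictionaries are hypothetical and are not real benchmark data.

```python
from math import sqrt


def pearson_r(a: dict, b: dict) -> float:
    """Pearson correlation over the models present in both score mappings."""
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


# Toy values for illustration only; "m4" and "m5" are dropped because
# each appears on just one of the two benchmarks.
toy_a = {"m1": 1.0, "m2": 2.0, "m3": 3.0, "m4": 10.0}
toy_b = {"m1": 2.0, "m2": 4.0, "m3": 6.0, "m5": 0.0}
r = pearson_r(toy_a, toy_b)
print(round(r, 3))
```

With few shared models, a single outlier can move r substantially, so the number of shared models is worth reading alongside the coefficient.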

Full rankings

11 models tested · sorted by score

#	Model	Provider	Score	Price
1	DeepSeek R1 Distill Qwen 14B	DeepSeek	82.4	n/a
2	Qwen2 7B Instruct	Alibaba	81.3	n/a
3	Qwen2 VL 7B Instruct	Alibaba	74.4	n/a
4	DeepSeek R1 Distill Llama 8B	DeepSeek	69.4	n/a
5	Meta Llama 3 8B Instruct	Meta	61.1	n/a
6	Meta Llama 3 8B	Meta	60.9	n/a
7	Gemma 2 2b It	Google DeepMind	57.1	n/a
8	HF SmolLM2 135M Instruct	Hugging Face TB	55.3	n/a
9	DeepSeek R1 Distill Qwen 7B	DeepSeek	54.6	n/a
10	Llama 2 7b Hf	Meta	36.1	n/a
11	Llama 2 7b Chat Hf	Meta	35.6	n/a

Frequently asked

Pulled from the JNLI dataset · updated daily

What does JNLI measure?

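JNLI (Japanese Natural Language Inference) is part of the JGLUE suite. It tests whether a model can label the relationship between a Japanese premise and hypothesis as entailment, contradiction, or neutral. BenchGecko lists it under the knowledge category. 11 AI models have been tested on it, with scores ranging from 35.6 to 82.4 out of 100.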

Which model leads on JNLI?

DeepSeek R1 Distill Qwen 14B from DeepSeek leads JNLI with a score of 82.4. The median score across 11 tested models is 60.9.

Is JNLI saturated?

No · the top score is 82.4 out of 100 (82%). There is still meaningful room for improvement on JNLI.

Does JNLI predict performance on other benchmarks?

Yes · JNLI scores correlate 0.82 with JMMLU across 11 shared models. Models that do well on JNLI tend to do well on JMMLU.

How often is JNLI data refreshed?

BenchGecko pulls updates daily. New model scores on JNLI appear in the next daily update after they are published by Epoch AI or the model provider.
