JNLI
Full rankings
11 models tested · sorted by score
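| # | Model | Score |
|---|---|---|
| 1 | DeepSeek R1 Distill Qwen 14B | 82.4 |
| 2 | Qwen2 7B Instruct | 81.3 |
| 3 | Qwen2 VL 7B Instruct | 74.4 |
| 4 | DeepSeek R1 Distill Llama 8B | 69.4 |
| 5 | Meta Llama 3 8B Instruct | 61.1 |
| 6 | | 60.9 |
| 7 | | 57.1 |
| 8 | HF SmolLM2 135M Instruct | 55.3 |
| 9 | | 54.6 |
| 10 | | 36.1 |
| 11 | | 35.6 |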
Correlated benchmarks
Pearson r · original research
Benchmarks that track with JNLI
Pearson correlation (r) computed across the models scored on both benchmarks. The closer r is to 1, the more strongly a model's JNLI score predicts its score on the other benchmark.
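As a rough illustration of how such a figure could be computed, the sketch below restricts two score tables to the models they share and applies the standard Pearson formula. The function names and the sample scores are hypothetical placeholders, not BenchGecko's code or data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


def correlate_benchmarks(scores_a, scores_b):
    """Correlate two {model: score} tables over the models present in both.

    Returns (r, number_of_shared_models).
    """
    shared = sorted(scores_a.keys() & scores_b.keys())
    xs = [scores_a[m] for m in shared]
    ys = [scores_b[m] for m in shared]
    return pearson_r(xs, ys), len(shared)


# Hypothetical scores for illustration only (not real benchmark results).
bench_a = {"model-1": 1.0, "model-2": 2.0, "model-3": 3.0, "model-4": 10.0}
bench_b = {"model-1": 2.0, "model-2": 4.0, "model-3": 6.0, "model-5": 0.0}

r, n_shared = correlate_benchmarks(bench_a, bench_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Only models that appear in both tables contribute, which is why the count of shared models is reported alongside r.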
Frequently asked
About JNLI
What does JNLI measure?
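JNLI (Japanese Natural Language Inference, part of the JGLUE suite) tests whether a model can label the relation between a pair of Japanese sentences as entailment, contradiction, or neutral. The BenchGecko catalog files it under knowledge benchmarks. Eleven AI models have been tested on it, with scores ranging from 35.6 to 82.4 out of 100.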
Which model leads on JNLI?
DeepSeek R1 Distill Qwen 14B from DeepSeek leads JNLI with a score of 82.4. The median score across 11 tested models is 60.9.
Is JNLI saturated?
No · the top score is 82.4 out of 100, which leaves 17.6 points of headroom on JNLI.
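The leader, median, and headroom figures quoted in the answers above can be re-derived from the rankings table. A minimal sketch, assuming the eleven scores listed on this page and a maximum score of 100; the `summarize` helper is a hypothetical name used only for illustration.

```python
from statistics import median

# Scores copied from the full rankings table on this page.
JNLI_SCORES = [82.4, 81.3, 74.4, 69.4, 61.1, 60.9, 57.1, 55.3, 54.6, 36.1, 35.6]


def summarize(scores, max_score=100.0):
    """Return top, lowest, and median scores plus the headroom below max_score."""
    top = max(scores)
    return {
        "models": len(scores),
        "top": top,
        "lowest": min(scores),
        "median": median(scores),
        # Rounded to one decimal to hide floating-point noise.
        "headroom": round(max_score - top, 1),
    }


summary = summarize(JNLI_SCORES)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With an odd number of models, the median is simply the middle entry of the sorted list, here the sixth-ranked score.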
Does JNLI predict performance on other benchmarks?
Yes · JNLI scores have a Pearson correlation of 0.82 with JMMLU scores across 11 shared models, so models that do well on JNLI tend to do well on JMMLU.
How often is JNLI data refreshed?
BenchGecko pulls updates daily. New model scores on JNLI appear in the next daily update after Epoch AI or the model provider publishes them.
- Category: Knowledge
- Max score: 100
- Models: 11
- Updated: 2025-01-20
Top on JNLI
- DeepSeek R1 Distill Qwen 14B · 82.4
- Qwen2 7B Instruct · 81.3
- Qwen2 VL 7B Instruct · 74.4
- DeepSeek R1 Distill Llama 8B · 69.4
- Meta Llama 3 8B Instruct · 61.1