#	Model · Provider	Score (out of 100)	Price
1	DeepSeek R1 Distill Qwen 14B · DeepSeek	56.8	—
2	Qwen2 VL 7B Instruct · Alibaba	53.0	—
3	Qwen2 7B Instruct · Alibaba	51.7	—
4	Meta Llama 3 8B Instruct · Meta	49.6	—
5	Meta Llama 3 8B · Meta	48.9	—
6	DeepSeek R1 Distill Llama 8B · DeepSeek	41.4	—
7	Gemma 2 2b It · Google DeepMind	40.5	—
8	DeepSeek R1 Distill Qwen 7B · DeepSeek	39.3	—
9	Llama 2 7b Chat Hf · Meta	38.3	—
10	Llama 2 7b Hf · Meta	37.2	—
11	HF SmolLM2 135M Instruct · Hugging Face TB	15.6	—

Frequently asked

Pulled from the LLM-JP · Overall dataset · updated daily

What does LLM-JP · Overall measure?

LLM-JP · Overall is a knowledge benchmark in the BenchGecko catalog. Eleven AI models have been tested on it, with scores ranging from 15.6 to 56.8 out of 100.

Which model leads on LLM-JP · Overall?

DeepSeek R1 Distill Qwen 14B from DeepSeek leads LLM-JP · Overall with a score of 56.8. The median score across 11 tested models is 41.4.
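The leader, median, and score range quoted in these answers can be re-derived from the leaderboard table. The following is a minimal sketch in Python, assuming the eleven scores are copied by hand from the table; the variable names are illustrative and not part of any BenchGecko API.

```python
from statistics import median

# Scores copied from the leaderboard table (model -> LLM-JP · Overall score).
scores = {
    "DeepSeek R1 Distill Qwen 14B": 56.8,
    "Qwen2 VL 7B Instruct": 53.0,
    "Qwen2 7B Instruct": 51.7,
    "Meta Llama 3 8B Instruct": 49.6,
    "Meta Llama 3 8B": 48.9,
    "DeepSeek R1 Distill Llama 8B": 41.4,
    "Gemma 2 2b It": 40.5,
    "DeepSeek R1 Distill Qwen 7B": 39.3,
    "Llama 2 7b Chat Hf": 38.3,
    "Llama 2 7b Hf": 37.2,
    "HF SmolLM2 135M Instruct": 15.6,
}

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

# With an odd number of models, the median is the middle value after sorting.
median_score = median(scores.values())

# Lowest and highest scores give the reported range.
low, high = min(scores.values()), max(scores.values())

print(f"{len(scores)} models, range {low} to {high}")
print(f"Leader: {leader} ({scores[leader]})")
print(f"Median: {median_score}")
```

Because there are eleven models, the median is simply the sixth-ranked score, which is why it coincides with a single row of the table.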

Is LLM-JP · Overall saturated?

No. The top score is 56.8 out of 100 (57%), so there is still meaningful room for improvement on LLM-JP · Overall.

Does LLM-JP · Overall predict performance on other benchmarks?

Yes. LLM-JP · Overall scores have a correlation of 0.90 with JCommonsenseQA scores across 11 shared models, so models that do well on LLM-JP · Overall tend to do well on JCommonsenseQA.
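The page does not say which correlation statistic is behind the 0.90 figure. As a rough sketch, assuming it is a Pearson coefficient computed over per-model score pairs, the calculation could look like the following. The `pearson` helper and the two short lists are hypothetical placeholders for illustration; they are not real LLM-JP or JCommonsenseQA scores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder numbers for illustration only; NOT real benchmark scores.
benchmark_a = [10.0, 20.0, 30.0, 40.0]
benchmark_b = [12.0, 19.0, 33.0, 41.0]
r = pearson(benchmark_a, benchmark_b)
print(f"r = {r:.2f}")
```

A value near 1 means the two benchmarks rank the shared models in nearly the same order with proportional gaps. With only 11 shared models, a single outlier can shift the coefficient noticeably.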

How often is LLM-JP · Overall data refreshed?

BenchGecko pulls updates daily. New model scores on LLM-JP · Overall appear in the next daily update after they are published by Epoch AI or the model provider.