#	Model	Provider	Score	Price
1	DeepSeek R1 Distill Qwen 14B	DeepSeek	93.7	n/a
2	Qwen2 7B Instruct	Alibaba	89.1	n/a
3	Qwen2 VL 7B Instruct	Alibaba	87.8	n/a
4	Meta Llama 3 8B Instruct	Meta	87.7	n/a
5	Meta Llama 3 8B	Meta	82.9	n/a
6	Gemma 2 2b It	Google DeepMind	78.2	n/a
7	DeepSeek R1 Distill Llama 8B	DeepSeek	62.4	n/a
8	DeepSeek R1 Distill Qwen 7B	DeepSeek	59.8	n/a
9	Llama 2 7b Chat Hf	Meta	52.6	n/a
10	Llama 2 7b Hf	Meta	25.5	n/a
11	HF SmolLM2 135M Instruct	Hugging Face TB	17.0	n/a

Frequently asked questions

Pulled from the JCommonsenseQA dataset · updated daily

What does JCommonsenseQA measure?

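JCommonsenseQA is a Japanese multiple-choice commonsense question-answering benchmark from the JGLUE suite, listed as a knowledge benchmark in the BenchGecko catalog. Eleven AI models have been tested on it, with scores ranging from 17.0 to 93.7 out of 100.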

Which model leads on JCommonsenseQA?

DeepSeek R1 Distill Qwen 14B from DeepSeek leads JCommonsenseQA with a score of 93.7. The median score across 11 tested models is 78.2.
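
The leader, range, and median quoted here can be rechecked from the leaderboard rows. The following is a minimal sketch in Python: the scores are copied from the table on this page, while the variable names (`scores`, `summary`) are illustrative and not part of any BenchGecko API.

```python
from statistics import median

# Scores copied from the leaderboard table on this page (out of 100).
scores = {
    "DeepSeek R1 Distill Qwen 14B": 93.7,
    "Qwen2 7B Instruct": 89.1,
    "Qwen2 VL 7B Instruct": 87.8,
    "Meta Llama 3 8B Instruct": 87.7,
    "Meta Llama 3 8B": 82.9,
    "Gemma 2 2b It": 78.2,
    "DeepSeek R1 Distill Llama 8B": 62.4,
    "DeepSeek R1 Distill Qwen 7B": 59.8,
    "Llama 2 7b Chat Hf": 52.6,
    "Llama 2 7b Hf": 25.5,
    "HF SmolLM2 135M Instruct": 17.0,
}

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)

summary = {
    "models": len(scores),
    "leader": leader,
    "top": scores[leader],
    "lowest": min(scores.values()),
    # With 11 models, the median is the 6th score in sorted order.
    "median": median(scores.values()),
    # Headroom is the gap between the top score and the 100-point maximum.
    "headroom": round(100 - scores[leader], 1),
}

for key, value in summary.items():
    print(f"{key}: {value}")
```

With an odd number of models the median is simply the middle row of the ranked table, which is why it coincides with the sixth-place score.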

Is JCommonsenseQA saturated?

No · the top score is 93.7 out of 100, which leaves 6.3 points of headroom, and the median model scores only 78.2. There is still room for improvement on JCommonsenseQA.

Does JCommonsenseQA predict performance on other benchmarks?

Yes · JCommonsenseQA scores have a correlation of 0.90 with LLM-JP · Overall across the 11 models scored on both. Models that do well on JCommonsenseQA tend to do well on LLM-JP · Overall.
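
A cross-benchmark correlation of this kind can be sketched as follows. The page does not say which coefficient is used, so Pearson's r is an assumption here. The helper names (`pearson`, `shared_correlation`) and the demo values are hypothetical placeholders, not real benchmark scores.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def shared_correlation(bench_a, bench_b):
    """Correlate two benchmarks over the models that appear in both.

    Each argument maps model name -> score. Returns (r, shared_model_count).
    """
    shared = sorted(set(bench_a) & set(bench_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared models")
    r = pearson([bench_a[m] for m in shared], [bench_b[m] for m in shared])
    return r, len(shared)


# Placeholder values for illustration only; these are NOT real scores.
demo_a = {"model_a": 1.0, "model_b": 2.0, "model_c": 3.0}
demo_b = {"model_a": 2.0, "model_b": 4.0, "model_c": 6.0, "model_d": 1.0}

r, n_shared = shared_correlation(demo_a, demo_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Restricting the calculation to shared models matters because the two benchmarks may have been run on different model sets. A high correlation also describes a tendency across models, not a guarantee for any single one.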

How often is JCommonsenseQA data refreshed?

BenchGecko pulls updates daily. New model scores on JCommonsenseQA appear as soon as they are published by Epoch AI or the model provider.