Benchmark · Knowledge

JCommonsenseQA

Updated 2025-01-20
Models tested: 11
Top score: 93.7 (DeepSeek R1 Distill Qwen 14B)
Median: 78.2 (min 17.0)
Top-5 spread: σ 3.5 (competitive)
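The summary figures above can be reproduced from a plain mapping of model names to scores. The sketch below is a minimal illustration, not BenchGecko's code. It assumes the "Top-5 spread" is the population standard deviation of the five highest scores (the page does not say which estimator is used), and the model names and scores in the example are hypothetical.

```python
import statistics


def summary_stats(scores: dict[str, float]) -> dict[str, object]:
    """Summarise a leaderboard given {model_name: score}.

    Assumption: "Top-5 spread" is computed as the population standard
    deviation (statistics.pstdev) of the five highest scores.
    """
    if not scores:
        raise ValueError("need at least one score")
    values = sorted(scores.values(), reverse=True)
    leader = max(scores, key=scores.get)  # model with the highest score
    top5 = values[:5]
    return {
        "models_tested": len(values),
        "top_score": values[0],
        "leader": leader,
        "median": statistics.median(values),
        "min": values[-1],
        "top5_sigma": statistics.pstdev(top5),
    }


# Hypothetical scores for illustration only (not the real leaderboard).
example = {"model-a": 90.0, "model-b": 85.0, "model-c": 80.0,
           "model-d": 75.0, "model-e": 70.0, "model-f": 20.0}
print(summary_stats(example))
```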

Best score over time

[Chart: JCommonsenseQA frontier running max across 9 models; score (0–100) vs. release date, Apr 2024 to Jan 2025]
Frontier on JCommonsenseQA rose from 82.9 to 93.7 in 9 months · +10.8 points · latest leader DeepSeek R1 Distill Qwen 14B from DeepSeek.
Pink dots = frontier records · 4 total
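A frontier record, as used in the chart legend, is a result that beats every score released before it. A minimal sketch of that running-max logic, assuming results arrive as hypothetical (release_date, model, score) tuples and that a tie with the current best does not count as a new record:

```python
from datetime import date


def frontier_records(results):
    """Return the results that set a new best score, ordered by release date.

    `results` is an iterable of (release_date, model_name, score) tuples.
    Assumption: a score must strictly exceed the previous best to count,
    so ties with the current leader are not recorded.
    """
    records = []
    best = float("-inf")
    # Walk results in release order, keeping a running maximum.
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, model, score))
    return records


# Hypothetical data for illustration only.
results = [
    (date(2024, 4, 1), "model-a", 70.0),
    (date(2024, 6, 1), "model-b", 65.0),
    (date(2024, 9, 1), "model-c", 80.0),
    (date(2025, 1, 1), "model-d", 90.0),
]
records = frontier_records(results)
gain = records[-1][2] - records[0][2]  # first record to latest record
print(len(records), gain)
```

The headline gain in a caption like the one above corresponds to the latest record's score minus the first record's score.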

Where models cluster

Score distribution (models per 10-point bucket; median 78.2):
0–10: 0
10–20: 1
20–30: 1
30–40: 0
40–50: 0
50–60: 2
60–70: 1
70–80: 1
80–90: 4
90–100: 1
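The score-distribution chart counts models in 10-point buckets from 0 to 100. A minimal bucketing sketch, with a hypothetical function name `score_histogram`; how the site treats a score that lands exactly on a bucket edge is not stated, so this version assumes half-open buckets with a perfect 100 folded into the top bucket.

```python
def score_histogram(scores, width=10, top=100):
    """Count scores into buckets [0, width), [width, 2*width), ..., [top - width, top].

    Assumption: buckets are half-open except the last, which also takes a
    score of exactly `top`.
    """
    n_buckets = top // width
    counts = [0] * n_buckets
    for s in scores:
        if not 0 <= s <= top:
            raise ValueError(f"score out of range: {s}")
        # Clamp so a perfect score falls into the highest bucket.
        idx = min(int(s // width), n_buckets - 1)
        counts[idx] += 1
    labels = [f"{i * width}–{(i + 1) * width}" for i in range(n_buckets)]
    return dict(zip(labels, counts))


# Hypothetical scores for illustration only.
hist = score_histogram([17.0, 25.0, 55.0, 93.7, 100.0])
for label, count in hist.items():
    print(f"{label}: {count}")
```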

Pearson r · original research

Correlation analysis

Benchmarks that track with JCommonsenseQA

Pearson correlation across models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship between the two sets of scores.
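The correlation described here only uses models that appear on both benchmarks. A minimal sketch of that computation, assuming each benchmark is a hypothetical dict of {model_name: score}; it is a plain Pearson r, not necessarily the exact pipeline behind these figures.

```python
import math


def pearson_shared(a: dict[str, float], b: dict[str, float]) -> tuple[float, int]:
    """Pearson r between two benchmarks over the models scored on both.

    Returns (r, n_shared). Models missing from either benchmark are skipped.
    """
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for constant scores")
    return cov / (sx * sy), n


# Hypothetical benchmark scores for illustration only.
bench_a = {"m1": 60.0, "m2": 70.0, "m3": 80.0, "only-in-a": 50.0}
bench_b = {"m1": 30.0, "m2": 40.0, "m3": 50.0, "only-in-b": 90.0}
r, n = pearson_shared(bench_a, bench_b)
print(round(r, 3), n)
```

On Python 3.10 and later, `statistics.correlation(xs, ys)` computes the same Pearson coefficient once the shared models have been aligned.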


What does JCommonsenseQA measure?

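JCommonsenseQA is a knowledge benchmark in the BenchGecko catalog: a Japanese five-choice multiple-choice commonsense question answering task, built as a Japanese counterpart to CommonsenseQA and released as part of the JGLUE suite. Eleven AI models have been tested on it, with scores ranging from 17.0 to 93.7 out of 100.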

Which model leads on JCommonsenseQA?

DeepSeek R1 Distill Qwen 14B from DeepSeek leads JCommonsenseQA with a score of 93.7. The median score across 11 tested models is 78.2.

Is JCommonsenseQA saturated?

No · the top score is 93.7 out of 100, leaving 6.3 points of headroom, and the median model (78.2) trails the leader by 15.5 points. There is still room for improvement on JCommonsenseQA.

Does JCommonsenseQA predict performance on other benchmarks?

Yes · JCommonsenseQA scores correlate with LLM-JP · Overall at Pearson r = 0.90 across 11 shared models, so models that do well on JCommonsenseQA tend to do well on LLM-JP · Overall.

How often is JCommonsenseQA data refreshed?

BenchGecko pulls updates daily, so new JCommonsenseQA scores appear at the next daily update after Epoch AI or the model provider publishes them.
