JSQuAD

Updated: 2025-01-20
Models tested: 11
Top score: 89.9 (Qwen2 VL 7B Instruct)
Median: 83.8 (min 13.9)
Top-5 spread: σ 0.3 (settled)
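
The summary figures above can be reproduced from a plain list of scores. This is a minimal sketch under stated assumptions: the scores are hypothetical, the `summarize` helper is illustrative, and the page does not say whether the top-5 spread is a population or sample standard deviation, so the population form is assumed here.

```python
import statistics

def summarize(scores, top_n=5):
    """Summarize benchmark scores: count, top, median, min, and top-N spread."""
    ranked = sorted(scores, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "median": statistics.median(ranked),
        "min": ranked[-1],
        # Assumption: population standard deviation of the top_n scores.
        "top_spread": statistics.pstdev(ranked[:top_n]),
    }

# Hypothetical scores for illustration only (not BenchGecko data).
scores = [90.0, 89.0, 88.0, 87.0, 86.0, 80.0, 70.0]
summary = summarize(scores)
print(summary)
```

A small spread among the top five means the leading models are nearly tied, which is presumably what the "settled" label reflects.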

Best score over time · one chart, every benchmark

[Chart: JSQuAD frontier (running max), 9 models. Y-axis: score, 0 to 100, higher is better. X-axis: release date, Apr 24 to Jan 25. Source: benchgecko.ai/benchmark/jp-jsquad]
Frontier on JSQuAD rose from 88.9 to 89.9 in 4 months (+1.0 points). The latest leader is Qwen2 VL 7B Instruct from Alibaba.
Pink dots mark frontier records (4 total).
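
The frontier in the chart is a running maximum over release dates: a model sets a record only when it beats every earlier score. A minimal sketch, assuming hypothetical `(release_date, model, score)` tuples; the model names, dates, and the `frontier_records` helper are illustrative, not BenchGecko data or code.

```python
from datetime import date

def frontier_records(results):
    """Return the entries that set a new best score, in release order.

    `results` is an iterable of (release_date, model, score) tuples.
    """
    records = []
    best = float("-inf")
    # Sort by release date so the running max follows time order.
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:  # strict comparison: a tie is not a new record
            best = score
            records.append((released, model, score))
    return records

# Hypothetical data for illustration only.
results = [
    (date(2024, 6, 1), "model-b", 85.0),
    (date(2024, 4, 1), "model-a", 88.9),
    (date(2024, 8, 1), "model-c", 89.9),
    (date(2024, 9, 1), "model-d", 89.1),
]
records = frontier_records(results)
record_models = [model for _, model, _ in records]
gain = round(records[-1][2] - records[0][2], 1)
print(record_models, gain)
```

Models released after the current record holder with lower scores (here `model-d`) still appear in the data but do not move the frontier line.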

Where models cluster

[Chart: score distribution, models per score bucket (0 to 100). Median: 83.8. Source: benchgecko.ai]
10–20: 1 model
70–80: 2 models
80–90: 8 models
All other buckets: 0 models

Pearson r · original research

Correlation analysis

Benchmarks that track with JSQuAD

Pearson correlation across models scored on both benchmarks. Values closer to 1 mean JSQuAD scores are more strongly predictive of scores on the other benchmark.
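
As a rough illustration of the calculation described above, the sketch below restricts two score tables to the models they share and computes Pearson r over the paired scores. The dictionaries, model names, and helper functions are hypothetical; this is not BenchGecko's actual pipeline.

```python
import math

def shared_scores(a, b):
    """Pair up scores for models present in both {model: score} dicts."""
    shared = sorted(a.keys() & b.keys())
    return [a[m] for m in shared], [b[m] for m in shared]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical scores for illustration only.
bench_a = {"m1": 10.0, "m2": 20.0, "m3": 30.0, "m4": 99.0}
bench_b = {"m1": 1.0, "m2": 2.0, "m3": 3.0, "m5": 50.0}
xs, ys = shared_scores(bench_a, bench_b)
r = pearson_r(xs, ys)
print(len(xs), r)
```

Only models scored on both benchmarks enter the calculation, so the number of shared models matters as much as the coefficient itself: with few shared models, a high r is weak evidence.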

11 models tested · sorted by score

Pulled from the JSQuAD dataset · updated daily

What does JSQuAD measure?

JSQuAD is a Japanese question answering benchmark, listed under the Knowledge category in the BenchGecko catalog. 11 AI models have been tested on it, with scores ranging from 13.9 to 89.9 out of 100.

Which model leads on JSQuAD?

Qwen2 VL 7B Instruct from Alibaba leads JSQuAD with a score of 89.9. The median score across 11 tested models is 83.8.

Is JSQuAD saturated?

No · the top score is 89.9 out of 100 (about 90%), which leaves 10.1 points of headroom on JSQuAD.

Does JSQuAD predict performance on other benchmarks?

Yes · JSQuAD scores correlate 0.90 with LLM-JP · Overall across 11 shared models. Models that do well on JSQuAD tend to do well on LLM-JP · Overall.

How often is JSQuAD data refreshed?

BenchGecko pulls updates daily. New model scores on JSQuAD appear as soon as they are published by Epoch AI or the model provider.
