Benchmark · Knowledge

JSQuAD

Updated 2025-01-20

Models tested: 11

Top score: 89.9 (Qwen2 VL 7B Instruct)

Median: 83.8 (min 13.9)

Top-5 spread: σ 0.3 (settled)
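
The page does not state how the top-5 spread is computed. The σ 0.3 figure is consistent with a population standard deviation taken over the five highest scores in the full rankings on this page. A minimal sketch under that assumption (`top5_spread` is a hypothetical helper name, not part of any BenchGecko API):

```python
from statistics import pstdev

# Scores copied from the full rankings on this page.
scores = [89.9, 89.8, 89.6, 89.5, 88.9, 83.8, 83.0, 80.2, 79.9, 74.2, 13.9]


def top5_spread(values):
    """Population standard deviation of the five highest scores.

    Assumption: the page's sigma is a population (not sample) standard deviation.
    """
    top5 = sorted(values, reverse=True)[:5]
    return pstdev(top5)


spread = top5_spread(scores)
print(f"top-5 spread: {spread:.1f}")
```

A sample standard deviation (dividing by n - 1) would round to 0.4 instead, which is why the population form is assumed here.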

The Frontier

Best score over time · one chart, every benchmark

Frontier on JSQuAD rose from 88.9 to 89.9 in 4 months · +1.0 points · latest leader Qwen2 VL 7B Instruct from Alibaba.

Frontier records: 4 total.

Correlated benchmarks

Pearson r · original research

Correlation analysis

Benchmarks that track with JSQuAD

Pearson correlation computed across the models scored on both benchmarks. The closer r is to 1, the more strongly a model's score on one benchmark predicts its score on the other.

LLM-JP · Overall · Knowledge

JCommonsenseQA · Knowledge

MMLU-PRO · Knowledge

BBH (HuggingFace) · Knowledge
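
As an illustration of the correlation described above, here is a minimal sketch of Pearson r restricted to the models that two benchmarks share. The scores, model names and the `pearson_r` helper are hypothetical and chosen only to make the arithmetic easy to follow. They are not BenchGecko data or its actual implementation.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores keyed by model name (illustrative only).
bench_a = {"model-a": 60.0, "model-b": 70.0, "model-c": 80.0, "model-d": 90.0}
bench_b = {"model-a": 30.0, "model-b": 35.0, "model-c": 40.0, "model-e": 99.0}

# Only models scored on both benchmarks enter the calculation.
shared = sorted(bench_a.keys() & bench_b.keys())
r = pearson_r([bench_a[m] for m in shared], [bench_b[m] for m in shared])
print(shared, round(r, 2))
```

Restricting to shared models matters: a model scored on only one benchmark has no pair to compare against, so it cannot contribute to r.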

Full rankings

11 models tested · sorted by score

#	Model	Provider	Score	Price
1	Qwen2 VL 7B Instruct	Alibaba	89.9	n/a
2	DeepSeek R1 Distill Qwen 14B	DeepSeek	89.8	n/a
3	Qwen2 7B Instruct	Alibaba	89.6	n/a
4	Meta Llama 3 8B Instruct	Meta	89.5	n/a
5	Meta Llama 3 8B	Meta	88.9	n/a
6	Gemma 2 2b It	Google DeepMind	83.8	n/a
7	Llama 2 7b Chat Hf	Meta	83.0	n/a
8	DeepSeek R1 Distill Llama 8B	DeepSeek	80.2	n/a
9	Llama 2 7b Hf	Meta	79.9	n/a
10	DeepSeek R1 Distill Qwen 7B	DeepSeek	74.2	n/a
11	HF SmolLM2 135M Instruct	Hugging Face TB	13.9	n/a

Frequently asked

Pulled from the JSQuAD dataset · updated daily

What does JSQuAD measure?

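JSQuAD is a Japanese reading-comprehension benchmark modeled on SQuAD and distributed as part of JGLUE. The BenchGecko catalog lists it under the Knowledge category. Eleven AI models have been tested on it, with scores ranging from 13.9 to 89.9 out of 100.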

Which model leads on JSQuAD?

Qwen2 VL 7B Instruct from Alibaba leads JSQuAD with a score of 89.9. The median score across 11 tested models is 83.8.
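
The median quoted here can be checked directly against the full rankings. A short sketch using only the scores listed on this page:

```python
from statistics import mean, median

# Scores copied from the full rankings on this page (11 models).
scores = [89.9, 89.8, 89.6, 89.5, 88.9, 83.8, 83.0, 80.2, 79.9, 74.2, 13.9]

med = median(scores)   # middle (6th) value of the 11 sorted scores
avg = mean(scores)     # pulled down by the single very low score
low, high = min(scores), max(scores)

print(f"median={med} mean={avg:.1f} range={low}..{high}")
```

With an odd number of models the median is simply the sixth-ranked score. The mean comes out noticeably lower, at roughly 78.4, because of the single 13.9 outlier, which is one reason a median is the more representative summary for this leaderboard.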

Is JSQuAD saturated?

No · the top score is 89.9 out of 100 (90%). There is still meaningful room for improvement on JSQuAD.

Does JSQuAD predict performance on other benchmarks?

Yes · JSQuAD scores have a Pearson correlation of 0.90 with LLM-JP · Overall across 11 shared models. Models that do well on JSQuAD tend to do well on LLM-JP · Overall.

How often is JSQuAD data refreshed?

BenchGecko pulls updates daily. New model scores on JSQuAD appear as soon as they are published by Epoch AI or the model provider.

More knowledge benchmarks

Same category · related evaluations

Chatbot Arena Elo · Overall

BBH (HuggingFace)

Artificial Analysis · Quality Index