Benchmark · Knowledge · Settled

JSQuAD

Updated 2025-01-20
Models tested: 11
Top score: 89.9 (Qwen2 VL 7B Instruct)
Median: 83.8 (min 13.9)
Top-5 spread: σ 0.3 (Settled)

Best score over time · one chart, every benchmark

[Chart: JSQuAD · 7 models · frontier running max. Y-axis: score, 0 to 100 (higher is better). X-axis: release date, Jun 24 to Jan 25. Source: benchgecko.ai/benchmark/jp-jsquad]
Frontier on JSQuAD rose from 89.6 to 89.9 in 3 months · +0.3 points · latest leader Qwen2 VL 7B Instruct from Alibaba.
Pink dots = frontier records · 2 total

Same category · related evaluations