Benchmark · Multimodal · Wide open

ScienceQA

ScienceQA · multimodal science questions spanning natural science, social science, and language science with diverse question formats and image context.

Updated: 2024-11-20
Models tested: 5
Top score: 84.7, GPT-4o (2024-05-13)
Median: 62.7 (min 24.4)
Top-5 spread: σ 23.9 (wide open)

Best score over time · one chart, every benchmark

[Chart: ScienceQA · frontier running max. Y-axis: score (0 to 100, higher is better); x-axis: release date. Source: benchgecko.ai/benchmark/scienceqa]
Only 1 model has been tested on ScienceQA · not enough history to compute a frontier yet.
Pink dots = frontier records · 0 total

5 models tested · sorted by score

Details
Category: Multimodal
Max score: 100
Models: 5
Updated: 2024-11-20

Same category · related evaluations