Benchmark · Knowledge · Competitive

PIQA

PIQA (Physical Interaction QA) tests intuitive physical reasoning by asking models to select the correct approach for everyday physical tasks.

Updated: 2024-12-26
Models tested: 25
Top score: 77.4 (GPT-4o-mini)
Median: 65.2 (min 47.0)
Top-5 spread: σ 3.0 (Competitive)

Best score over time · one chart, every benchmark

[Chart: PIQA · 8 models · frontier running max. Y axis: score, 0 to 100. X axis: release date, Jun 24 to Dec 24. Source: benchgecko.ai/benchmark/piqa]
Frontier on PIQA rose from 67.4 to 77.4 in 1 month · +10.0 points · latest leader GPT-4o-mini from OpenAI.
Pink dots = frontier records · 3 total

Same category · related evaluations