Beta
Benchmark · Knowledge · Settled

ARC AI2

AI2 Reasoning Challenge · tests grade-school-level science knowledge with multiple-choice questions that require reasoning beyond simple retrieval.

Updated: 2025-04-15
Models tested: 35
Top score: 93.7 (DeepSeek V3)
Median: 47.9 (lowest: 0.5)
Top-5 spread: σ 2.1 (contested)

Best score over time · one chart, every benchmark

[Chart: ARC AI2 · 6 models · frontier running max. Y axis: score (0 to 100). X axis: release date (Jul 24 to Apr 25). Source: benchgecko.ai/benchmark/arc-ai2]
Only 6 models have been tested on ARC AI2 · not enough history to compute a frontier yet.
Pink dots = frontier records · 1 total
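
The "frontier running max" in the chart can be read as the best score seen so far, walking models in release order; a "frontier record" would then be any model that beats every earlier one. The sketch below illustrates that reading only. It is an assumption about how the chart is built, not BenchGecko's actual code, and the function name `frontier_records` and the sample entries are hypothetical placeholders rather than real ARC AI2 results.

```python
from datetime import date


def frontier_records(results):
    """Return the entries that set a new best score, in release order.

    `results` is an iterable of (release_date, model_name, score) tuples.
    A tie with the current best is not counted as a new record (assumption).
    """
    best = float("-inf")
    records = []
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((released, model, score))
    return records


# Hypothetical placeholder entries, NOT actual leaderboard data.
sample = [
    (date(2024, 7, 1), "model-a", 40.0),
    (date(2024, 9, 1), "model-b", 35.0),
    (date(2024, 11, 1), "model-c", 55.0),
    (date(2025, 2, 1), "model-d", 55.0),
]

records = frontier_records(sample)
for released, model, score in records:
    print(released.isoformat(), model, score)
```

With these placeholder entries, only `model-a` and `model-c` count as records: `model-b` scores below the running best and `model-d` merely ties it.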

Same category · Related benchmarks