ARC AI2
AI2 Reasoning Challenge · tests grade-school level science knowledge with multiple-choice questions requiring reasoning beyond simple retrieval.
The Frontier
Best score over time · one chart, every benchmark
Full leaderboard
35 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | | 93.7 |
| 2 | | 93.7 |
| 3 | | 92.7 |
| 4 | | 89.6 |
| 5 | | 88.8 |
| 6 | | 87.6 |
| 7 | | 83.2 |
| 8 | | 83.1 |
| 9 | | 81.7 |
| 10 | Stable Beluga 2 | 81.5 |
| 11 | | 79.9 |
| 12 | | 79.2 |
| 13 | | 77.1 |
| 14 | | 71.5 |
| 15 | | 67.9 |
| 16 | | 60.7 |
| 17 | | 57.1 |
| 18 | | 47.9 |
| 19 | | 47.1 |
| 20 | Nemotron-4 15B | 40.7 |
| 21 | INTELLECT-1 | 39.4 |
| 22 | | 36.9 |
| 23 | MPT-30B | 34.1 |
| 24 | Yi 6B | 33.7 |
| 25 | StarCoder 2 15B | 29.6 |
| 26 | | 26.9 |
| 27 | | 25.9 |
| 28 | | 22.9 |
| 29 | | 22.8 |
| 30 | XGen-7B | 21.6 |
| 31 | Dolly 2.0-12b | 19.5 |
| 32 | | 15.2 |
| 33 | Baichuan 2-7B | 10.0 |
| 34 | | 9.9 |
| 35 | | 0.5 |
Score distribution
Where the models cluster
Related benchmarks
Pearson r · original research
Benchmarks that track with ARC AI2
Pearson correlation across models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship between the two sets of scores.
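As a rough illustration of how a figure like this could be computed, the sketch below restricts two score tables to the models they share and then applies the standard Pearson formula. The model names and scores are hypothetical placeholders, not data from this page, and the helper names `shared_scores` and `pearson_r` are assumptions made for the example; the site's actual pipeline is not documented here.

```python
from math import sqrt


def shared_scores(a, b):
    """Return paired score lists for the models present in both tables."""
    names = sorted(a.keys() & b.keys())
    return [a[n] for n in names], [b[n] for n in names]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores, NOT real leaderboard data.
arc_scores = {"model-a": 90.0, "model-b": 80.0, "model-c": 60.0, "model-d": 40.0}
other_scores = {"model-a": 1300.0, "model-b": 1250.0, "model-c": 1150.0, "model-e": 1000.0}

xs, ys = shared_scores(arc_scores, other_scores)
r = pearson_r(xs, ys)
print(f"{len(xs)} shared models, r = {r:.2f}")
```

Only models present in both tables contribute to the result, so the number of shared models is worth reporting alongside r.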
Frequently asked questions
About ARC AI2
What does ARC AI2 measure?
AI2 Reasoning Challenge · tests grade-school level science knowledge with multiple-choice questions requiring reasoning beyond simple retrieval. 35 AI models have been tested on it. Scores range from 0.5 to 93.7 out of 100.
Which model leads on ARC AI2?
DeepSeek V3 from DeepSeek leads ARC AI2 with a score of 93.7. The median score across 35 tested models is 47.9.
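The summary figures quoted in these answers can be re-derived from the 35 scores in the leaderboard table. The sketch below is one way to check them with the Python standard library; the `summarize` helper and the `headroom` field are names chosen for this example, not part of any BenchGecko API.

```python
from statistics import median

# The 35 scores from the leaderboard table, in rank order.
ARC_AI2_SCORES = [
    93.7, 93.7, 92.7, 89.6, 88.8, 87.6, 83.2, 83.1, 81.7, 81.5,
    79.9, 79.2, 77.1, 71.5, 67.9, 60.7, 57.1, 47.9, 47.1, 40.7,
    39.4, 36.9, 34.1, 33.7, 29.6, 26.9, 25.9, 22.9, 22.8, 21.6,
    19.5, 15.2, 10.0, 9.9, 0.5,
]


def summarize(scores, max_score=100.0):
    """Return count, range, median, and distance of the top score from max_score."""
    ordered = sorted(scores)
    return {
        "count": len(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "median": median(ordered),
        # Rounded to one decimal to avoid floating-point noise.
        "headroom": round(max_score - ordered[-1], 1),
    }


summary = summarize(ARC_AI2_SCORES)
print(summary)
```

With an odd number of models, the median is simply the middle entry of the sorted list, which here is the 18th of 35.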
Is ARC AI2 saturated?
No · the top score is 93.7 out of 100, which leaves 6.3 points of headroom on ARC AI2.
Does ARC AI2 predict performance on other benchmarks?
Yes · ARC AI2 scores have a Pearson correlation of 0.90 with Chatbot Arena Elo · Overall across the 5 models scored on both. Models that do well on ARC AI2 tend to do well on Chatbot Arena Elo · Overall, though five shared models is a small sample.
How often is ARC AI2 data refreshed?
BenchGecko pulls updates daily. New model scores on ARC AI2 appear as soon as they are published by Epoch AI or the model provider.
- Category: Knowledge
- Maximum score: 100
- Models: 35
- Updated: 2025-04-15