#	Model	Score	Price
1	DeepSeek V3· DeepSeek	93.7	$0.32
2	Llama 3.1 405B· Meta	93.7	—
3	Qwen2.5 72B Instruct· Alibaba Qwen	92.7	$0.12
4	DeepSeek-V2 (MoE-236B, May 2024)· DeepSeek	89.6	—
5	phi-3-medium 14B· Microsoft	88.8	—
6	phi-3-small 7.4B· Microsoft	87.6	—
7	GPT-3.5 Turbo (older v0613)· OpenAI	83.2	$1.00
8	Mixtral 8x7B Instruct· Mistral AI	83.1	$0.54
9	Claude Instant· Anthropic	81.7	—
10	Stable Beluga 2· Unknown	81.5	—
11	phi-3-mini 3.8B· Microsoft	79.9	—
12	Qwen-14B· Alibaba Qwen	79.2	—
13	Llama 3 8B Instruct· Meta	77.1	$0.03
14	Mistral 7B V0.1· Mistral AI	71.5	—
15	Phi 2· Microsoft	67.9	—
16	Qwen2.5 Coder 32B Instruct· Alibaba Qwen	60.7	$0.66
17	Falcon-180B· TII	57.1	—
18	Qwen2.5 Coder 7B Instruct· Alibaba Qwen	47.9	$0.03
19	Llama 2-13B· Meta	47.1	—
20	Nemotron-4 15B· Unknown	40.7	—
21	INTELLECT-1· Unknown	39.4	—
22	LLaMA-13B· Meta	36.9	—
23	MPT-30B· Unknown	34.1	—
24	Yi 6B· Unknown	33.7	—
25	StarCoder 2 15B· Unknown	29.6	—
26	Qwen2.5 Coder 1.5B Instruct· Alibaba Qwen	26.9	—
27	Phi-1.5· Microsoft	25.9	—
28	DeepSeek Coder 33B· DeepSeek	22.9	—
29	Gemma 2B· Google DeepMind	22.8	—
30	XGen-7B· Unknown	21.6	—
31	Dolly 2.0-12b· Unknown	19.5	—
32	DeepSeek Coder 6.7B· DeepSeek	15.2	—
33	Baichuan 2-7B· Unknown	10.0	—
34	Cerebras-GPT-13B· Cerebras	9.9	—
35	DeepSeek Coder 1.3B· DeepSeek	0.5	—

Frequently asked

Pulled from the ARC AI2 dataset · updated daily

What does ARC AI2 measure?

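ARC AI2 (the AI2 Reasoning Challenge) is a set of grade-school science multiple-choice questions, listed as a knowledge benchmark in the BenchGecko catalog. 35 AI models have been tested on it, with scores ranging from 0.5 to 93.7 out of 100.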

Which model leads on ARC AI2?

DeepSeek V3 from DeepSeek is listed first on ARC AI2 with a score of 93.7, tied with Llama 3.1 405B from Meta at the same score. The median score across the 35 tested models is 47.9.
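The summary figures quoted in these answers can be re-derived from the table. The snippet below is a minimal sketch that assumes the 35 scores are transcribed by hand from the leaderboard above; the variable names are illustrative and not part of any BenchGecko API.

```python
from statistics import median

# Scores transcribed from the leaderboard table, in rank order (35 entries).
scores = [
    93.7, 93.7, 92.7, 89.6, 88.8, 87.6, 83.2, 83.1, 81.7, 81.5,
    79.9, 79.2, 77.1, 71.5, 67.9, 60.7, 57.1, 47.9, 47.1, 40.7,
    39.4, 36.9, 34.1, 33.7, 29.6, 26.9, 25.9, 22.9, 22.8, 21.6,
    19.5, 15.2, 10.0, 9.9, 0.5,
]

n_models = len(scores)                 # number of tested models
top_score = max(scores)                # best score on the board
low_score = min(scores)                # worst score on the board
n_at_top = scores.count(top_score)     # how many models share the top score
median_score = median(scores)          # middle value of the 35 scores
headroom = round(100 - top_score, 1)   # points left before a perfect score

print(f"{n_models} models, range {low_score} to {top_score}")
print(f"median {median_score}, {n_at_top} models tied at the top")
print(f"headroom to 100: {headroom}")
```

With an odd number of entries, the median is simply the 18th score in rank order, which is why it matches a row in the table exactly.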

Is ARC AI2 saturated?

No · the top score is 93.7 out of 100, which leaves 6.3 points of headroom, and the median score is 47.9, so most tested models sit well below the ceiling.

Does ARC AI2 predict performance on other benchmarks?

Yes, with a caveat · ARC AI2 scores correlate 0.90 with Chatbot Arena Elo · Overall, but across only 5 shared models. Within that small sample, models that do well on ARC AI2 tend to do well on Chatbot Arena Elo · Overall.
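A cross-benchmark correlation of this kind can be sketched as follows. The page does not say which coefficient is used, so this assumes a plain Pearson correlation computed over the models present in both benchmarks. The function names and the sample values are hypothetical placeholders, not real BenchGecko or Chatbot Arena data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def shared_correlation(bench_a, bench_b):
    """Correlate two {model: score} dicts over the models they share.

    Returns (coefficient, number_of_shared_models).
    """
    shared = sorted(bench_a.keys() & bench_b.keys())
    xs = [bench_a[m] for m in shared]
    ys = [bench_b[m] for m in shared]
    return pearson(xs, ys), len(shared)


# Placeholder inputs: model names and numbers are invented for illustration.
arc_scores = {"model-a": 90.0, "model-b": 80.0, "model-c": 60.0, "model-d": 30.0}
arena_elo = {"model-a": 1250.0, "model-b": 1180.0, "model-c": 1100.0, "model-e": 1000.0}

r, n_shared = shared_correlation(arc_scores, arena_elo)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Reporting the number of shared models next to the coefficient matters because a coefficient computed from a handful of points can shift substantially when a single model is added or removed.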

How often is ARC AI2 data refreshed?

BenchGecko pulls updates daily. New model scores on ARC AI2 appear with the next daily pull after they are published by Epoch AI or the model provider.