Which model leads on CadEval?

o3 from OpenAI leads CadEval with a score of 74.0. The median score across 15 tested models is 42.0.

Is CadEval saturated?

No · the top score is 74.0 out of 100 (74%). There is still meaningful room for improvement on CadEval.

Yes · CadEval scores correlate 0.95 with Aider polyglot across 13 shared models. Models that do well on CadEval tend to do well on Aider polyglot.

BenchGecko pulls updates daily. New model scores on CadEval appear as soon as they are published by Epoch AI or the model provider.

Benchmark · CodeCompetitive

Name: CadEval Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

CadEval · evaluates the ability to generate and reason about Computer-Aided Design code, testing spatial reasoning and engineering knowledge.

Updated 2025-06-17

Models tested

Top score

74.0

Median

42.0

min 12.0

Top-5 spread

σ 7.0

wide open

Best score over time · one chart, every benchmark

Chart type

Frontier on CadEval rose from 26.0 to 74.0 in 8 months · +48.0 points · latest leader o3 from OpenAI.

Pink dots = frontier records · 4 totalClick to open model page

15 models tested · sorted by score

#	Model	Score	Price
1	o3· OpenAI	74.0	$2.00
2	Gemini 2.5 Pro· Google DeepMind	64.0	$1.25
3	o4 Mini· OpenAI	62.0	$1.10
4	o1· OpenAI	56.0	$15.00
5	Claude 3.7 Sonnet· Anthropic	54.0	$3.00
6	o3 Mini· OpenAI	54.0	$1.10
7	Claude 3.5 Sonnet· Anthropic	48.0	—
8	GPT-4.1· OpenAI	42.0	$2.00
9	Gemini 1.5 Pro (Feb 2024)· Google DeepMind	34.0	—
10	Claude 3.5 Haiku· Anthropic	32.0	$0.80
11	Gemini 2.0 Flash· Google DeepMind	30.0	$0.10
12	GPT-4o (2024-08-06)· OpenAI	26.0	$2.50
13	GPT-4o (2024-11-20)· OpenAI	26.0	$2.50
14	GPT-4.1 Mini· OpenAI	16.0	$0.40
15	Claude 3 Haiku· Anthropic	12.0	$0.25