Which model leads on DeepResearch Bench?

GPT-5 from OpenAI leads DeepResearch Bench with a score of 55.1. The median score across 13 tested models is 47.8.

Is DeepResearch Bench saturated?

No · the top score is 55.1 out of 100 (55%). There is still meaningful room for improvement on DeepResearch Bench.

Does DeepResearch Bench predict performance on other benchmarks?

Yes · DeepResearch Bench scores correlate 0.96 with Cybench across 6 shared models. Models that do well on DeepResearch Bench tend to do well on Cybench.

How often is DeepResearch Bench data refreshed?

BenchGecko pulls updates daily. New model scores on DeepResearch Bench appear as soon as they are published by Epoch AI or the model provider.

Benchmark · KnowledgeSettled

DeepResearch Bench

Name: DeepResearch Bench Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

DeepResearch Bench · evaluates AI on complex multi-step research tasks requiring information gathering, synthesis, and producing comprehensive analyses.

Updated 2025-09-29

Models tested

Top score

55.1

GPT-5

Median

47.8

min 29.2

Top-5 spread

σ 2.3

Competitive

The Frontier

Best score over time · one chart, every benchmark

Chart type

Frontier on DeepResearch Bench rose from 35.1 to 55.1 in 7 months · +20.0 points · latest leader GPT-5 from OpenAI.

Pink dots = frontier records · 6 totalClick to open model page

Full rankings

13 models tested · sorted by score

#	Model	Score	Price
1	GPT-5· OpenAI	55.1	$1.25
2	Claude Sonnet 4.5· Anthropic	52.6	$3.00
3	Gemini 2.5 Pro· Google DeepMind	49.7	$1.25
4	Claude Opus 4.1· Anthropic	49.7	$15.00
5	Claude Opus 4· Anthropic	49.0	$15.00
6	Grok 4· xAI	47.9	$3.00
7	Claude Sonnet 4· Anthropic	47.8	$3.00
8	o3· OpenAI	46.6	$2.00
9	Claude 3.7 Sonnet· Anthropic	43.6	$3.00
10	R1· DeepSeek	35.1	$0.70
11	R1 0528· DeepSeek	35.1	$0.50
12	GPT-4.1· OpenAI	29.3	$2.00
13	Gemini 2.5 Flash· Google DeepMind	29.2	$0.30