Which model leads on BBH?

DeepSeek V3 from DeepSeek leads BBH with a score of 83.3. The median score across 24 tested models is 45.4.

The top score is 83.3 out of 100 (83%), so there is still meaningful room for improvement on BBH.

Does BBH predict performance on other benchmarks?

Yes · BBH scores correlate 0.98 with CMMLU across 5 shared models. Models that do well on BBH tend to do well on CMMLU.
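As a rough illustration of how a figure like this can be computed, the sketch below takes two score tables, keeps only the models present in both, and returns the Pearson correlation over that shared set. It assumes Pearson correlation is the measure used, which the page does not state. The function names and the placeholder scores are hypothetical and are not real BBH or CMMLU results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (spread_x * spread_y)


def shared_correlation(scores_a, scores_b):
    """Correlate two {model: score} tables over the models they share."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    r = pearson([scores_a[m] for m in shared], [scores_b[m] for m in shared])
    return r, len(shared)


# Placeholder values for illustration only (NOT real benchmark scores).
bench_a = {"model_a": 10.0, "model_b": 20.0, "model_c": 30.0, "model_d": 99.0}
bench_b = {"model_a": 15.0, "model_b": 25.0, "model_c": 35.0}

r, n_shared = shared_correlation(bench_a, bench_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

With only a handful of shared models, a coefficient computed this way is sensitive to any single score, so the number of shared models is worth reporting next to it.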

BenchGecko pulls updates daily. New model scores on BBH appear as soon as they are published by Epoch AI or the model provider.

Benchmark · Reasoning · Competitive

Name: BBH Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

BIG-Bench Hard · a curated subset of 23 challenging tasks from BIG-Bench where language models previously failed to outperform average humans.

Updated 2024-12-26

Models tested: 24
Top score: 83.3 (DeepSeek V3)
Median: 45.4 (min 10.0)
Top-5 spread: σ 3.5 (Competitive)

Best score over time

Frontier on BBH rose from 77.2 to 83.3 in 5 months · +6.1 points · latest leader DeepSeek V3 from DeepSeek.

Frontier records: 2 total.

24 models tested · sorted by score

#	Model	Provider	Score	Price
1	DeepSeek V3	DeepSeek	83.3	$0.32
2	Gemini 1.5 Pro (Feb 2024)	Google DeepMind	78.7	n/a
3	Llama 3.1 405B	Meta	77.2	n/a
4	phi-3-medium 14B	Microsoft	75.2	n/a
5	Qwen2.5 72B Instruct	Alibaba Qwen	73.1	$0.36
6	phi-3-small 7.4B	Microsoft	72.1	n/a
7	DeepSeek-V2 (MoE-236B, May 2024)	DeepSeek	71.7	n/a
8	GPT-4 Turbo	OpenAI	66.8	$10.00
9	phi-3-mini 3.8B	Microsoft	62.3	n/a
10	Stable Beluga 2	Unknown	59.1	n/a
11	GPT-3.5 Turbo (older v0613)	OpenAI	48.8	$1.00
12	Phi 2	Microsoft	45.9	n/a
13	Nemotron-4 15B	Unknown	44.9	n/a
14	Llama 2-13B	Meta	44.3	n/a
15	Mistral 7B V0.1	Mistral AI	41.5	n/a
16	Qwen-14B	Alibaba Qwen	40.0	n/a
17	Yi 6B	Unknown	29.6	n/a
18	Baichuan 2-7B	Unknown	22.1	n/a
19	MPT-30B	Unknown	17.3	n/a
20	LLaMA-13B	Meta	17.2	n/a
21	Falcon-180B	TII	16.1	n/a
22	Gemma 2B	Google DeepMind	13.6	n/a
23	INTELLECT-1	Unknown	13.1	n/a
24	Baichuan1-7B	Unknown	10.0	n/a
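
The headline statistics on this page can be recomputed from the table. The sketch below is one way to do it, not BenchGecko's actual method: it assumes the "Top-5 spread" is the population standard deviation of the five highest scores, an assumption chosen because it reproduces the 3.5 shown (the sample standard deviation would give about 3.9). The `summarize` helper is a hypothetical name.

```python
from statistics import median, pstdev

# Scores copied from the table above, in rank order.
scores = [
    83.3, 78.7, 77.2, 75.2, 73.1, 72.1, 71.7, 66.8, 62.3, 59.1, 48.8, 45.9,
    44.9, 44.3, 41.5, 40.0, 29.6, 22.1, 17.3, 17.2, 16.1, 13.6, 13.1, 10.0,
]


def summarize(scores, top_n=5):
    """Return leaderboard summary statistics, rounded to one decimal."""
    ranked = sorted(scores, reverse=True)
    return {
        "models": len(ranked),
        "top": ranked[0],
        "median": round(median(ranked), 1),
        "min": ranked[-1],
        # Assumption: spread = population std dev of the top_n scores.
        "top_spread": round(pstdev(ranked[:top_n]), 1),
    }


summary = summarize(scores)
print(summary)
```

With an even number of models, the median is the mean of the 12th and 13th scores (45.9 and 44.9), which matches the 45.4 reported.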