Which model leads on LAMBADA?

Falcon-180B from TII leads LAMBADA with a score of 79.8. The median score across 7 tested models is 73.3.

Is LAMBADA saturated?

No · the top score is 79.8 out of 100 (80%). There is still meaningful room for improvement on LAMBADA.

Does LAMBADA predict performance on other benchmarks?

Moderately · LAMBADA scores have a correlation of 0.51 with HellaSwag across 6 shared models. Models that do well on LAMBADA tend to do well on HellaSwag, but with only 6 shared models the relationship is tentative.
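
A correlation figure like the one above can be sketched as follows. This is a minimal illustration, assuming a Pearson coefficient computed only over models that have scores on both benchmarks; the page does not say which correlation measure BenchGecko uses. The function name `pearson_on_shared` and the toy scores are hypothetical, not real benchmark data.

```python
from math import sqrt


def pearson_on_shared(a: dict[str, float], b: dict[str, float]) -> tuple[float, int]:
    """Return (Pearson r, number of shared models) for two score tables.

    Assumption: only models present in both tables are compared.
    """
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y), n


# Toy values for illustration only (not real benchmark scores).
bench_a = {"m1": 1.0, "m2": 2.0, "m3": 3.0, "m4": 9.0}
bench_b = {"m1": 2.0, "m2": 4.0, "m3": 6.0, "m5": 1.0}
r, n_shared = pearson_on_shared(bench_a, bench_b)
```

With so few shared models, a single outlier can move the coefficient a lot, so the shared-model count is worth reporting next to the correlation.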

How often is LAMBADA data refreshed?

BenchGecko pulls updates daily. New model scores on LAMBADA appear as soon as they are published by Epoch AI or the model provider.

Benchmark · Knowledge · Settled

LAMBADA

Name: LAMBADA Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

LAMBADA · measures the ability to predict the final word of a passage, requiring broad contextual understanding across long text spans.
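
The task described above can be sketched as a scoring loop. This is a rough illustration, assuming exact-match accuracy on the final whitespace-separated word, reported on a 0 to 100 scale; the real evaluation harness, tokenization, and punctuation handling may differ. `split_passage`, `final_word_accuracy`, and the `predict` callable are hypothetical names.

```python
from typing import Callable, Iterable


def split_passage(passage: str) -> tuple[str, str]:
    """Split a passage into (context, final word) at the last space."""
    context, _, target = passage.strip().rpartition(" ")
    return context, target


def final_word_accuracy(passages: Iterable[str], predict: Callable[[str], str]) -> float:
    """Percentage of passages whose final word is predicted exactly.

    `predict` stands in for a model: it receives the context and
    returns its guess for the missing final word.
    """
    total = 0
    correct = 0
    for passage in passages:
        context, target = split_passage(passage)
        total += 1
        if predict(context).strip().lower() == target.lower():
            correct += 1
    if total == 0:
        raise ValueError("no passages given")
    return 100.0 * correct / total


# Toy passages and a trivial stand-in "model" that always answers "mat".
toy_passages = ["the cat sat on the mat", "she opened the door"]
accuracy = final_word_accuracy(toy_passages, lambda context: "mat")
```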

Updated 2026-06-22

Models tested

7

Top score

79.8

Falcon-180B

Median

73.3

min 70.0

Top-5 spread

σ 2.9

Competitive

The Frontier

Best score over time · one chart, every benchmark

No frontier records are available for LAMBADA yet · not enough history to compute a frontier.

Pink dots = frontier records · 0 total

Full rankings

7 models tested · sorted by score

#	Model	Provider	Score	Price
1	Falcon-180B	TII	79.8	—
2	Llama 2-13B	Meta	76.5	—
3	LLaMA-13B	Meta	75.2	—
4	Baichuan 2-7B	Unknown	73.3	—
5	Stable Beluga 2	Unknown	71.3	—
6	Qwen-14B	Alibaba Qwen	71.1	—
7	MPT-30B	Unknown	70.0	—
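
The summary figures shown on this page (top score, median, minimum, top-5 spread) can be reproduced from the rankings. The sketch below assumes the spread is a population standard deviation over the five highest scores; the page does not state the formula, but that choice reproduces the published σ of 2.9. `MAX_SCORE` comes from the Details section.

```python
from statistics import median, pstdev

# Scores copied from the rankings table on this page.
scores = {
    "Falcon-180B": 79.8,
    "Llama 2-13B": 76.5,
    "LLaMA-13B": 75.2,
    "Baichuan 2-7B": 73.3,
    "Stable Beluga 2": 71.3,
    "Qwen-14B": 71.1,
    "MPT-30B": 70.0,
}
MAX_SCORE = 100

# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
top_model, top_score = ranked[0]

median_score = median(scores.values())
min_score = min(scores.values())

# Assumption: "Top-5 spread" is the population standard deviation
# of the five highest scores.
top5_spread = pstdev([score for _, score in ranked[:5]])

# Share of the maximum score reached by the leader, as a percentage.
top_percent = 100 * top_score / MAX_SCORE

print(f"{top_model}: {top_score} ({top_percent:.0f}% of max)")
print(f"median {median_score}, min {min_score}, top-5 spread {top5_spread:.1f}")
```

Using the sample standard deviation (`statistics.stdev`) instead would give roughly 3.2, which does not match the figure shown.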

Details

Category: Knowledge
Max score: 100
Models: 7
Updated: 2026-06-22

Related benchmarks

Chatbot Arena Elo · Overall · 113 models
BBH (HuggingFace) · 73 models
IFEval · 73 models
MMLU-PRO · 73 models
MUSR · 73 models
MATH Level 5 · 70 models

Top on LAMBADA

Falcon-180B · 79.8
Llama 2-13B · 76.5
LLaMA-13B · 75.2
Baichuan 2-7B · 73.3
Stable Beluga 2 · 71.3

Compare models

Falcon-180B vs Llama 2-13B
Llama 2-13B vs LLaMA-13B
LLaMA-13B vs Baichuan 2-7B
Baichuan 2-7B vs Stable Beluga 2

More knowledge benchmarks

Same category · related evaluations

Chatbot Arena Elo · Overall

Artificial Analysis · Quality Index

68 models

GPQA

67 models
