Which model leads on VideoMME?

Gemini 1.5 Pro (Feb 2024) from Google DeepMind leads VideoMME with a score of 66.7. The median score across 8 tested models is 61.5.
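
The leader and median quoted above can be checked against the eight scores in the full rankings. This is a minimal sketch: the `scores` dictionary is transcribed by hand from this page, and the variable names are illustrative, not part of any BenchGecko API.

```python
from statistics import median

# Scores transcribed by hand from the VideoMME rankings on this page.
scores = {
    "Gemini 1.5 Pro (Feb 2024)": 66.7,
    "Qwen2.5 72B Instruct": 64.7,
    "GPT-4o (2024-08-06)": 62.5,
    "GPT-4o (2024-11-20)": 62.5,
    "Gemini 1.5 Flash (May 2024)": 60.4,
    "GPT-4o-mini": 53.1,
    "GPT-4o-mini (2024-07-18)": 53.1,
    "Claude 3.5 Sonnet": 46.7,
}

# The leader is the model with the highest score.
leader = max(scores, key=scores.get)
top_score = scores[leader]

# With an even number of models, the median is the mean of the two middle scores.
median_score = median(scores.values())

print(f"{leader}: {top_score}")
print(f"median: {median_score:.2f}")
```

From the rounded table values the median works out to 61.45, which the page displays as 61.5.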

Is VideoMME saturated?

No · the top score is 66.7 out of 100, which leaves 33.3 points of headroom on VideoMME.

Does VideoMME predict performance on other benchmarks?

Yes, to a degree · VideoMME scores correlate 0.80 with HELM · Omni-MATH across the 5 models tested on both. Models that do well on VideoMME tend to do well on HELM · Omni-MATH.
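
The page does not say which correlation coefficient is behind the 0.80 figure, and the per-model scores for the 5 shared models are not listed here. The sketch below assumes a Pearson correlation and uses placeholder inputs purely to show the calculation; `pearson`, `bench_a`, and `bench_b` are hypothetical names.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalised because the factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Placeholder inputs only: these are NOT real benchmark scores.
bench_a = [1.0, 2.0, 3.0, 4.0, 5.0]
bench_b = [2.0, 4.0, 6.0, 8.0, 10.0]

r = pearson(bench_a, bench_b)
print(f"r = {r:.2f}")
```

To reproduce the figure on this page, the two lists would hold the VideoMME and HELM · Omni-MATH scores of the same 5 models, in the same order.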

How often is VideoMME data refreshed?

BenchGecko pulls updates daily, so new model scores on VideoMME appear within a day of being published by Epoch AI or the model provider.

Benchmark · Multimodal · Settled

VideoMME

Name: VideoMME Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

VideoMME is a multimodal benchmark that tests video understanding across diverse domains. It requires temporal reasoning and cross-frame comprehension.

Updated 2024-11-20

Models tested: 8
Top score: 66.7 (Gemini 1.5 Pro (Feb 2024))
Median: 61.5 (min 46.7)
Top-5 spread: σ 2.1 (Competitive)
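
The top-5 spread can be approximated from the five highest scores in the rankings. The page does not state its exact formula, so this sketch assumes a population standard deviation (`statistics.pstdev`); a sample standard deviation would give a larger value.

```python
from statistics import pstdev

# All eight scores from the VideoMME rankings on this page.
all_scores = [66.7, 64.7, 62.5, 62.5, 60.4, 53.1, 53.1, 46.7]

# Keep the five highest scores, then take their population standard deviation.
top5 = sorted(all_scores, reverse=True)[:5]
spread = pstdev(top5)

print(f"top-5 spread: {spread:.2f}")
```

With the rounded table scores this gives about 2.15, close to the σ 2.1 shown above. The small gap may come from rounding or from a different formula on the site.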

The Frontier

Best score over time

Frontier on VideoMME rose from 53.1 to 64.7 in 2 months (+11.6 points). The latest frontier leader is Qwen2.5 72B Instruct from Alibaba Qwen. The chart marks 3 frontier records.
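
A frontier record is a result that beats every earlier result, so the frontier can be traced as a running maximum over dated scores. The page does not list which dated results feed its chart, so the inputs below are assumptions: the GPT-4o and GPT-4o-mini dates are read from the version suffixes in the rankings, the Qwen2.5 date is assumed, and entries without a full date are left out. The `frontier` helper is hypothetical.

```python
from datetime import date

def frontier(records):
    """Return the (date, model, score) entries that set a new best score."""
    best = float("-inf")
    steps = []
    for day, model, score in sorted(records):  # oldest first
        if score > best:  # strictly better than every earlier result
            best = score
            steps.append((day, model, score))
    return steps

records = [
    (date(2024, 7, 18), "GPT-4o-mini (2024-07-18)", 53.1),
    (date(2024, 8, 6), "GPT-4o (2024-08-06)", 62.5),
    (date(2024, 9, 19), "Qwen2.5 72B Instruct", 64.7),  # date is an assumption
    (date(2024, 11, 20), "GPT-4o (2024-11-20)", 62.5),
]

steps = frontier(records)
gain = steps[-1][2] - steps[0][2]

for day, model, score in steps:
    print(day.isoformat(), model, score)
print(f"gain: {gain:+.1f} points")
```

Under these assumptions the sketch yields 3 frontier records and a gain of +11.6 points, ending with Qwen2.5 72B Instruct.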

Full rankings

8 models tested · sorted by score

#	Model	Provider	Score	Price
1	Gemini 1.5 Pro (Feb 2024)	Google DeepMind	66.7	n/a
2	Qwen2.5 72B Instruct	Alibaba Qwen	64.7	$0.36
3	GPT-4o (2024-08-06)	OpenAI	62.5	$2.50
4	GPT-4o (2024-11-20)	OpenAI	62.5	$2.50
5	Gemini 1.5 Flash (May 2024)	Google DeepMind	60.4	n/a
6	GPT-4o-mini	OpenAI	53.1	$0.15
7	GPT-4o-mini (2024-07-18)	OpenAI	53.1	$0.15
8	Claude 3.5 Sonnet	Anthropic	46.7	n/a