VideoMME
VideoMME is a multimodal benchmark that tests video understanding across diverse domains and requires temporal reasoning and cross-frame comprehension.
Full rankings
8 models tested · sorted by score
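| # | Model | Score |
|---|---|---|
| 1 | Gemini 1.5 Pro (Feb 2024) | 66.7 |
| 2 | Qwen2.5 72B Instruct | 64.7 |
| 3 | GPT-4o (2024-08-06) | 62.5 |
| 4 | GPT-4o (2024-11-20) | 62.5 |
| 5 | Gemini 1.5 Flash (May 2024) | 60.4 |
| 6 | | 53.1 |
| 7 | | 53.1 |
| 8 | | 46.7 |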
Correlated benchmarks
Pearson r · original research
Benchmarks that track with VideoMME
Pearson correlation is computed across the models scored on both benchmarks. Values closer to 1 mean a benchmark's scores track VideoMME scores more closely.
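As a rough illustration of how such a correlation can be computed, here is a minimal Python sketch. The function name `pearson_r` and both score lists are hypothetical and chosen only for illustration; they are not BenchGecko's actual code or data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical scores for five models tested on both benchmarks (illustrative only).
benchmark_a = [70.0, 65.0, 60.0, 55.0, 50.0]
benchmark_b = [42.0, 35.0, 38.0, 30.0, 22.0]

r = pearson_r(benchmark_a, benchmark_b)
print(f"Pearson r = {r:.2f}")
```

Only models that appear on both benchmarks can enter the calculation, so each pair of lists must be aligned model by model before calling the function.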
Frequently asked
About VideoMME
What does VideoMME measure?
VideoMME is a multimodal benchmark that tests video understanding across diverse domains and requires temporal reasoning and cross-frame comprehension. Eight AI models have been tested on it, with scores ranging from 46.7 to 66.7 out of 100.
Which model leads on VideoMME?
Gemini 1.5 Pro (Feb 2024) from Google DeepMind leads VideoMME with a score of 66.7. The median score across 8 tested models is 61.5.
Is VideoMME saturated?
No. The top score is 66.7 out of 100 (67%), so there is still meaningful room for improvement on VideoMME.
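The summary figures quoted in these answers can be checked against the eight scores in the rankings table. The sketch below is only a sanity check in Python; the variable names are illustrative, and the maximum of 100 is taken from the page's "Max score" field.

```python
from statistics import median

# The eight VideoMME scores listed in the rankings table.
scores = [66.7, 64.7, 62.5, 62.5, 60.4, 53.1, 53.1, 46.7]
max_score = 100  # maximum attainable score, as stated on the page

top = max(scores)
low = min(scores)
mid = median(scores)        # mean of the two middle values for an even count
headroom = max_score - top  # points still available above the leader

print(f"range: {low} to {top}")
print(f"median: {mid:.2f}")
print(f"headroom: {headroom:.1f} points")
```

The median of the listed scores is the average of 60.4 and 62.5, about 61.45, which is consistent with the 61.5 reported above after rounding to one decimal place.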
Does VideoMME predict performance on other benchmarks?
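Yes, within the limits of a small sample. VideoMME scores have a Pearson correlation of 0.80 with HELM · Omni-MATH across 5 shared models, so models that do well on VideoMME tend to do well on HELM · Omni-MATH. With only 5 models in common, this is a rough signal rather than a firm prediction.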
How often is VideoMME data refreshed?
BenchGecko pulls updates daily. New model scores on VideoMME appear as soon as they are published by Epoch AI or the model provider.
- Category: Multimodal
- Max score: 100
- Models: 8
- Updated: 2024-11-20
Top on VideoMME
- Gemini 1.5 Pro (Feb 2024) · 66.7
- Qwen2.5 72B Instruct · 64.7
- GPT-4o (2024-08-06) · 62.5
- GPT-4o (2024-11-20) · 62.5
- Gemini 1.5 Flash (May 2024) · 60.4