Benchmark · Multimodal · Settled

VideoMME

VideoMME is a multimodal benchmark that tests video understanding across diverse domains, requiring temporal reasoning and cross-frame comprehension.

Updated: 2024-11-20
Models tested: 8
Top score: 66.7 · Gemini 1.5 Pro (Feb 2024)
Median: 61.5 · min 46.7
Top-5 spread: σ 2.1 · Competitive

Best score over time · one chart, every benchmark

[Chart: VideoMME · 5 models · frontier running max. Y-axis: score, 0 to 100. X-axis: release date, Jul 24 to Nov 24. Source: benchgecko.ai/benchmark/videomme]
Frontier on VideoMME rose from 53.1 to 64.7 in 2 months · +11.6 points · latest leader Qwen2.5 72B Instruct from Alibaba Qwen.
Pink dots = frontier records · 3 total
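
The "frontier running max" in the chart can be read as: sort models by release date and keep each score that beats every earlier one. The sketch below illustrates that reading under stated assumptions. The function name frontier_records, the model names, the dates, and the intermediate scores are all hypothetical; only the endpoints 53.1 and 64.7 come from the summary above, and this is not necessarily how the site computes its chart.

```python
from datetime import date

def frontier_records(results):
    """Return the entries that set a new running maximum, in release order."""
    records = []
    best = float("-inf")
    # Sort by release date so each score is compared only to earlier releases.
    for release_date, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:
            best = score
            records.append((release_date, model, score))
    return records

# Illustrative data only: names, dates, and the non-endpoint scores are made up.
results = [
    (date(2024, 7, 1), "model-a", 53.1),   # first frontier record
    (date(2024, 8, 1), "model-b", 50.0),   # below the frontier, not a record
    (date(2024, 8, 15), "model-c", 60.0),  # new record
    (date(2024, 9, 1), "model-d", 64.7),   # new record
    (date(2024, 10, 1), "model-e", 58.2),  # below the frontier, not a record
]

records = frontier_records(results)
gain = round(records[-1][2] - records[0][2], 1)
print(len(records), gain)  # prints: 3 11.6
```

With this toy input, five models yield three frontier records, and the gain between the first and last record is 64.7 - 53.1 = 11.6 points.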
Details
Category: Multimodal
Max score: 100
Models: 8
Updated: 2024-11-20

Same category · related evaluations