Best AI Models for Vision

AI models ranked by vision and multimodal benchmarks. Compare MMMU, VideoMME, and visual reasoning scores.

Median input price: $1.75 per 1M tokens

Full Rankings

#	Model	Provider	Avg	VideoMME	VPCT	Input $/1M tokens	Context
1	Gemini 3 Pro	🇺🇸 Google DeepMind	60.5	-	86.5	N/A	N/A
2	o1	🇺🇸 OpenAI	56.4	-	5.5	$15.00	200K
3	Gemini 2.5 Pro	🇺🇸 Google DeepMind	56.2	-	19.6	$1.25	1.0M
4	GPT-5 Mini	🇺🇸 OpenAI	56.0	-	10.3	$0.25	400K
5	o3	🇺🇸 OpenAI	55.2	-	28.0	$2.00	200K
6	GPT-5	🇺🇸 OpenAI	54.4	-	49.0	$1.25	400K
7	GPT-5.2	🇺🇸 OpenAI	54.0	-	76.0	$1.75	400K
8	o4 Mini	🇺🇸 OpenAI	53.2	-	36.3	$1.10	200K
9	Qwen2.5 72B Instruct	🇨🇳 Alibaba Qwen (open source)	53.2	64.7	-	$0.12	33K
10	GPT-5.1	🇺🇸 OpenAI	49.6	-	38.0	$1.25	400K
11	Gemini 3 Flash Preview	🇺🇸 Google DeepMind	49.1	-	58.9	$0.50	1.0M
12	Claude 3.7 Sonnet	🇺🇸 Anthropic	47.7	-	8.5	$3.00	200K
13	Gemini 1.5 Flash (May 2024)	🇺🇸 Google DeepMind	47.4	60.4	-	N/A	N/A
14	Claude Opus 4.5	🇺🇸 Anthropic	45.4	-	10.0	$5.00	200K
15	GPT-5 Nano	🇺🇸 OpenAI	45.3	-	5.8	$0.05	400K
16	Claude Sonnet 4	🇺🇸 Anthropic	44.6	-	1.0	$3.00	1.0M
17	GPT-4o-mini (2024-07-18)	🇺🇸 OpenAI	43.2	53.1	1.0	$0.15	128K
18	Claude 3.5 Sonnet	🇺🇸 Anthropic	42.3	46.7	-	N/A	N/A
19	Claude Sonnet 4.5	🇺🇸 Anthropic	42.1	-	9.7	$3.00	1.0M
20	Claude Opus 4	🇺🇸 Anthropic	41.7	-	7.0	$15.00	200K
21	Claude Opus 4.1	🇺🇸 Anthropic	41.3	-	2.5	$15.00	200K
22	Gemini 1.5 Pro (Feb 2024)	🇺🇸 Google DeepMind	41.3	66.7	-	N/A	N/A
23	Gemini 2.5 Flash	🇺🇸 Google DeepMind	40.0	-	7.0	$0.30	1.0M
24	GPT-4o-mini	🇺🇸 OpenAI	39.6	53.1	1.0	$0.15	128K
25	GPT-4o (2024-11-20)	🇺🇸 OpenAI	37.7	62.5	10.0	$2.50	128K
26	GPT-4.5	🇺🇸 OpenAI	35.9	-	17.5	N/A	N/A
27	GPT-4o (2024-08-06)	🇺🇸 OpenAI	35.6	62.5	-	$2.50	128K

Scores in % unless noted. Avg = unweighted mean across tested benchmarks.
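As a rough illustration of what "unweighted mean across tested benchmarks" could mean in practice, the sketch below averages only the benchmarks a model has a score for and skips the untested ones. The function name `unweighted_avg`, the one-decimal rounding, and the example scores are all assumptions made for illustration; they are not the site's actual code or data.

```python
from typing import Optional


def unweighted_avg(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Mean of the benchmarks that have a score; untested ones (None) are skipped.

    Hypothetical helper: every tested benchmark counts equally, and the
    result is rounded to one decimal (an assumption about display format).
    """
    tested = [s for s in scores.values() if s is not None]
    if not tested:
        return None  # no benchmark was run, so no average can be reported
    return round(sum(tested) / len(tested), 1)


# Made-up scores for illustration only; VPCT is untested for this model.
example = {"MMMU": 70.0, "VideoMME": 60.0, "VPCT": None}
print(unweighted_avg(example))  # 65.0
```

Under this reading, a model tested on only one or two benchmarks is averaged over fewer numbers than a model tested on all of them, so Avg values are not strictly comparable between rows with different coverage.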

About this category

Models ranked by visual understanding across MMMU, VideoMME, and other multimodal benchmarks. These tests measure image comprehension, visual reasoning, and video understanding.

Related categories

Best AI Models for Knowledge

AI models ranked by knowledge benchmarks. Compare MMLU-Pro, GPQA Diamond, SimpleQA, and other knowledge tests.

Best AI Models for Reasoning

AI models ranked by reasoning benchmarks. Compare GPQA Diamond, ARC-AGI, BBH, and other reasoning tests across all providers.

Flagship AI Models

The best AI model from each provider, ranked by benchmark score. Compare the flagships from OpenAI, Anthropic, Google, Meta, and more.

Frequently asked questions

Which AI model is best for vision tasks?

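Vision capabilities vary across models, so the answer depends on the benchmark. The leaderboard above ranks multimodal models by MMMU, VideoMME, and other visual benchmarks; by average score, Gemini 3 Pro leads (60.5), followed by o1 (56.4) and Gemini 2.5 Pro (56.2).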

What is MMMU?

MMMU (Massive Multi-discipline Multimodal Understanding) tests models on college-level questions requiring both image understanding and domain knowledge across 30+ subjects.
