Multimodal ranking · Data updated May 6, 2026 · 12 ranked models

Best Multimodal AI Models

A scoped ranking for models with visual and video benchmark evidence. It avoids claiming broad visual ability when only one benchmark is available.

Rank 1 · Limited confidence
Alibaba Qwen

Rank 2 · Limited confidence
OpenAI

Rank 3 · Limited confidence
OpenAI

Scores are based on the visible benchmark set and available metadata.

Missing prices stay missing
| Rank | Model | Score | Evidence | Input price | Context |
|---|---|---|---|---|---|
| #1 | Qwen2.5 72B Instruct (Alibaba Qwen) | 87.5 | 1 benchmark · Limited | $0.36/M | 33K |
| #2 | GPT-5.2 (OpenAI) | 87.5 | 1 benchmark · Limited | $1.75/M | 400K |
| #3 | GPT-4o (2024-05-13) (OpenAI) | 87.5 | 1 benchmark · Limited | $5.00/M | 128K |
| #4 | GPT-4o (2024-08-06) (OpenAI) | 71.4 | 1 benchmark · Limited | $2.50/M | 128K |
| #5 | GPT-4o (2024-11-20) (OpenAI) | 59.7 | 3 benchmarks · Medium | $2.50/M | 128K |
| #6 | GPT-5 (OpenAI) | 56 | 1 benchmark · Limited | $1.25/M | 400K |
| #7 | GPT-5.1 (OpenAI) | 43.2 | 1 benchmark · Limited | $1.25/M | 400K |
| #8 | o4 Mini (OpenAI) | 41.1 | 1 benchmark · Limited | $1.10/M | 200K |
| #9 | o3 (OpenAI) | 31.5 | 1 benchmark · Limited | $2.00/M | 200K |
| #10 | Gemini 2.5 Pro (Google DeepMind) | 21.7 | 1 benchmark · Limited | $1.25/M | 1.0M |
| #11 | GPT-5 Mini (OpenAI) | 10.9 | 1 benchmark · Limited | $0.25/M | 400K |
| #12 | Claude Opus 4.5 (Anthropic) | 10.5 | 1 benchmark · Limited | $5.00/M | 200K |

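
One way to read the "Missing prices stay missing" note: a missing input price is kept as an explicit gap rather than replaced by zero or an estimate. The sketch below is a minimal illustration in Python, not BenchGecko's implementation. The `RankedModel` class, its field names, the `n/a` placeholder, and the example row without a price are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RankedModel:
    name: str
    score: float
    # USD per million input tokens; None means no price is published.
    input_price_per_m: Optional[float]


def format_price(price: Optional[float]) -> str:
    """Render a price, keeping a missing value visibly missing."""
    return "n/a" if price is None else f"${price:.2f}/M"


def rank(models: list[RankedModel]) -> list[RankedModel]:
    """Sort by score, highest first; price never affects the order."""
    return sorted(models, key=lambda m: m.score, reverse=True)


models = [
    RankedModel("GPT-5 Mini", 10.9, 0.25),
    RankedModel("Qwen2.5 72B Instruct", 87.5, 0.36),
    RankedModel("Example model without a price", 50.0, None),  # hypothetical row
]

for position, m in enumerate(rank(models), start=1):
    print(f"#{position} {m.name}: {m.score} · {format_price(m.input_price_per_m)}")
```

Using `None` instead of `0.0` keeps a missing price from being mistaken for a free model in any later sort or filter on price.
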
Strict caveat

Multimodal coverage is uneven. Some models have strong video scores but fewer chart, image, or document results.

BenchGecko ranks models from published benchmark scores and model metadata. Scores do not measure every use case, and missing data can affect rankings.
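
The confidence labels in the table appear to track how many benchmarks back each score: rows with 1 benchmark are marked Limited and the row with 3 benchmarks is marked Medium. The sketch below shows one way such a label could be derived. It is an assumption, not BenchGecko's documented rule: the thresholds `MEDIUM_MIN` and `HIGH_MIN` and the "High" label are hypothetical and chosen only to be consistent with the two cases visible on this page.

```python
MEDIUM_MIN = 3  # assumed: the page shows 3 benchmarks labeled Medium
HIGH_MIN = 6    # assumed: no "High" row appears on this page


def confidence_label(benchmark_count: int) -> str:
    """Map the number of supporting benchmarks to a coverage label."""
    if benchmark_count < 1:
        raise ValueError("a ranked model needs at least one benchmark score")
    if benchmark_count >= HIGH_MIN:
        return "High"
    if benchmark_count >= MEDIUM_MIN:
        return "Medium"
    return "Limited"


for count in (1, 3):
    print(count, confidence_label(count))
```
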

Related rankings

Reasoning models ranked from public benchmark scores across GPQA Diamond, BBH, ARC-AGI, SimpleBench, and related tests.

Math models ranked from public benchmark scores across GSM8K, MATH-level tests, AIME-style tasks, and FrontierMath where available.

Open-weight AI models ranked from available benchmark data, coverage confidence, pricing metadata, and listed license signals.

What counts as multimodal evidence?

BenchGecko uses published benchmarks for video understanding, image reasoning, charts, and visual question answering where those scores exist.

Why are some famous models missing?

A model may be missing if BenchGecko does not have a qualifying public score for the benchmarks used on this page.
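
The inclusion rule described above can be sketched as a simple filter: a model appears only if it has at least one score on a benchmark that qualifies for this page. This is an illustrative sketch under stated assumptions, not BenchGecko's code. The benchmark ids in `QUALIFYING`, the model names, and the scores in `example` are all hypothetical placeholders.

```python
# Hypothetical ids for the benchmarks that count on this page.
QUALIFYING = frozenset({"video_qa", "chart_qa"})


def eligible_models(
    scores_by_model: dict[str, dict[str, float]],
    qualifying: frozenset[str] = QUALIFYING,
) -> dict[str, dict[str, float]]:
    """Keep models with at least one qualifying score, dropping other scores."""
    eligible = {}
    for model, scores in scores_by_model.items():
        kept = {bench: s for bench, s in scores.items() if bench in qualifying}
        if kept:  # no qualifying score means the model is left off the page
            eligible[model] = kept
    return eligible


# Hypothetical input data.
example = {
    "model-a": {"video_qa": 71.4},
    "model-b": {"text_reasoning": 90.0},
    "model-c": {"chart_qa": 59.7, "text_reasoning": 80.0},
}

print(sorted(eligible_models(example)))
```

In this example `model-b` is left out even though it has a strong score, because that score is not on a qualifying benchmark.
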

Is this the same as an image generation ranking?

No. This page focuses on understanding and reasoning with visual inputs, not image generation quality.