Name: SWE-bench Multimodal Benchmark
Creator: BenchGecko
License: https://creativecommons.org/licenses/by/4.0/

Question 1

What does SWE-bench Multimodal measure?

Accepted Answer

SWE-bench Multimodal is a knowledge benchmark in the BenchGecko catalog. One AI model has been tested on it so far, with a score of 59.0 out of 100.

Question 2

Which model leads on SWE-bench Multimodal?

Accepted Answer

Claude Mythos Preview from Anthropic leads SWE-bench Multimodal with a score of 59.0. Because it is the only model tested so far, the median score is also 59.0.

Question 3

Is SWE-bench Multimodal saturated?

Accepted Answer

No. The top score is 59.0 out of 100 (59%), so there is still meaningful room for improvement on SWE-bench Multimodal.
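
The arithmetic behind this answer can be sketched in a few lines of Python. This is only an illustration, not BenchGecko's actual method: the `summarize` helper and the 0.90 saturation cutoff are assumptions made up for the example, and the only data used is the single score reported in this FAQ.

```python
from statistics import median

# The only score reported in this FAQ; one model has been tested so far.
scores = {"Claude Mythos Preview": 59.0}

MAX_SCORE = 100.0
# Hypothetical cutoff chosen for illustration, not a published BenchGecko rule.
SATURATION_THRESHOLD = 0.90


def summarize(scores, max_score=MAX_SCORE, threshold=SATURATION_THRESHOLD):
    """Return the leader, median, remaining headroom, and a saturation flag."""
    top_model = max(scores, key=scores.get)
    top_score = scores[top_model]
    return {
        "top_model": top_model,
        "top_score": top_score,
        "median": median(scores.values()),
        "headroom": max_score - top_score,
        # Saturated only if the top score reaches the assumed share of the maximum.
        "saturated": top_score / max_score >= threshold,
    }


summary = summarize(scores)
print(summary)
```

With a single entry, the top score and the median coincide, and the benchmark falls well short of the assumed cutoff.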

Question 4

What makes SWE-bench Multimodal distinctive?

Accepted Answer

SWE-bench Multimodal is a knowledge benchmark with limited overlap with the rest of the catalog: it measures capabilities that are not well covered by the other benchmarks we track.

Question 5

How often is SWE-bench Multimodal data refreshed?

Accepted Answer

BenchGecko pulls updates daily. New model scores on SWE-bench Multimodal appear as soon as they are published by Epoch AI or the model provider.
