Benchmark · Knowledge

SWE-bench Multimodal

Updated 2026-04-07
Models tested: 1
Top score: 59.0 (Claude Mythos Preview)
Median: 59.0 (min 59.0)
Top-5 spread: σ 0.0 (Settled)

1 model tested · sorted by score

#  Model                              Score
1  Claude Mythos Preview (Anthropic)  59.0
