Benchmark · Knowledge

SWE-bench Multilingual

Updated 2026-04-07
Models tested: 1
Top score: 87.3 (Claude Mythos Preview)
Median: 87.3 (min 87.3)
Top-5 spread: σ 0.0 (Settled)

1 model tested · sorted by score

#   Model                               Score
1   Claude Mythos Preview (Anthropic)   87.3
