Benchmark · Knowledge · Settled

HLE (with tools)

Updated 2026-04-07
Models tested
5
Top score
64.7
Claude Mythos Preview
Median
64.7
min 64.7
Top-5 spread
σ 0.0
Settled
HLE (with tools) · Top 5 (score axis 0 to 100)
#1 Claude Mythos Preview 64.7
#2 Claude Opus 4.7 (verified) 64.7
#3 GPT-5.4 (verified) 58.7
#4 Claude Opus 4.6 (verified) 53.3
#5 Gemini 3.1 Pro (verified) 51.4
Source: benchgecko.ai/benchmark/hle-tools

Same category · related evaluations