OpenCompass · HLE
Correlated benchmarks
Pearson r · original research
Benchmarks that track with OpenCompass · HLE
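Pearson correlation coefficient (r), computed across the models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship, so a model's score on one benchmark better predicts its score on the other.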
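For reference, the textbook sample Pearson coefficient over the n models scored on both benchmarks is shown below. Here x_i and y_i are model i's scores on OpenCompass · HLE and on the other benchmark, and the bars denote means over those n models. The page does not document its exact computation, so treat this as the standard definition rather than BenchGecko's confirmed method.

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r \le 1
```

An r of exactly 1 means the paired scores lie on an increasing straight line; an r near 0 means there is no linear relationship.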
Full rankings
32 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | DeepSeek V3.2 Speciale | 28.6 |
| 2 | Kimi K2.5 | 28.6 |
| 3 | GLM 5 | 28.1 |
| 4 | Qwen3.5 397B A17B | 27.5 |
| 5 | GLM 4.7 | 25.4 |
| 6 |  | 23.2 |
| 7 |  | 22.2 |
| 8 |  | 21.6 |
| 9 |  | 21.3 |
| 10 |  | 21.1 |
| 11 |  | 20.5 |
| 12 |  | 19.3 |
| 13 |  | 18.5 |
| 14 |  | 18.3 |
| 15 |  | 16.9 |
| 16 |  | 14.4 |
| 17 |  | 13.5 |
| 18 |  | 13.4 |
| 19 |  | 12.3 |
| 20 |  | 11.7 |
| 21 |  | 11.6 |
| 22 |  | 8.7 |
| 23 |  | 8.5 |
| 24 |  | 8.5 |
| 25 |  | 8.0 |
| 26 |  | 7.7 |
| 27 |  | 6.5 |
| 28 |  | 6.0 |
| 29 |  | 6.0 |
| 30 |  | 5.5 |
| 31 |  | 5.1 |
| 32 |  | 4.2 |
Frequently asked
Pulled from the OpenCompass · HLE dataset · updated daily
What does OpenCompass · HLE measure?
OpenCompass · HLE is a knowledge benchmark in the BenchGecko catalog. 32 AI models have been tested on it. Scores range from 4.2 to 28.6 out of 100.
Which model leads on OpenCompass · HLE?
DeepSeek V3.2 Speciale from DeepSeek leads OpenCompass · HLE with a score of 28.6. The median score across 32 tested models is 13.9.
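The range and median quoted in these answers can be re-derived from the 32 scores in the rankings table. A minimal sketch follows; the `SCORES` list is copied from the table, while the `summarize` helper is a hypothetical name used only for illustration. With an even count, the median is the midpoint of the 16th and 17th scores (14.4 and 13.5), which is 13.95. The FAQ reports 13.9, which presumably reflects how the site rounds that value for display; its rounding rule is not stated.

```python
from statistics import median

# Scores copied from the "Full rankings" table, in rank order (1 to 32).
SCORES = [
    28.6, 28.6, 28.1, 27.5, 25.4, 23.2, 22.2, 21.6, 21.3, 21.1,
    20.5, 19.3, 18.5, 18.3, 16.9, 14.4, 13.5, 13.4, 12.3, 11.7,
    11.6, 8.7, 8.5, 8.5, 8.0, 7.7, 6.5, 6.0, 6.0, 5.5,
    5.1, 4.2,
]


def summarize(scores: list[float]) -> dict[str, float]:
    """Return the count, min, max, and median of a list of benchmark scores.

    Hypothetical helper for illustration; not part of any BenchGecko API.
    """
    return {
        "count": len(scores),
        "min": min(scores),
        "max": max(scores),
        # For an even number of scores, median() averages the two middle values.
        "median": median(scores),
    }


if __name__ == "__main__":
    print(summarize(SCORES))
```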
Is OpenCompass · HLE saturated?
No · the top score is 28.6 out of 100 (about 29%), leaving 71.4 points of headroom even for the best model. OpenCompass · HLE is far from saturated.
Does OpenCompass · HLE predict performance on other benchmarks?
Yes · across the 11 models scored on both benchmarks, OpenCompass · HLE has a Pearson correlation of 0.98 with Artificial Analysis · Coding Index, so models that do well on one tend to do well on the other. With only 11 shared models, treat this as a strong signal rather than a settled result.
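Here is a minimal sketch of how a correlation "across shared models" can be computed, assuming each benchmark is stored as a mapping from model name to score. The function name `pearson_r` and the dictionary input format are hypothetical, and the page does not say how BenchGecko implements its calculation. The sketch intersects the two model sets, then applies the standard sample Pearson formula to the paired scores. The demo data are invented for illustration only and are not real benchmark results.

```python
import math


def pearson_r(a: dict[str, float], b: dict[str, float]) -> tuple[float, int]:
    """Pearson r between two benchmarks, over models scored on both.

    `a` and `b` map a model name to its score (hypothetical format).
    Returns (r, n), where n is the number of shared models.
    """
    # Keep only models that appear in both benchmarks, in a stable order.
    shared = sorted(a.keys() & b.keys())
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")
    xs = [a[m] for m in shared]
    ys = [b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when either benchmark has zero variance")
    return cov / math.sqrt(var_x * var_y), n


if __name__ == "__main__":
    # Made-up scores for illustration only; "model-d" has no second score,
    # so it is excluded from the calculation.
    first = {"model-a": 10.0, "model-b": 20.0, "model-c": 30.0, "model-d": 5.0}
    second = {"model-a": 40.0, "model-b": 50.0, "model-c": 60.0}
    r, n = pearson_r(first, second)
    print(f"r = {r:.2f} over {n} shared models")
```

Restricting to the intersection matters because r is only defined on paired observations. With a small n such as 11, a single unusual model can shift r noticeably.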
How often is OpenCompass · HLE data refreshed?
BenchGecko pulls updates daily, so new model scores on OpenCompass · HLE appear in the next daily update after Epoch AI or the model provider publishes them.
Top on OpenCompass · HLE
DeepSeek V3.2 Speciale · 28.6
Kimi K2.5 · 28.6
GLM 5 · 28.1
Qwen3.5 397B A17B · 27.5
GLM 4.7 · 25.4