OpenCompass · IFEval
[Chart: The Frontier · best score over time]
[Chart: Distribution · where models cluster]
Correlated benchmarks
Pearson r · original research
Benchmarks that track with OpenCompass · IFEval
Pearson correlation computed across the models scored on both benchmarks. Values closer to 1 mean that scores on one benchmark more strongly predict scores on the other.
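As a rough illustration of the calculation described above, the sketch below restricts two score tables to the models they share and computes Pearson r over those pairs. The function names (`pearson_r`, `correlate_benchmarks`) and all model names and scores in it are hypothetical placeholders, not BenchGecko data, and the page does not document BenchGecko's actual implementation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / math.sqrt(var_x * var_y)


def correlate_benchmarks(scores_a, scores_b):
    """Correlate two {model: score} tables over the models present in both."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    r = pearson_r([scores_a[m] for m in shared], [scores_b[m] for m in shared])
    return r, len(shared)


# Hypothetical scores for illustration only (not real benchmark results).
benchmark_a = {"model-a": 93.0, "model-b": 90.0, "model-c": 85.0, "model-d": 80.0}
benchmark_b = {"model-a": 70.0, "model-b": 66.0, "model-c": 58.0, "model-e": 50.0}

r, n_shared = correlate_benchmarks(benchmark_a, benchmark_b)
print(f"r = {r:.2f} across {n_shared} shared models")
```

Only models that appear in both tables contribute, which is why a correlation is always reported together with its shared-model count.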
Full rankings
32 models tested · sorted by score
| # | Model | Score |
|---|---|---|
| 1 | Kimi K2.5 | 93.9 |
| 2 | GLM 5 | 93.2 |
| 3 | Step 3.5 Flash | 93.2 |
| 4 | Kimi K2 Thinking | 92.4 |
| 5 | DeepSeek V3.2 Speciale | 91.7 |
| 6 | | 91.5 |
| 7 | | 91.1 |
| 8 | | 90.2 |
| 9 | | 90.2 |
| 10 | | 90.2 |
| 11 | | 90.2 |
| 12 | | 90.0 |
| 13 | | 89.7 |
| 14 | | 89.7 |
| 15 | | 89.5 |
| 16 | | 89.5 |
| 17 | | 88.9 |
| 18 | | 88.7 |
| 19 | | 88.5 |
| 20 | | 88.3 |
| 21 | | 88.3 |
| 22 | | 87.8 |
| 23 | | 87.6 |
| 24 | | 86.0 |
| 25 | | 85.6 |
| 26 | | 85.4 |
| 27 | | 83.9 |
| 28 | | 82.4 |
| 29 | | 81.2 |
| 30 | | 81.0 |
| 31 | | 80.0 |
| 32 | | 60.3 |
Frequently asked
Pulled from the OpenCompass · IFEval dataset · updated daily
What does OpenCompass · IFEval measure?
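OpenCompass · IFEval is listed as a knowledge benchmark in the BenchGecko catalog. IFEval (Instruction-Following Evaluation) tests how reliably a model follows explicit, verifiable instructions in a prompt. 32 AI models have been tested on it, with scores ranging from 60.3 to 93.9 out of 100.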
Which model leads on OpenCompass · IFEval?
Kimi K2.5 from Moonshot AI leads OpenCompass · IFEval with a score of 93.9. The median score across the 32 tested models is 89.2.
Is OpenCompass · IFEval saturated?
No · the top score is 93.9 out of 100 (94%), which leaves 6.1 points of headroom on OpenCompass · IFEval.
Does OpenCompass · IFEval predict performance on other benchmarks?
Yes · OpenCompass · IFEval scores show a Pearson correlation of 0.90 with LiveBench · Overall across the 10 models scored on both. Among those models, higher scores on OpenCompass · IFEval tend to go with higher scores on LiveBench · Overall.
How often is OpenCompass · IFEval data refreshed?
BenchGecko pulls updates daily, so new model scores on OpenCompass · IFEval appear in the next daily refresh after Epoch AI or the model provider publishes them.
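The summary figures quoted in the answers above (score range, median, and the top score as a percentage) can be reproduced from the rankings table. The following is a minimal sketch, assuming the 32 scores are copied by hand from that table; `summarize` and `max_score` are hypothetical names chosen for this example and are not part of any BenchGecko API.

```python
from statistics import median

# Scores copied from the rankings table, highest first (32 entries).
scores = [
    93.9, 93.2, 93.2, 92.4, 91.7, 91.5, 91.1, 90.2,
    90.2, 90.2, 90.2, 90.0, 89.7, 89.7, 89.5, 89.5,
    88.9, 88.7, 88.5, 88.3, 88.3, 87.8, 87.6, 86.0,
    85.6, 85.4, 83.9, 82.4, 81.2, 81.0, 80.0, 60.3,
]


def summarize(scores, max_score=100.0):
    """Return the headline statistics for a list of benchmark scores."""
    top = max(scores)
    return {
        "models": len(scores),
        "lowest": min(scores),
        "top": top,
        # With an even count, the median is the mean of the two middle scores.
        "median": round(median(scores), 1),
        "top_pct": round(100 * top / max_score),
        "headroom": round(max_score - top, 1),
    }


summary = summarize(scores)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With 32 scores the median falls between ranks 16 and 17 (89.5 and 88.9), which is where the quoted 89.2 comes from.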
Top on OpenCompass · IFEval
Kimi K2.5 · 93.9
GLM 5 · 93.2
Step 3.5 Flash · 93.2
Kimi K2 Thinking · 92.4
DeepSeek V3.2 Speciale · 91.7
More knowledge benchmarks
Same category · related evaluations