MUSR
The Frontier
Best score over time · one chart, every benchmark
Distribution
Where models cluster
Correlated benchmarks
Pearson r · original research
Benchmarks that track with MUSR
Pearson correlation computed across the models scored on both benchmarks. Values closer to 1 indicate a stronger positive linear relationship between the two sets of scores.
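A minimal sketch of how a correlation like this could be computed, assuming scores are available as plain model-to-score mappings. The function name, the model names, and the numbers below are hypothetical illustrations, not BenchGecko's actual pipeline or data.

```python
from math import sqrt


def pearson_on_shared(scores_a, scores_b):
    """Return (r, n): Pearson r over the models present in both mappings."""
    # Only models scored on both benchmarks contribute to the correlation.
    shared = sorted(set(scores_a) & set(scores_b))
    n = len(shared)
    if n < 2:
        raise ValueError("need at least two shared models")

    xs = [scores_a[m] for m in shared]
    ys = [scores_b[m] for m in shared]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of centered products and centered squares.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one side is constant")

    return cov / sqrt(var_x * var_y), n


# Hypothetical scores, chosen only to illustrate the call.
musr = {"model-a": 5.0, "model-b": 10.0, "model-c": 20.0, "model-d": 7.0}
other = {"model-a": 40.0, "model-b": 50.0, "model-c": 70.0, "model-e": 60.0}

r, n = pearson_on_shared(musr, other)
print(f"r = {r:.2f} across {n} shared models")
```

Returning the shared-model count alongside r keeps the sample size visible, which matters when only a handful of models overlap.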
Full rankings
73 models tested · sorted by score
Frequently asked
Pulled from the MUSR dataset · updated daily
What does MUSR measure?
MUSR (Multistep Soft Reasoning) tests multistep reasoning over natural-language narratives. BenchGecko lists it among its knowledge benchmarks. 73 AI models have been tested on it, with scores ranging from 0.5 to 28.7 out of 100.
Which model leads on MUSR?
DeepSeek R1 Distill Qwen 14B from DeepSeek leads MUSR with a score of 28.7. The median score across 73 tested models is 9.7.
Is MUSR saturated?
No · the top score is 28.7 out of 100 (29%). There is still meaningful room for improvement on MUSR.
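The arithmetic behind this answer can be sketched as follows. The 28.7 and 100 figures come from the text above; the function name and the 0.9 saturation threshold are assumptions made for illustration, not a BenchGecko definition.

```python
def saturation_summary(top_score, max_score=100.0, threshold=0.9):
    """Return (fraction_of_max, headroom, is_saturated) for a benchmark."""
    fraction = top_score / max_score
    headroom = max_score - top_score
    # threshold is a hypothetical cutoff for calling a benchmark saturated.
    return fraction, headroom, fraction >= threshold


fraction, headroom, saturated = saturation_summary(28.7)
print(f"{fraction:.0%} of max, {headroom:.1f} points of headroom, saturated: {saturated}")
```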
Does MUSR predict performance on other benchmarks?
Yes · MUSR scores have a Pearson correlation of 0.86 with OpenBookQA across 9 shared models. Models that do well on MUSR tend to do well on OpenBookQA, though with only 9 shared models the estimate is tentative.
How often is MUSR data refreshed?
BenchGecko pulls updates daily. New model scores on MUSR appear in the next daily update after they are published by Epoch AI or the model provider.