Category · 47 terms · BenchGecko glossary

Benchmarks

How models are measured · SWE-bench, GPQA, MMLU.

The Benchmarks category covers 47 terms on how models are measured, including SWE-bench, GPQA, and MMLU. Every term has four depth levels (TL;DR, Basic, Deep, Expert), role-based takeaways, FAQs, and live BenchGecko data where available.