MMLU
Massive Multitask Language Understanding: 57 subjects spanning STEM, humanities, social sciences, and more. The standard benchmark for broad knowledge.
Models Tested: 92
Top Score: 84.1
Average Score: 55.4