API

HLE

HLE (Humanity's Last Exam) β€” crowdsourced expert-level questions designed to be among the hardest possible challenges for AI systems across all domains.

27
Models Tested
34.4
Top Score
14.0
Average Score