HellaSwag

HellaSwag tests commonsense reasoning by asking models to predict the most plausible continuation of everyday scenarios.
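
As a rough illustration of this kind of evaluation, the sketch below picks the candidate ending that a scoring function rates highest and reports the share of scenarios answered correctly. It is a minimal sketch under stated assumptions, not the harness behind this page: `toy_score`, the example scenarios, and the dictionary keys are all hypothetical, and reporting the result as percentage accuracy is an assumption rather than something the page states.

```python
from typing import Callable, Dict, List, Sequence

Scorer = Callable[[str, str], float]


def pick_continuation(context: str, endings: Sequence[str], score: Scorer) -> int:
    """Return the index of the ending the scorer rates most plausible."""
    scores = [score(context, ending) for ending in endings]
    # On ties, max() keeps the earliest index.
    return max(range(len(endings)), key=scores.__getitem__)


def accuracy(examples: List[Dict], score: Scorer) -> float:
    """Percentage of examples whose top-rated ending matches the label."""
    correct = sum(
        pick_continuation(ex["context"], ex["endings"], score) == ex["label"]
        for ex in examples
    )
    return 100.0 * correct / len(examples)


def toy_score(context: str, ending: str) -> float:
    """Hypothetical stand-in for a model: counts words shared with the context."""
    context_words = set(context.lower().split())
    return float(len(context_words & set(ending.lower().split())))


# Hypothetical scenarios, invented for illustration only.
examples = [
    {
        "context": "A man pours water into a kettle",
        "endings": ["He sets the kettle on the stove", "He paints the fence red"],
        "label": 0,
    },
    {
        "context": "She picks up a guitar",
        "endings": ["She bakes the bread", "She strums the guitar"],
        "label": 1,
    },
]

result = accuracy(examples, toy_score)
```

In practice the scorer would be a language model's likelihood for each ending given the context; the word-overlap function here only keeps the sketch self-contained.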

Models Tested: 37
Top Score: 85.6
Average Score: 69.8