WinoGrande
WinoGrande is a large-scale commonsense reasoning benchmark in which models must resolve ambiguous pronouns in carefully constructed sentence pairs.
Models Tested: 47
Top Score: 78.4
Average Score: 45.2
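To make the pronoun-resolution task concrete, here is a minimal sketch of how a two-option item might be scored. It assumes each item is a sentence with a single `_` blank and two candidate fillers, and that a model picks whichever filled-in sentence it scores higher. `WinoItem`, `predict`, `accuracy` and `toy_score` are hypothetical names. The sentence pair is written for illustration and is not taken from the dataset. `toy_score` is a stand-in for a real model's likelihood. This page does not say how its scores are computed, so the sketch's percentage-accuracy metric is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class WinoItem:
    sentence: str  # contains a single "_" blank where the pronoun was
    option1: str
    option2: str
    answer: int    # gold option: 1 or 2


def predict(item: WinoItem, score: Callable[[str], float]) -> int:
    """Fill the blank with each option and return the option whose
    completed sentence scores higher (ties go to option 1)."""
    s1 = score(item.sentence.replace("_", item.option1))
    s2 = score(item.sentence.replace("_", item.option2))
    return 1 if s1 >= s2 else 2


def accuracy(items: List[WinoItem], score: Callable[[str], float]) -> float:
    """Percentage of items where the predicted option matches the gold answer."""
    correct = sum(predict(it, score) == it.answer for it in items)
    return 100.0 * correct / len(items)


# Hypothetical twin pair: changing one word ("large" -> "small") flips the answer.
items = [
    WinoItem("The trophy doesn't fit into the suitcase because the _ is too large.",
             "trophy", "suitcase", 1),
    WinoItem("The trophy doesn't fit into the suitcase because the _ is too small.",
             "trophy", "suitcase", 2),
]


def toy_score(text: str) -> float:
    # Hypothetical stand-in for a language model's score: it "knows" only
    # that the trophy is the large object and the suitcase the small one.
    return float("trophy is too large" in text or "suitcase is too small" in text)


print(accuracy(items, toy_score))  # 100.0
```

The twin construction is the point of the design. A scorer that ignores meaning, for example one that always returns the same value, gets exactly one item of each pair right and lands at 50% accuracy, which is chance level for a two-way choice.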