Benchmark · Reasoning · Settled

ANLI

ANLI (Adversarial NLI) · an adversarially constructed natural language inference dataset in which each round targets weaknesses found in previous model generations.

Updated 2024-04-18
Models tested: 9
Top score: 37.1 (GPT-3.5 Turbo, older v0613)
Median: 32.8 (min 13.8)
Top-5 spread: σ 1.8 (Settled)

Best score over time · one chart, every benchmark

[Chart: ANLI frontier running max. Y-axis: score (0 to 100); x-axis: release date (May 24 to May 26). Source: benchgecko.ai/benchmark/anli]
The chart shows 0 frontier records, so there is not enough history to compute a frontier yet.
Details
Category: Reasoning
Max score: 100
Models: 9
Updated: 2024-04-18

Same category · related evaluations