Benchmark · Reasoning · Saturated

ANLI

ANLI (Adversarial NLI) · adversarially constructed natural language inference dataset where each round targets weaknesses found in previous model generations.

Updated 2024-04-18
Models tested
9
Top score
37.1
GPT-3.5 Turbo (older v0613)
Median
32.8
Min. 13.8
Top-5 spread
σ 1.8
Saturated

Best score over time · one chart, every benchmark

[Chart: ANLI frontier running max · y-axis: score (0 to 100) · x-axis: release date (Apr 24 to Apr 26) · benchgecko.ai/benchmark/anli]
The chart currently plots 0 models for ANLI · not enough history to compute a frontier yet.
Details
Category
Reasoning
Max score
100
Models
9
Updated
2024-04-18

Same category · related evaluations