Benchmark · Reasoning · Saturated

ANLI

ANLI (Adversarial NLI) · an adversarially constructed natural language inference dataset in which each round targets weaknesses found in previous model generations.

Updated: 2024-04-18
Models tested: 9
Top score: 37.1 (GPT-3.5 Turbo, older v0613)
Median: 32.8 (min 13.8)
Top-5 gap: σ 1.8
Saturated

Best score over time · one chart, every benchmark

[Chart: ANLI · 0 models · frontier running max. Y axis: score (0 to 100); x axis: release date (Apr 24 to Apr 26). Source: benchgecko.ai/benchmark/anli]
Only 0 models have been tested on ANLI · not enough history to compute a frontier yet.
Details
Category: Reasoning
Maximum score: 100
Models: 9
Updated: 2024-04-18

Same category · related evaluations