Beta
Benchmark · Reasoning · Settled

ANLI

ANLI (Adversarial NLI) · an adversarially constructed natural language inference dataset in which each round targets weaknesses found in previous model generations.

Updated: 2024-04-18
Models tested: 9
Top score: 37.1 (GPT-3.5 Turbo, older v0613)
Median: 32.8
Lowest: 13.8
Top-5 spread: σ 1.8 (settled)

Best score over time · one chart, every benchmark

[Chart: ANLI · frontier running max. Y-axis: score, 0 to 100, higher is better. X-axis: release date, Apr 24 to Apr 26. Frontier records: 0. Source: benchgecko.ai/benchmark/anli]
Only 0 models have been tested on ANLI · not enough history to compute a frontier yet.
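The "frontier running max" named in the chart can be read as the sequence of models that each set a new best score when models are ordered by release date. The following is a minimal sketch of that idea, not the site's actual implementation: the function name `frontier_running_max`, the tuple layout, and the tie-breaking rule are assumptions, and the sample rows are placeholder values rather than real ANLI results.

```python
from datetime import date


def frontier_running_max(results):
    """Return the records that set a new best score, in release order.

    `results` is an iterable of (release_date, model, score) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    frontier = []
    best = float("-inf")
    # Walk models by release date; on the same date, visit the higher
    # score first so only one record is kept for that date.
    for released, model, score in sorted(results, key=lambda r: (r[0], -r[2])):
        if score > best:
            best = score
            frontier.append((released, model, score))
    return frontier


# Placeholder rows, NOT real ANLI scores.
example = [
    (date(2024, 4, 1), "model-a", 20.0),
    (date(2024, 10, 1), "model-b", 18.5),
    (date(2025, 4, 1), "model-c", 31.0),
]
records = frontier_running_max(example)
```

With no dated results the function returns an empty list, which matches the chart's "not enough history" state.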
Details
Category: Reasoning
Max score: 100
Models: 9
Updated: 2024-04-18

Same category · Related benchmarks