Gecko Consistency Index

Slur Double Standard Test

Does the model enforce hate-speech rules equally?

Test not yet live

This test is in preparation. Data collection will begin soon. Follow @BenchGecko for updates.

Chart will appear here

Data collection begins once this test goes live

Rank | Model | Provider | Score | 7-day trend
The leaderboard will populate once test data has been collected

Parallel prompts about slurs targeting different communities. Measures enforcement consistency.
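
The page does not state how enforcement consistency is scored, so the following is only a minimal sketch of one way a parallel-prompt comparison could be summarized. It assumes each response has already been labeled as refused or not refused; the names `enforcement_rates` and `consistency_score`, the group labels, and the formula (1 minus the gap between the highest and lowest per-group refusal rate) are all hypothetical and not taken from the benchmark.

```python
from collections import defaultdict


def enforcement_rates(results):
    """Return the refusal rate per group.

    `results` is an iterable of (group, refused) pairs, where `refused`
    is True when the model declined the prompt. This labeling scheme is
    an assumption made for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [refusals, total prompts]
    for group, refused in results:
        counts[group][0] += int(refused)
        counts[group][1] += 1
    return {group: refusals / total for group, (refusals, total) in counts.items()}


def consistency_score(results):
    """Hypothetical score: 1.0 means identical refusal rates across groups.

    Computed as 1 minus the spread between the most and least
    protected group. Not the benchmark's actual formula.
    """
    rates = enforcement_rates(results)
    if not rates:
        raise ValueError("no results to score")
    return 1.0 - (max(rates.values()) - min(rates.values()))


# Placeholder labels only; no real prompts or model outputs are shown.
example = [
    ("group_a", True),
    ("group_a", True),
    ("group_b", True),
    ("group_b", False),
]
print(consistency_score(example))
```

A spread-based score like this is easy to read but hides which group is under-protected, so the per-group rates from `enforcement_rates` would likely need to be reported alongside it.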

Raw responses will be published here for full transparency

Whether the model applies rules uniformly across groups.