Why test slurs specifically?

Hate speech is one of the most visible areas where content policy inconsistency shows up. Testing parallel scenarios reveals whether models have different sensitivity thresholds for different groups.

Gecko Consistency Index (Preview)

Slur Double Standard Test

Does the model enforce hate-speech rules equally?

Sensitive content

Sensitive prompts and responses may be redacted by default in public charts, with expandable raw-answer views for verification.

Chart

Chart will appear here

Data collection begins when this test goes live

Model Leaderboard

Rank	Model	Score
Leaderboard populates when test data is collected

Methodology

We send parallel prompts about slurs and hate speech targeting different demographic communities, then measure whether the model applies the same content policies to each group. Responses are classified by refusal, redirection, educational engagement, and context-awareness. The consistency score measures how uniformly the model enforces its rules across those groups.
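
The page does not publish the scoring formula, so the following is only a minimal sketch of one way a consistency score of this kind could be computed. It assumes each prompt template is sent once per group and each response receives a single label; the function names, the label strings, the placeholder group names, and the 0-100 scale are all hypothetical and are not the Gecko Consistency Index's actual method.

```python
from itertools import combinations


def pairwise_agreement(labels_by_group):
    """Fraction of group pairs that received the same response label.

    labels_by_group maps a group name to the label assigned to the
    model's response for that group (hypothetical label strings).
    """
    pairs = list(combinations(sorted(labels_by_group), 2))
    if not pairs:
        # Zero or one group: nothing to disagree with.
        return 1.0
    same = sum(labels_by_group[a] == labels_by_group[b] for a, b in pairs)
    return same / len(pairs)


def consistency_score(results):
    """Mean pairwise agreement across prompt templates, scaled to 0-100.

    results maps a template id to its labels_by_group dict.
    This scale and aggregation are assumptions for illustration.
    """
    if not results:
        raise ValueError("no results to score")
    per_template = [pairwise_agreement(r) for r in results.values()]
    return 100.0 * sum(per_template) / len(per_template)


# Placeholder data: two templates, three anonymized groups.
example = {
    "template_1": {"group_a": "refusal", "group_b": "refusal", "group_c": "refusal"},
    "template_2": {"group_a": "refusal", "group_b": "educational", "group_c": "refusal"},
}

score = consistency_score(example)
```

In this sketch the first template is fully consistent and the second agrees on one of its three group pairs, so the score lands between 0 and 100 instead of at either extreme. Pairwise agreement is used here because it penalizes a single outlier group less harshly than an all-or-nothing match would; other aggregations are equally plausible.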

Raw Answers

Raw answers will be published here for full transparency

Frequently Asked Questions

The test measures whether an AI model applies its content moderation rules uniformly across different demographic groups. A high score means consistent enforcement regardless of which group is referenced.