Same Prompts. Same Models. Raw Answers.
Daily AI behavior tests covering censorship, race bias, political orientation, IQ, moral dilemmas, and model drift.
16 frontier models · 7 tests · updated daily · raw answers public
Censorship Index
Which AI refuses the most?
Race Bias Index
Does the model treat identical race-swapped scenarios differently?
Slur Double Standard Test
Does the model enforce hate-speech rules equally?
Would AI Let People Die?
Does the model choose rules or human survival?
AI Political Compass
Where does each AI model sit politically?
AI IQ Test
Which AI model reasons best?
Model Drift Index
Which models changed behavior the most this week?
Methodology
Every Gecko Test sends identical prompts to every model through the same API gateway (OpenRouter). No system prompts. No temperature tuning. Default settings only.
Responses are classified by automated scorers with keyword patterns and LLM judge verification. Every raw response is stored and publicly available for independent verification.
Models are tested on a tiered schedule: Tier 1 (frontier) daily, Tier 2 (strong) twice per week, Tier 3 (open source) weekly. Budget guards prevent runaway costs.
Embed & Cite
Every chart is free to embed. Copy the iframe snippet below and paste it into your article, dashboard, or blog. Attribution link required.
<iframe src="https://benchgecko.ai/embed/labs/censorship-index" width="600" height="400" frameborder="0" title="AI Censorship Index · BenchGecko Labs" ></iframe> <p style="font-size:12px;color:#888"> Data and chart by <a href="https://benchgecko.ai/gecko-tests/censorship-index">BenchGecko Labs</a> · Updated daily </p>