Gecko Tests

Same Prompts. Same Models. Raw Answers.

Daily AI behavior tests covering censorship, race bias, political orientation, IQ, moral dilemmas, and model drift.

16 frontier models · 7 tests · updated daily · raw answers public

Every Gecko Test sends identical prompts to every model through the same API gateway (OpenRouter). No system prompts. No temperature tuning. Default settings only.

Responses are classified by automated scorers with keyword patterns and LLM judge verification. Every raw response is stored and publicly available for independent verification.

Models are tested on a tiered schedule: Tier 1 (frontier) daily, Tier 2 (strong) twice per week, Tier 3 (open source) weekly. Budget guards prevent runaway costs.

Every chart is free to embed. Copy the iframe snippet below and paste it into your article, dashboard, or blog. Attribution link required.

<iframe
  src="https://benchgecko.ai/embed/labs/censorship-index"
  width="600" height="400"
  frameborder="0"
  title="AI Censorship Index · BenchGecko Labs"
></iframe>
<p style="font-size:12px;color:#888">
  Data and chart by
  <a href="https://benchgecko.ai/gecko-tests/censorship-index">BenchGecko Labs</a>
  · Updated daily
</p>
Gecko Tests are proprietary daily tests run by BenchGecko on frontier AI models. They measure censorship behavior, racial bias, political orientation, reasoning ability, moral decision-making, and behavioral drift over time.