Gecko Tests

Same Prompts. Same Models. Raw Answers.

Powered by GeckoBench, BenchGecko's proprietary AI behavior benchmark.

Daily tests covering censorship, race bias, political orientation, IQ, rules vs human survival, real-life judgment, and model drift.

16 frontier & widely used models · 7 tests prepared · Censorship Index launching first · raw answers public after each run

BenchGecko asks the questions people actually worry about: what AI refuses, who it protects, what it believes, and whether it changes over time.

Launching first

Censorship Index

Models prepared

16

Prompt set

v0.1

Raw answers

Public after first run

Next

Political Compass · Race Bias

Today's question

Every Gecko Test sends the same prompt set to each model using pinned model IDs and recorded provider routes. During MVP, runs are routed through OpenRouter. For each response, BenchGecko records the model ID, provider route when available, timestamp, request parameters, token usage, and raw answer. BenchGecko does not add hidden steering prompts. Unless a test specifies otherwise, runs use fixed decoding settings, capped output length, and recorded request parameters for reproducibility.

Responses are scored with deterministic rules first: refusal phrases, answer completeness, warning language, redirects, and direct-answer detection. Ambiguous cases are reviewed by an LLM judge using a fixed rubric. Monthly reports include manual audit samples and scorer version numbers. Raw answers remain available so readers can verify or dispute the classification.

prompt set version: recorded

model ID / version: recorded

provider route: recorded

temperature: fixed at 0 where supported

max output tokens: capped (120)

tools / web access: disabled

raw answers: archived & public

scorer version: recorded

Models are tested on a tiered schedule: Tier 1 (frontier) daily, Tier 2 (strong) twice per week, Tier 3 (open source) weekly. Budget guards prevent runaway costs.

Every live Gecko Test chart will be free to embed. Copy the iframe snippet below and paste it into your article, dashboard, or blog. Attribution link required.

<iframe
  src="https://benchgecko.ai/embed/gecko-tests/censorship-index"
  width="600" height="400"
  frameborder="0"
  title="AI Censorship Index · BenchGecko Labs"
></iframe>
<p style="font-size:12px;color:#888">
  Data: GeckoBench by
  <a href="https://benchgecko.ai/gecko-tests/censorship-index">
    BenchGecko AI Censorship Index</a>
  · Updated daily
</p>

Use BenchGecko charts in articles, newsletters, videos, and reports. Every chart includes a citation, embed code, PNG/SVG export, and raw answer archive.

View methodologyRequest dataset
Gecko Tests are proprietary daily tests run by BenchGecko on frontier AI models. They measure censorship behavior, racial bias, political orientation, reasoning ability, moral decision-making, and behavioral drift over time.