Daily AI Tests, Behavior Data & Charts People Cite
We run the same prompts on every frontier model, every day. Raw answers. Public charts. Embeddable data. The AI behavior layer nobody else is building.
Today's Signal
Live signals will appear here once Gecko Tests go live. First test: Censorship Index.
What is BenchGecko Labs?
Traditional benchmarks measure how well a model performs. Labs measures how a model behaves. We track censorship patterns, bias asymmetries, political orientations, moral reasoning, and behavioral drift that standard benchmarks miss entirely.
Every test runs the same prompts on every model, every day. Results are scored, charted, and published with full raw answers. No black box. No editorial spin. Just data.
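As a rough illustration of that daily loop, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `run_daily_test` function, the stub models, the placeholder prompts and date, and the keyword-based refusal check are hypothetical and do not describe the actual Labs pipeline or scoring method.

```python
from datetime import date
from typing import Callable, Dict, List

# Hypothetical refusal markers; a real scorer would likely be more careful.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't")


def is_refusal(answer: str) -> bool:
    """Crude keyword check for whether an answer reads as a refusal."""
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def run_daily_test(
    models: Dict[str, Callable[[str], str]],
    prompts: List[str],
    day: date,
) -> List[dict]:
    """Send the same prompts to every model and keep score plus raw answers."""
    results = []
    for name, ask in models.items():
        answers = [ask(prompt) for prompt in prompts]
        refusals = sum(is_refusal(answer) for answer in answers)
        results.append({
            "date": day.isoformat(),
            "model": name,
            "refusal_rate": refusals / len(prompts) if prompts else 0.0,
            "raw_answers": answers,  # published alongside the score
        })
    return results


# Stub models standing in for real API calls (assumption for the sketch).
def model_a(prompt: str) -> str:
    return "Sure, here is an answer."


def model_b(prompt: str) -> str:
    return "I cannot help with that."


PROMPTS = ["placeholder prompt one", "placeholder prompt two"]
report = run_daily_test(
    {"model-a": model_a, "model-b": model_b}, PROMPTS, date(2025, 1, 1)
)
```

Keeping the raw answers next to each score is what lets a reader check the number against the text it was derived from.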
Every chart is embeddable with a single line of code. Every dataset is citable in APA and BibTeX formats. Built for journalists, researchers, and anyone tracking how AI actually behaves.
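To show what a citation in those two formats could look like, here is a small Python sketch. The helper names `to_bibtex` and `to_apa`, the citation key, and the year are placeholders assumed for illustration; the citation strings actually published with each dataset may differ.

```python
def to_bibtex(key: str, author: str, title: str, year: int) -> str:
    """Build a generic @misc BibTeX entry for a dataset."""
    lines = [
        "@misc{" + key + ",",
        # Double braces keep a corporate author from being split into names.
        "  author = {{" + author + "}},",
        "  title = {" + title + "},",
        "  year = {" + str(year) + "},",
        "  note = {Data set}",
        "}",
    ]
    return "\n".join(lines)


def to_apa(author: str, title: str, year: int) -> str:
    """Build a plain-text APA-style dataset reference (no URL included)."""
    return f"{author}. ({year}). {title} [Data set]."


# Placeholder values: the key and year are assumptions, not real metadata.
bibtex_entry = to_bibtex("example_key", "BenchGecko Labs", "Censorship Index", 2025)
apa_entry = to_apa("BenchGecko Labs", "Censorship Index", 2025)
```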
Featured Tests
Censorship Index
Which AI refuses the most?
Race Bias Index
Does the model treat identical race-swapped scenarios differently?
AI Political Compass
Where does each AI model sit politically?
AI IQ Test
Which AI model reasons best?