Gecko Drift Index

Model Drift Index

Which models changed behavior the most this week?

Test not yet live

This test is being prepared. Data collection will begin soon. Follow @BenchGecko for launch updates.


Rank | Model | Provider | Score | 7d Trend
Leaderboard populates when test data is collected

The Model Drift Index requires no additional API calls. It is computed from week-over-week changes in all other Gecko Test scores. A model's drift magnitude is the root mean square (RMS) of its score changes across the censorship, bias, political, reasoning, and moral dimensions. This is the accountability layer: when a provider quietly updates its RLHF tuning, the drift index is designed to catch it.
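A minimal sketch of the drift magnitude, assuming each Gecko Test reports one numeric score per dimension per week on a common scale. The dimension keys, the dictionary layout, and the helper names drift_magnitude and rank_by_drift are hypothetical; the page does not publish its data format. For K dimensions, with this week's score s_k and last week's score s'_k, the magnitude is sqrt((1/K) * sum over k of (s_k - s'_k)^2).

```python
import math

# Hypothetical keys for the five dimensions named on the page; the real
# data format and score scales are not published.
DIMENSIONS = ("censorship", "bias", "political", "reasoning", "moral")


def drift_magnitude(last_week: dict[str, float], this_week: dict[str, float]) -> float:
    """Root mean square of week-over-week score changes across all dimensions."""
    deltas = [this_week[d] - last_week[d] for d in DIMENSIONS]
    return math.sqrt(sum(x * x for x in deltas) / len(deltas))


def rank_by_drift(
    last_week: dict[str, dict[str, float]],
    this_week: dict[str, dict[str, float]],
) -> list[tuple[str, float]]:
    """Rank models present in both weeks by drift magnitude, largest first."""
    drift = {
        model: drift_magnitude(last_week[model], scores)
        for model, scores in this_week.items()
        if model in last_week
    }
    return sorted(drift.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    baseline = {d: 50.0 for d in DIMENSIONS}
    shifted = {**baseline, "censorship": 56.0, "political": 47.0}
    print(drift_magnitude(baseline, shifted))  # deltas 6, -3, 0, 0, 0 -> RMS 3.0
```

Because changes are squared before averaging, a single large shift in one dimension raises the magnitude more than several small shifts with the same total size (one shift of 6 gives about 2.68, three shifts of 2 give about 1.55). The RMS is unsigned, so it reports how much a model moved, not in which direction.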

Raw answers will be published here for full transparency

Drift is detected by comparing this week's test scores with last week's across all Gecko Tests. Large changes in censorship rate, bias symmetry, political positioning, or moral reasoning indicate that the model was updated, even if the provider did not announce it.
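The week-over-week comparison can be sketched as a per-dimension check. This assumes scores are stored per dimension in a dictionary; the keys and the THRESHOLD value are hypothetical, since the page does not say what counts as a "large" change.

```python
# Hypothetical cutoff: the page does not define what counts as a large change.
THRESHOLD = 5.0


def flag_large_changes(
    last_week: dict[str, float],
    this_week: dict[str, float],
    threshold: float = THRESHOLD,
) -> dict[str, float]:
    """Return the signed change for each dimension whose absolute change meets the threshold."""
    return {
        dim: this_week[dim] - last_week[dim]
        for dim in this_week
        if dim in last_week and abs(this_week[dim] - last_week[dim]) >= threshold
    }


if __name__ == "__main__":
    last = {"censorship_rate": 20.0, "bias_symmetry": 80.0,
            "political_position": 0.0, "moral_reasoning": 70.0}
    now = {"censorship_rate": 32.0, "bias_symmetry": 79.0,
           "political_position": -6.0, "moral_reasoning": 70.0}
    flags = flag_large_changes(last, now)
    print(flags)  # {'censorship_rate': 12.0, 'political_position': -6.0}
    print("possible silent update" if flags else "no large change")
```

Unlike the RMS magnitude, this keeps the sign of each change, which shows whether a model became more or less restrictive in a given dimension.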