A live BenchGecko note tracking where public AI benchmarks are crowded enough to support comparison, where coverage is still thin, and where leaderboard claims need stronger evidence.
The index is useful, but coverage is uneven. 243 of 994 tracked models currently have at least one clean benchmark score, and only 100 models have 10 or more clean score records, so broad model rankings should still show coverage beside position.
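For illustration, a minimal sketch of how these coverage tiers could be computed; the `clean_scores` mapping, the model names, and the cutoffs are hypothetical stand-ins for the actual score table, not BenchGecko's pipeline.

```python
# Hypothetical stand-in for the tracked score table: model name -> number
# of clean benchmark score records. Models absent from the mapping have zero.
clean_scores = {"model-a": 14, "model-b": 9, "model-c": 1, "model-d": 31}
tracked_models = 994  # total models in the index

with_any_score = sum(1 for n in clean_scores.values() if n >= 1)
deep_coverage = sum(1 for n in clean_scores.values() if n >= 10)

print(f"{with_any_score} of {tracked_models} models have at least one clean score")
print(f"{deep_coverage} models have 10 or more clean score records")
```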
47 benchmark records now have 25 or more scored models. These are strong places to watch score clustering, benchmark gaming, and whether newer frontier releases still separate from the field.
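A minimal sketch of the kind of clustering check this implies, assuming a hypothetical list of (model, score) records for a single crowded benchmark; the metric names and thresholds are illustrative, not BenchGecko's.

```python
from statistics import mean, stdev

# Hypothetical (model, score) records for one crowded benchmark;
# real data would come from the benchmark index.
scores = [
    ("model-a", 91.2), ("model-b", 90.8), ("model-c", 90.5),
    ("model-d", 89.9), ("model-e", 88.7), ("model-f", 84.1),
    ("model-g", 79.3), ("model-h", 72.6),
]

def clustering_report(records, top_k=5):
    """Summarize how tightly the leaders cluster on one benchmark."""
    ordered = sorted((s for _, s in records), reverse=True)
    top = ordered[:top_k]
    return {
        "models_scored": len(ordered),
        "top_gap": round(top[0] - top[-1], 2),    # spread among the leaders
        "top_stdev": round(stdev(top), 2),        # tightness of the cluster
        "lead_over_field": round(top[0] - mean(ordered), 2),
    }

print(clustering_report(scores))
```

A shrinking top gap over successive releases is the clustering signal to watch; a frontier model that still opens a visible lead over the field suggests the benchmark has headroom left.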
51 benchmark records have fewer than 10 scored models. These are useful records that need more model coverage: they can be directional, especially for specialist tasks, but they should not carry the same confidence as crowded evaluations.
High coverage is where saturation monitoring starts.
How to read benchmark saturation without overstating it.
Benchmarks are not equally mature. A benchmark with many scored models can support stronger comparative claims than a benchmark with two or three visible results.
Saturation here means coverage density first; it does not automatically mean a benchmark is solved. The next layers are score clustering, source quality, contamination risk, and task relevance.
The practical rule is simple: show benchmark coverage beside model rank, use sparse records as directional evidence, and keep deployment choices tied to price, latency, context, provider reliability, and compute constraints.
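A minimal sketch of that rule applied to a ranking display, assuming hypothetical leaderboard rows and reusing the note's 10-record cutoff as the coverage tier boundary:

```python
# Hypothetical leaderboard rows: (model, benchmark score, clean score records).
rows = [
    ("model-a", 91.2, 28),
    ("model-b", 90.8, 4),
    ("model-c", 88.7, 15),
    ("model-d", 84.1, 2),
]

def coverage_tag(records):
    """Label coverage using this note's 10-record cutoff."""
    return "crowded" if records >= 10 else "directional only"

# Rank by score, but always print coverage beside the position.
ranked = sorted(rows, key=lambda r: r[1], reverse=True)
for rank, (model, score, records) in enumerate(ranked, start=1):
    print(f"{rank}. {model}: {score} ({records} records, {coverage_tag(records)})")
```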
Use the benchmark index for evidence, model pages for score context, pricing pages for API cost, and compute pages for infrastructure pressure.