MASK

Updated: 2026-04-07
Models tested: 2
Top score: 96.3, Claude Opus 4.6 (Fast)
Median: 95.8 (min 95.3)
Top-5 spread: σ 0.5 (settled)

Where models cluster

Score distribution (models per 10-point score bucket, 0 to 100): both tested models fall in the 90–100 bucket; all other buckets are empty. Median: 95.8.

Pearson r · original research

Not enough overlapping models yet.

2 models tested · sorted by score

Pulled from the MASK dataset · updated daily

What does MASK measure?

MASK is a knowledge benchmark in the BenchGecko catalog. Two AI models have been tested on it, with scores ranging from 95.3 to 96.3 out of 100.

Which model leads on MASK?

Claude Opus 4.6 (Fast) from Anthropic leads MASK with a score of 96.3. The median score across the 2 tested models is 95.8.
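
The median and spread figures can be reproduced from the two reported scores. The following is a minimal Python sketch, not BenchGecko's actual pipeline. It assumes the second model's score is the reported minimum of 95.3, and that the spread σ is a population standard deviation (the page does not say which estimator is used).

```python
import statistics

# The two MASK scores reported on this page: the top score and the minimum.
scores = [96.3, 95.3]

# With an even number of values, the median is the mean of the two middle ones.
median = statistics.median(scores)

# Population standard deviation; this estimator is an assumption.
spread = statistics.pstdev(scores)

print(f"median={median:.1f} spread={spread:.1f}")
```

Under these assumptions the sketch gives a median of 95.8 and a spread of 0.5, matching the page. A sample standard deviation (`statistics.stdev`) would give roughly 0.71 instead, which is why the population estimator is assumed here.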

Is MASK saturated?

Yes · the top model on MASK has reached 96.3 out of 100, within 5% of the theoretical ceiling. This benchmark is approaching saturation and may be replaced by a harder successor.
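
The saturation check described above amounts to a single comparison. The following Python sketch is illustrative only: the function name `is_near_ceiling` and its parameters are hypothetical, and the 5% margin is taken from the wording of the answer, not from any published BenchGecko rule.

```python
def is_near_ceiling(top_score: float, ceiling: float = 100.0, margin: float = 0.05) -> bool:
    """Return True when top_score lies within `margin` (a fraction of the ceiling) of the ceiling."""
    # Hypothetical rule: a benchmark counts as near saturation once the
    # top score reaches ceiling * (1 - margin), i.e. 95 out of 100 by default.
    return top_score >= ceiling * (1.0 - margin)


# The top MASK score reported on this page.
print(is_near_ceiling(96.3))
```

With the default margin, any top score of roughly 95 or higher counts as near the ceiling, so 96.3 qualifies.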

What makes MASK distinctive?

MASK is a knowledge benchmark with limited overlap with the rest of the catalog · it measures capabilities that are not well covered by the other benchmarks we track.

How often is MASK data refreshed?

BenchGecko pulls updates daily. New model scores on MASK appear as soon as they are published by Epoch AI or the model provider.

Same category · related evaluations