
C-Eval

Updated 2023-05-28
Models tested · 2
Top score · 68.7 (GPT-4)
Median · 53.8 · min 38.8
Top-5 spread · σ 15.0 · wide open

Where models cluster

[Score distribution chart: models per 10-point score bucket (0–100) · 1 model in the 30–40 bucket, 1 in the 60–70 bucket · median 53.8]

Pearson r · original research

Not enough overlapping models yet.

2 models tested · sorted by score

#   Model               Score
1   GPT-4 (OpenAI)      68.7
2   LLaMA-13B (Meta)    38.8

Pulled from the C-Eval dataset · updated daily

What does C-Eval measure?

C-Eval is a knowledge benchmark in the BenchGecko catalog. Two AI models have been tested on it, with scores ranging from 38.8 to 68.7 out of 100.

Which model leads on C-Eval?

GPT-4 from OpenAI leads C-Eval with a score of 68.7. The median score across 2 tested models is 53.8.
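The summary statistics above can be reproduced directly from the two listed scores. A minimal sketch, assuming the dashboard's σ is the population standard deviation (an assumption, since the page does not say which estimator it uses):

```python
import statistics

# Scores from the C-Eval leaderboard above: GPT-4 and LLaMA-13B
scores = [68.7, 38.8]

median = statistics.median(scores)  # midpoint of the two scores: 53.75
sigma = statistics.pstdev(scores)   # population std dev, about 14.95

print(round(median, 1))  # shown on the page as 53.8
print(sigma)             # shown on the page, rounded, as sigma 15.0
```

With only two models, the median is simply the mean of the two scores, and the "Top-5 spread" covers every model tested so far.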

Is C-Eval saturated?

No · the top score is 68.7 out of 100. There is still meaningful room for improvement on C-Eval.

What makes C-Eval distinctive?

C-Eval is a knowledge benchmark with limited overlap to the rest of the catalog · it measures capabilities that are not well-covered by other benchmarks we track.

How often is C-Eval data refreshed?

BenchGecko pulls updates daily. New model scores on C-Eval appear as soon as they are published by Epoch AI or the model provider.
