Cybench
Cybench (beta) evaluates AI models on real Capture the Flag (CTF) cybersecurity challenges, testing vulnerability analysis, exploitation, and security reasoning.
Models Tested: 17
Top Score: 55.0
Average Score: 19.6
Rankings
| # | Model | Score |
|---|---|---|
| 1 | | 55.0 |
| 2 | | 38.0 |
| 3 | | 38.0 |
| 4 | | 35.0 |
| 5 | | 22.5 |
| 6 | | 20.0 |
| 7 | | 20.0 |
| 8 | | 17.5 |
| 9 | | 17.5 |
| 10 | | 12.5 |
| 11 | | 10.0 |
| 12 | | 10.0 |
| 13 | | 10.0 |
| 14 | | 7.5 |
| 15 | | 7.5 |
| 16 | | 7.5 |
| 17 | | 5.0 |
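
The summary figures at the top of the page can be checked against the Score column. The following is a minimal sketch, assuming the average is a plain arithmetic mean rounded to one decimal place; the `summarize` helper is a hypothetical name introduced here for illustration and is not part of Cybench.

```python
from statistics import mean

# Scores copied from the Rankings table, in rank order.
scores = [55.0, 38.0, 38.0, 35.0, 22.5, 20.0, 20.0, 17.5, 17.5,
          12.5, 10.0, 10.0, 10.0, 7.5, 7.5, 7.5, 5.0]


def summarize(values):
    """Return (count, top score, mean rounded to one decimal).

    Assumption: the page's "Average Score" is an unweighted mean.
    """
    return len(values), max(values), round(mean(values), 1)


models_tested, top_score, average_score = summarize(scores)
print(models_tested, top_score, average_score)  # prints: 17 55.0 19.6
```

The scores sum to 333.5 across 17 entries, so the mean is about 19.62, which matches the reported 19.6.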