WeirdML

WeirdML tests models on unusual and adversarial machine learning tasks that require creative problem-solving beyond standard patterns.

Updated: 2026-03-05
Models tested: 70
Top score: 79.3 (GPT-5.3-Codex)
Median: 40.2 (min 1.7)
Top-5 spread: σ 3.6 (Competitive)
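
The summary figures above can be checked against the leaderboard on this page. The sketch below is a rough reconstruction, not the site's own method: the page does not say how σ is computed, so the use of the population standard deviation is an assumption (it gives about 3.6 for the top five scores, while the sample standard deviation would give about 4.1). The helper name `top_k_spread` is hypothetical.

```python
from statistics import median, pstdev

# Top-5 scores as listed in the leaderboard on this page.
top5 = [79.3, 77.9, 72.2, 72.1, 69.9]

# With 70 models, the median is the mean of the 35th and 36th scores.
middle_pair = [41.0, 39.5]


def top_k_spread(scores, k=5):
    """Population standard deviation of the k highest scores (assumed definition)."""
    return pstdev(sorted(scores, reverse=True)[:k])


spread = top_k_spread(top5)
mid = median(middle_pair)

print(f"top-5 spread: {spread:.2f}")
print(f"median: {mid:.2f}")
```

The unrounded median is 40.25, which the page reports as 40.2.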

Best score over time · one chart, every benchmark

[Chart: WeirdML frontier running max · 55 models · y-axis: score (0 to 100, higher is better) · x-axis: release date (Jul 24 to Mar 26) · source: benchgecko.ai/benchmark/weirdml]
Frontier on WeirdML rose from 21.4 to 79.3 in 20 months · +57.9 points · latest leader GPT-5.3-Codex from OpenAI.
Pink dots = frontier records · 12 total
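
A "frontier running max" marks a model as a frontier record only if it beats every score released before it. A minimal sketch of that idea follows, assuming results are available as (release date, model, score) tuples. The function name `frontier_records`, the model names, the dates, and the intermediate scores are hypothetical placeholders; only the endpoints 21.4 and 79.3 come from this page.

```python
from datetime import date


def frontier_records(results):
    """Return the entries that set a new running maximum, in release order."""
    records = []
    best = float("-inf")
    for released, model, score in sorted(results, key=lambda r: r[0]):
        if score > best:  # strictly better than every earlier release
            best = score
            records.append((released, model, score))
    return records


# Hypothetical placeholder data; not the real release dates or models.
results = [
    (date(2024, 7, 1), "model-a", 21.4),
    (date(2024, 10, 1), "model-b", 18.0),  # below the frontier, not a record
    (date(2025, 3, 1), "model-c", 45.0),
    (date(2025, 6, 1), "model-d", 40.0),   # below the frontier, not a record
    (date(2026, 3, 1), "model-e", 79.3),
]

records = frontier_records(results)
gain = records[-1][2] - records[0][2]

for released, model, score in records:
    print(released.isoformat(), model, score)
print(f"frontier gain: {gain:.1f} points")
```

With the real data, the number of entries returned would correspond to the count of frontier records shown on the chart.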

70 models tested · sorted by score

# | Organization | Model | Score
1 | OpenAI | GPT-5.3-Codex | 79.3
2 | Anthropic | Claude Opus 4.6 | 77.9
3 | OpenAI | GPT-5.2 | 72.2
4 | Google DeepMind | Gemini 3.1 Pro Preview | 72.1
5 | Google DeepMind | Gemini 3 Pro | 69.9
6 | Anthropic | Claude Sonnet 4.6 | 66.1
7 | Anthropic | Claude Opus 4.5 | 63.7
8 | Google DeepMind | Gemini 3 Flash Preview | 61.6
9 | OpenAI | GPT-5.1 | 60.8
10 | OpenAI | GPT-5 | 60.7
11 | OpenAI | GPT-5 Pro | 60.4
12 | OpenAI | o3 Pro | 58.2
13 | OpenAI | GPT-5.4 | 57.4
14 | Google DeepMind | Gemini 2.5 Pro | 54.0
15 | OpenAI | GPT-5 Mini | 52.7
16 | OpenAI | o4 Mini | 52.6
17 | OpenAI | o3 | 52.4
18 | z-ai | GLM 5 | 48.2
19 | OpenAI | gpt-oss-120b | 48.2
20 | Anthropic | Claude Sonnet 4.5 | 47.7
21 | OpenAI | o1-preview | 47.6
22 | Anthropic | Claude Sonnet 4 | 46.1
23 | xAI | Grok 4 | 45.7
24 | moonshotai | Kimi K2.5 | 45.6
25 | Anthropic | Claude Haiku 4.5 | 45.4
26 | OpenAI | o1 | 43.8
27 | OpenAI | o3 Mini | 43.7
28 | Anthropic | Claude Opus 4 | 43.4
29 | xAI | Grok 4 Fast | 42.9
30 | moonshotai | Kimi K2 Thinking | 42.8
31 | Anthropic | Claude Opus 4.1 | 42.8
32 | xAI | Grok 3 Mini | 42.6
33 | DeepSeek | R1 0528 | 41.6
34 | Alibaba Qwen | Qwen3 235B A22B Thinking 2507 | 41.0
35 | Google DeepMind | Gemini 2.5 Flash | 41.0
36 | DeepSeek | DeepSeek V3.2 Exp | 39.5
37 | OpenAI | GPT-4.5 | 39.4
38 | moonshotai | Kimi K2 0711 | 39.4
39 | OpenAI | GPT-4.1 | 39.0
40 | Alibaba Qwen | Qwen3 235B A22B Instruct 2507 | 38.7
41 | DeepSeek | DeepSeek V3.1 | 38.4
42 | OpenAI | GPT-5 Nano | 38.1
43 | OpenAI | GPT-4.1 Mini | 37.6
44 | Alibaba Qwen | Qwen3 235B A22B | 37.3
45 | xAI | Grok 3 | 37.2
46 | DeepSeek | R1 | 36.5
47 | OpenAI | o1-mini | 36.3
48 | DeepSeek | DeepSeek V3 | 36.1
49 | Anthropic | Claude 3.5 Sonnet | 31.0
50 | Anthropic | Claude 3.5 Haiku | 30.7
51 | Google DeepMind | Gemini 2.0 Flash | 25.8
52 | OpenAI | GPT-4o (2024-11-20) | 25.1
53 | Google DeepMind | Gemini 1.5 Flash (May 2024) | 24.9
54 | Meta | Llama 4 Maverick | 24.5
55 | Anthropic | Claude 3 Opus | 23.2
56 | xAI | Grok-2 (Dec 2024) | 22.2
57 | Google DeepMind | Gemini 1.5 Pro (Feb 2024) | 22.2
58 | Meta | Llama 3.1 405B | 21.4
59 | OpenAI | GPT-4.1 Nano | 19.0
60 | Meta | Llama 3.3 70B Instruct (free) | 14.4
61 | OpenAI | GPT-4 Turbo | 12.4
62 | OpenAI | GPT-4o-mini | 11.8
63 | OpenAI | GPT-4o-mini (2024-07-18) | 11.8
64 | Anthropic | Claude 3 Sonnet | 10.2
65 | Anthropic | Claude 3 Haiku | 9.8
66 | Meta | Llama 3.1 70B Instruct | 9.0
67 | Anthropic | Claude 2.1 | 7.1
68 | OpenAI | GPT-3.5 Turbo (older v0613) | 3.5
69 | Mistral AI | Mixtral 8x22B Instruct | 3.2
70 | Meta | Llama 3.1 8B Instruct | 1.7
