| 1 | Claude Mythos Preview🇺🇸 Anthropic | 81.8 | 94.5 | - | - | - | N/A | 1.0M |
| 2 | Claude Instant🇺🇸 Anthropic | 78.0 | - | 64.5 | - | 78.9 | N/A | 0K |
| 3 | DeepSeek-V2 (MoE-236B, May 2024)🇨🇳 DeepSeekOpen | 76.5 | - | 71.2 | - | 80.0 | N/A | 0K |
| 4 | phi-3-small 7.4B🇺🇸 MicrosoftOpen | 67.4 | - | 67.6 | - | 58.1 | N/A | 0K |
| 5 | GPT-5.4 Pro🇺🇸 OpenAI | 66.7 | 92.8 | - | 47.8 | - | $30.00 | 1.1M |
| 6 | phi-3-mini 3.8B🇺🇸 MicrosoftOpen | 61.0 | - | 58.4 | - | 64.0 | N/A | 0K |
| 7 | Qwen-14B🇨🇳 Alibaba QwenOpen | 60.7 | - | 55.1 | - | - | N/A | 0K |
| 8 | Gemini 3.1 Pro Preview🇺🇸 Google DeepMind | 60.6 | 92.1 | - | 77.3 | - | $2.00 | 1.0M |
| 9 | Gemini 3 Pro🇺🇸 Google DeepMind | 60.5 | 90.2 | - | 72.9 | - | N/A | 0K |
| 10 | DeepSeek V3🇨🇳 DeepSeekOpen | 59.0 | 42.0 | 82.9 | - | 82.9 | $0.32 | 164K |
| 11 | GPT-5.4🇺🇸 OpenAI | 59.0 | 91.1 | - | 44.8 | - | $2.50 | 1.1M |
| 12 | Muse Spark Unknown | 59.0 | 86.4 | - | 66.3 | - | N/A | 0K |
| 13 | phi-3-medium 14B🇺🇸 MicrosoftOpen | 58.6 | 3.5 | 70.7 | - | 73.9 | N/A | 0K |
| 14 | Qwen3 Max🇨🇳 Alibaba QwenOpen | 58.3 | 63.5 | - | 67.5 | - | $0.78 | 262K |
| 15 | Falcon 2 11B TIIOpen | 58.0 | - | 44.5 | - | - | N/A | 0K |
| 16 | R1 0528🇨🇳 DeepSeekOpen | 57.9 | 68.4 | - | 27.4 | - | $0.50 | 164K |
| 17 | Mixtral 8x7B Instruct🇫🇷 Mistral AIOpen | 57.8 | 7.5 | 60.8 | - | 82.2 | $0.54 | 33K |
| 18 | GLM 5🇨🇳 z-aiOpen | 57.6 | 83.8 | - | - | - | $0.72 | 80K |
| 19 | Claude Opus 4.6🇺🇸 Anthropic | 57.5 | 87.4 | - | 46.5 | - | $5.00 | 1.0M |
| 20 | o1🇺🇸 OpenAI | 56.4 | 69.0 | - | - | - | $15.00 | 200K |
| 21 | Qwen3 235B A22B🇨🇳 Alibaba QwenOpen | 56.4 | 60.9 | - | - | - | $0.46 | 131K |
| 22 | Gemini 2.5 Pro🇺🇸 Google DeepMind | 56.2 | 80.4 | - | 56.0 | - | $1.25 | 1.0M |
| 23 | GPT-5 Mini🇺🇸 OpenAI | 56.0 | 66.7 | - | 21.0 | - | $0.25 | 400K |
| 24 | Qwen3 235B A22B Thinking 2507🇨🇳 Alibaba QwenOpen | 55.9 | 73.4 | - | 50.1 | - | $0.15 | 131K |
| 25 | o3🇺🇸 OpenAI | 55.2 | 75.8 | - | 53.0 | - | $2.00 | 200K |
| 26 | GPT-4 (older v0314)🇺🇸 OpenAI | 55.0 | 14.3 | 81.9 | - | - | $30.00 | 8K |
| 27 | Grok 4🇺🇸 xAI | 54.8 | 82.7 | - | 47.9 | - | $3.00 | 256K |
| 28 | GPT-5🇺🇸 OpenAI | 54.4 | 81.6 | - | 50.6 | - | $1.25 | 400K |
| 29 | GPT-5.2🇺🇸 OpenAI | 54.0 | 88.5 | - | 38.9 | - | $1.75 | 400K |
| 30 | Gemini 2.0 Pro🇺🇸 Google DeepMind | 53.7 | 54.2 | - | - | - | N/A | 0K |
| 31 | Nemotron-4 15B Unknown | 53.4 | - | 44.9 | - | - | N/A | 0K |
| 32 | Kimi K2 Thinking🇨🇳 moonshotaiOpen | 53.3 | 79.0 | - | 31.6 | - | $0.60 | 262K |
| 33 | o4 Mini🇺🇸 OpenAI | 53.2 | 72.8 | - | 23.9 | - | $1.10 | 200K |
| 34 | Qwen2.5 72B Instruct🇨🇳 Alibaba QwenOpen | 53.2 | 32.2 | 80.4 | - | 71.9 | $0.12 | 33K |
| 35 | Qwen2.5 Coder 32B Instruct🇨🇳 Alibaba QwenOpen | 53.1 | - | 72.1 | - | - | $0.66 | 33K |
| 36 | DeepSeek V3.2🇨🇳 DeepSeekOpen | 53.0 | 77.9 | - | 27.5 | - | $0.26 | 164K |
| 37 | Kimi K2.5🇨🇳 moonshotaiOpen | 52.0 | 83.5 | - | 33.9 | - | $0.38 | 262K |
| 38 | GPT-4o (2024-05-13)🇺🇸 OpenAI | 51.1 | 31.9 | 78.9 | - | - | $5.00 | 128K |
| 39 | GPT-4 Turbo🇺🇸 OpenAI | 51.0 | 7.5 | 76.5 | - | 84.8 | $10.00 | 128K |
| 40 | GLM 4.7🇨🇳 z-aiOpen | 50.5 | 77.8 | - | 31.5 | - | $0.39 | 203K |
| 41 | GPT-5.1🇺🇸 OpenAI | 49.6 | 83.5 | - | 48.9 | - | $1.25 | 400K |
| 42 | Gemini 3 Flash Preview🇺🇸 Google DeepMind | 49.1 | 77.6 | - | 67.4 | - | $0.50 | 1.0M |
| 43 | Gemini 2.0 Flash🇺🇸 Google DeepMind | 48.0 | 52.2 | 72.9 | - | - | $0.10 | 1.0M |
| 44 | Stable Beluga 2 Unknown | 47.8 | - | 58.1 | - | - | N/A | 0K |
| 45 | Claude 3.7 Sonnet🇺🇸 Anthropic | 47.7 | 73.0 | - | - | - | $3.00 | 200K |
| 46 | Claude Sonnet 4.6🇺🇸 Anthropic | 47.6 | 83.2 | - | 29.0 | - | $3.00 | 1.0M |
| 47 | Gemini 1.5 Flash (May 2024)🇺🇸 Google DeepMind | 47.4 | 20.5 | 70.5 | - | - | N/A | 0K |
| 48 | gpt-oss-120b🇺🇸 OpenAIOpen | 46.9 | 67.7 | - | 13.9 | - | $0.04 | 131K |
| 49 | Grok 3 Mini🇺🇸 xAI | 46.6 | 68.3 | - | 21.1 | - | $0.30 | 131K |
| 50 | GPT-3.5 Turbo (older v0613)🇺🇸 OpenAI | 45.8 | 2.9 | 56.4 | - | 85.8 | $1.00 | 4K |
| 51 | Mistral Large 2411🇫🇷 Mistral AIOpen | 45.8 | 35.1 | - | - | - | $2.00 | 131K |
| 52 | Claude Opus 4.5🇺🇸 Anthropic | 45.4 | 81.4 | - | 41.8 | - | $5.00 | 200K |
| 53 | GPT-5 Nano🇺🇸 OpenAI | 45.3 | 59.3 | - | 12.2 | - | $0.05 | 400K |
| 54 | R1🇨🇳 DeepSeekOpen | 45.1 | 62.3 | - | 27.4 | - | $0.70 | 64K |
| 55 | Claude Sonnet 4🇺🇸 Anthropic | 44.6 | 72.3 | - | - | - | $3.00 | 1.0M |
| 56 | GPT-4.1 Mini🇺🇸 OpenAI | 44.5 | 54.5 | - | - | - | $0.40 | 1.0M |
| 57 | Falcon-180B TIIOpen | 44.4 | - | 60.8 | - | 79.9 | N/A | 0K |
| 58 | Qwen2.5 Coder 7B Instruct🇨🇳 Alibaba QwenOpen | 44.4 | - | 57.3 | - | - | $0.03 | 33K |
| 59 | GPT-4.1🇺🇸 OpenAI | 43.3 | 55.9 | - | - | - | $2.00 | 1.0M |
| 60 | GPT-4o-mini (2024-07-18)🇺🇸 OpenAI | 43.2 | 17.0 | 75.7 | - | - | $0.15 | 128K |
| 61 | Phi 4🇺🇸 MicrosoftOpen | 43.2 | 41.4 | 79.7 | - | - | $0.07 | 16K |
| 62 | Llama 2-13B🇺🇸 MetaOpen | 42.5 | 1.8 | 40.8 | - | 79.6 | N/A | 0K |
| 63 | Claude 3.5 Sonnet🇺🇸 Anthropic | 42.3 | 38.7 | 82.0 | - | - | N/A | 0K |
| 64 | Gemma 3 27B🇺🇸 Google DeepMindOpen | 42.2 | 31.8 | - | - | - | $0.08 | 131K |
| 65 | Gemma 3 27B (free)🇺🇸 Google DeepMindOpen | 42.2 | 31.8 | - | - | - | Free | 131K |
| 66 | Claude Sonnet 4.5🇺🇸 Anthropic | 42.1 | 76.4 | - | 23.6 | - | $3.00 | 1.0M |
| 67 | Claude Opus 4🇺🇸 Anthropic | 41.7 | 68.3 | - | - | - | $15.00 | 200K |
| 68 | Mistral 7B V0.1🇫🇷 Mistral AIOpen | 41.6 | - | 50.0 | - | 75.2 | N/A | 0K |
| 69 | o1-preview🇺🇸 OpenAI | 41.5 | 33.8 | - | - | - | N/A | 0K |
| 70 | Claude Opus 4.1🇺🇸 Anthropic | 41.3 | 69.7 | - | 34.8 | - | $15.00 | 200K |
| 71 | Gemini 1.5 Pro (Feb 2024)🇺🇸 Google DeepMind | 41.3 | 27.8 | 76.9 | - | - | N/A | 0K |
| 72 | Qwen2-72B🇨🇳 Alibaba QwenOpen | 41.3 | 21.0 | 76.5 | - | - | N/A | 0K |
| 73 | Qwen2.5-Max🇨🇳 Alibaba QwenOpen | 41.0 | 41.5 | - | - | - | N/A | 0K |
| 74 | Baichuan 2-7B Unknown | 40.3 | - | 38.9 | - | - | N/A | 0K |
| 75 | Mistral Medium 3🇫🇷 Mistral AIOpen | 40.0 | 46.0 | - | - | - | $0.40 | 131K |
| 76 | GPT-4o-mini🇺🇸 OpenAI | 39.6 | 17.0 | 75.7 | - | - | $0.15 | 128K |
| 77 | Mistral Large 2407🇫🇷 Mistral AIOpen | 39.1 | 32.0 | 73.3 | - | - | $2.00 | 131K |
| 78 | Qwen2.5 Coder 1.5B Instruct🇨🇳 Alibaba QwenOpen | 38.8 | - | 38.1 | - | - | N/A | 0K |
| 79 | Grok 3🇺🇸 xAI | 38.4 | 67.7 | - | - | - | $3.00 | 131K |
| 80 | o3 Mini🇺🇸 OpenAI | 38.4 | 69.4 | - | - | - | $1.10 | 200K |
| 81 | Llama 3.1 405B🇺🇸 MetaOpen | 38.0 | 34.5 | 79.3 | - | 82.7 | N/A | 0K |
| 82 | Llama 3.1 70B Instruct🇺🇸 MetaOpen | 37.8 | 25.6 | 73.5 | - | - | $0.40 | 131K |
| 83 | Gemini 2.0 Flash Thinking (Jan 2025)🇺🇸 Google DeepMind | 37.7 | 42.8 | - | - | - | N/A | 0K |
| 84 | GPT-4o (2024-11-20)🇺🇸 OpenAI | 37.7 | 32.3 | 79.1 | - | - | $2.50 | 128K |
| 85 | Claude 2🇺🇸 Anthropic | 37.2 | 12.9 | 71.3 | - | 87.5 | N/A | 0K |
| 86 | Claude 3.5 Haiku🇺🇸 Anthropic | 37.2 | 17.5 | 65.7 | 6.7 | - | $0.80 | 200K |
| 87 | Mistral Nemo🇫🇷 Mistral AIOpen | 37.2 | 6.5 | - | - | - | $0.02 | 131K |
| 88 | Claude Haiku 4.5🇺🇸 Anthropic | 37.1 | 61.6 | - | 5.9 | - | $1.00 | 200K |
| 89 | Llama 3.2 90B🇺🇸 MetaOpen | 36.1 | 21.4 | 73.7 | - | - | N/A | 0K |
| 90 | Gemma 2 9B🇺🇸 Google DeepMindOpen | 36.0 | 3.3 | 62.8 | - | - | $0.03 | 8K |
| 91 | GPT-4.5🇺🇸 OpenAI | 35.9 | 58.3 | - | - | - | N/A | 0K |
| 92 | GPT-4o (2024-08-06)🇺🇸 OpenAI | 35.6 | 32.3 | 79.1 | - | - | $2.50 | 128K |
| 93 | GPT-4.1 Nano🇺🇸 OpenAI | 35.2 | 31.9 | - | - | - | $0.10 | 1.0M |
| 94 | LLaMA-13B🇺🇸 MetaOpen | 34.9 | - | 30.3 | - | 77.9 | N/A | 0K |
| 95 | o1-mini🇺🇸 OpenAI | 34.9 | 49.8 | - | - | - | N/A | 0K |
| 96 | XGen-7B Unknown | 33.9 | - | 15.1 | - | - | N/A | 0K |
| 97 | Claude 3 Opus🇺🇸 Anthropic | 33.7 | 29.6 | 79.5 | - | - | N/A | 0K |
| 98 | Grok-2 (Dec 2024)🇺🇸 xAI | 33.2 | 38.4 | - | - | - | N/A | 0K |
| 99 | Gemma 2 27B🇺🇸 Google DeepMindOpen | 32.9 | 15.3 | 67.6 | - | - | $0.65 | 8K |
| 100 | Llama 3 70B Instruct🇺🇸 MetaOpen | 32.4 | 20.8 | 72.4 | - | - | $0.51 | 8K |
| 101 | MPT-30B Unknown | 31.7 | - | 30.5 | - | 73.6 | N/A | 0K |
| 102 | Yi 6B UnknownOpen | 31.4 | - | 52.0 | - | - | N/A | 0K |
| 103 | Llama 3 8B Instruct🇺🇸 MetaOpen | 30.8 | 1.4 | 58.4 | - | 67.7 | $0.03 | 8K |
| 104 | Phi 2🇺🇸 MicrosoftOpen | 30.2 | - | 44.5 | - | 45.2 | N/A | 0K |
| 105 | Mistral Large🇫🇷 Mistral AIOpen | 30.0 | 18.4 | 58.4 | - | - | $2.00 | 128K |
| 106 | Dolly 2.0-12b Unknown | 29.2 | - | 1.6 | - | - | N/A | 0K |
| 107 | Gemma 2B🇺🇸 Google DeepMindOpen | 29.1 | - | 23.1 | - | 53.2 | N/A | 0K |
| 108 | Llama 3.3 70B Instruct (free)🇺🇸 MetaOpen | 29.1 | 29.9 | 81.7 | - | - | Free | 66K |
| 109 | Claude 3 Haiku🇺🇸 Anthropic | 28.7 | 15.1 | 65.1 | - | - | $0.25 | 200K |
| 110 | Claude 3 Sonnet🇺🇸 Anthropic | 28.3 | 20.8 | 67.9 | - | - | N/A | 0K |
| 111 | Llama 4 Maverick🇺🇸 MetaOpen | 28.0 | 56.0 | - | - | - | $0.15 | 1.0M |
| 112 | Llama 3.1 8B Instruct🇺🇸 MetaOpen | 27.4 | 1.3 | 41.5 | - | - | $0.02 | 16K |
| 113 | DeepSeek Coder 33B🇨🇳 DeepSeekOpen | 25.4 | - | 19.2 | - | - | N/A | 0K |
| 114 | StarCoder 2 15B UnknownOpen | 24.3 | - | 52.1 | - | - | N/A | 0K |
| 115 | Baichuan1-7B Unknown | 23.7 | - | 23.1 | - | - | N/A | 0K |
| 116 | Mixtral 8x22B Instruct🇫🇷 Mistral AIOpen | 23.5 | 12.1 | 70.4 | - | - | $2.00 | 66K |
| 117 | Cerebras-GPT-13B🇺🇸 Cerebras | 23.4 | - | 1.6 | - | - | N/A | 0K |
| 118 | Gemini 1.0 Pro🇺🇸 Google DeepMind | 21.1 | 11.9 | 60.0 | - | - | N/A | 0K |
| 119 | Claude 2.1🇺🇸 Anthropic | 21.0 | 10.6 | 64.7 | - | - | N/A | 0K |
| 120 | INTELLECT-1 Unknown | 20.2 | - | 33.2 | - | - | N/A | 0K |
| 121 | Llama 4 Scout🇺🇸 MetaOpen | 18.9 | 35.8 | - | - | - | $0.08 | 328K |
| 122 | DeepSeek Coder 6.7B🇨🇳 DeepSeekOpen | 16.7 | - | 15.2 | - | - | N/A | 0K |
| 123 | Magistral Small 1.1 Unknown | 16.6 | 31.2 | - | - | - | N/A | 0K |
| 124 | Phi-1.5🇺🇸 MicrosoftOpen | 16.3 | - | 16.8 | - | - | N/A | 0K |
| 125 | DeepSeek Coder 1.3B🇨🇳 DeepSeekOpen | 3.2 | - | 1.1 | - | - | N/A | 0K |