| Rank | Model | Organization | Score | Bench 1 | Bench 2 | Bench 3 | Bench 4 | Bench 5 | Bench 6 | Price | Context |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | 🇺🇸 Anthropic | 81.8 | - | - | - | 94.5 | 56.8 | - | N/A | 1.0M |
| 2 | DeepSeek-V2 (MoE-236B, May 2024) | 🇨🇳 DeepSeek (open) | 76.5 | - | - | 71.7 | - | - | - | N/A | N/A |
| 3 | phi-3-small 7.4B | 🇺🇸 Microsoft (open) | 67.4 | - | - | 72.1 | - | - | - | N/A | N/A |
| 4 | GPT-5.4 Pro | 🇺🇸 OpenAI | 66.7 | 94.5 | 83.3 | - | 92.8 | - | 68.9 | $30.00 | 1.1M |
| 5 | o3 Pro | 🇺🇸 OpenAI | 61.2 | 59.3 | 4.9 | - | - | - | - | $20.00 | 200K |
| 6 | phi-3-mini 3.8B | 🇺🇸 Microsoft (open) | 61.0 | - | - | 62.3 | - | - | - | N/A | N/A |
| 7 | Qwen-14B | 🇨🇳 Alibaba Qwen (open) | 60.7 | - | - | 40.0 | - | - | - | N/A | N/A |
| 8 | Gemini 3.1 Pro Preview | 🇺🇸 Google DeepMind | 60.6 | 98.0 | 77.1 | - | 92.1 | - | 75.5 | $2.00 | 1.0M |
| 9 | Gemini 3 Pro | 🇺🇸 Google DeepMind | 60.5 | 75.0 | 31.1 | - | 90.2 | 34.4 | 71.7 | N/A | N/A |
| 10 | DeepSeek V3 | 🇨🇳 DeepSeek (open) | 59.0 | - | - | 83.3 | 42.0 | - | 2.7 | $0.32 | 164K |
| 11 | GPT-5.4 | 🇺🇸 OpenAI | 59.0 | 93.7 | 74.0 | - | 91.1 | - | - | $2.50 | 1.1M |
| 12 | Muse Spark | Unknown | 59.0 | - | - | - | 86.4 | - | - | N/A | N/A |
| 13 | phi-3-medium 14B | 🇺🇸 Microsoft (open) | 58.6 | - | - | 75.2 | 3.5 | - | - | N/A | N/A |
| 14 | Qwen3 Max | 🇨🇳 Alibaba Qwen (open) | 58.3 | - | - | - | 63.5 | - | - | $0.78 | 262K |
| 15 | Falcon 2 11B | TII (open) | 58.0 | - | - | - | - | - | - | N/A | N/A |
| 16 | R1 0528 | 🇨🇳 DeepSeek (open) | 57.9 | 21.2 | 1.1 | - | 68.4 | - | 29.0 | $0.50 | 164K |
| 17 | Mixtral 8x7B Instruct | 🇫🇷 Mistral AI (open) | 57.8 | - | - | - | 7.5 | - | - | $0.54 | 33K |
| 18 | GLM 5 | 🇨🇳 z-ai (open) | 57.6 | 44.7 | 4.9 | - | 83.8 | - | 43.8 | $0.72 | 80K |
| 19 | Claude Opus 4.6 | 🇺🇸 Anthropic | 57.5 | 94.0 | 69.2 | - | 87.4 | 31.1 | 61.1 | $5.00 | 1.0M |
| 20 | o1 | 🇺🇸 OpenAI | 56.4 | 30.7 | - | - | 69.0 | - | 28.1 | $15.00 | 200K |
| 21 | Qwen3 235B A22B | 🇨🇳 Alibaba Qwen (open) | 56.4 | - | - | - | 60.9 | - | 17.2 | $0.46 | 131K |
| 22 | Gemini 2.5 Pro | 🇺🇸 Google DeepMind | 56.2 | 41.0 | 4.9 | - | 80.4 | 17.7 | 54.9 | $1.25 | 1.0M |
| 23 | GPT-5.2 Pro | 🇺🇸 OpenAI | 56.2 | 90.5 | 54.2 | - | - | - | 48.9 | $21.00 | 400K |
| 24 | Kimi K2 0711 | 🇨🇳 moonshotai (open) | 56.2 | - | - | - | - | - | 11.6 | $0.57 | 131K |
| 25 | GPT-5 Mini | 🇺🇸 OpenAI | 56.0 | 54.3 | 4.4 | - | 66.7 | 15.4 | - | $0.25 | 400K |
| 26 | Qwen3 235B A22B Thinking 2507 | 🇨🇳 Alibaba Qwen (open) | 55.9 | - | - | - | 73.4 | - | - | $0.15 | 131K |
| 27 | o3 | 🇺🇸 OpenAI | 55.2 | 60.8 | 6.5 | - | 75.8 | 16.3 | 43.7 | $2.00 | 200K |
| 28 | MiniMax M2.5 | 🇨🇳 minimax (open) | 55.1 | 63.7 | 4.9 | - | - | - | - | $0.12 | 197K |
| 29 | GPT-4 (older v0314) | 🇺🇸 OpenAI | 55.0 | - | - | - | 14.3 | - | - | $30.00 | 8K |
| 30 | Grok 4 | 🇺🇸 xAI | 54.8 | 66.7 | 16.0 | - | 82.7 | - | 52.6 | $3.00 | 256K |
| 31 | GPT-5 | 🇺🇸 OpenAI | 54.4 | 65.7 | 9.9 | - | 81.6 | 21.6 | 48.0 | $1.25 | 400K |
| 32 | GPT-5.2 | 🇺🇸 OpenAI | 54.0 | 86.2 | 52.9 | - | 88.5 | 24.2 | 35.0 | $1.75 | 400K |
| 33 | Gemini 2.0 Pro | 🇺🇸 Google DeepMind | 53.7 | - | - | - | 54.2 | - | - | N/A | N/A |
| 34 | Nemotron-4 15B | 🇺🇸 NVIDIA | 53.4 | - | - | 44.9 | - | - | - | N/A | N/A |
| 35 | Kimi K2 Thinking | 🇨🇳 moonshotai (open) | 53.3 | - | - | - | 79.0 | - | - | $0.60 | 262K |
| 36 | o4 Mini | 🇺🇸 OpenAI | 53.2 | 58.7 | 6.1 | - | 72.8 | 13.9 | 26.4 | $1.10 | 200K |
| 37 | Qwen2.5 72B Instruct | 🇨🇳 Alibaba Qwen (open) | 53.2 | - | - | 73.1 | 32.2 | - | - | $0.12 | 33K |
| 38 | Qwen2.5 Coder 32B Instruct | 🇨🇳 Alibaba Qwen (open) | 53.1 | - | - | - | - | - | - | $0.66 | 33K |
| 39 | DeepSeek V3.2 | 🇨🇳 DeepSeek (open) | 53.0 | 57.0 | 4.0 | - | 77.9 | - | - | $0.26 | 164K |
| 40 | Kimi K2.5 | 🇨🇳 moonshotai (open) | 52.0 | 65.3 | 11.8 | - | 83.5 | 20.6 | 36.2 | $0.38 | 262K |
| 41 | DeepSeek V3.1 | 🇨🇳 DeepSeek (open) | 51.1 | - | - | - | - | - | 28.0 | $0.15 | 33K |
| 42 | GPT-4o (2024-05-13) | 🇺🇸 OpenAI | 51.1 | - | - | - | 31.9 | - | - | $5.00 | 128K |
| 43 | GPT-4 Turbo | 🇺🇸 OpenAI | 51.0 | - | - | 66.8 | 7.5 | - | 10.1 | $10.00 | 128K |
| 44 | GLM 4.7 | 🇨🇳 z-ai (open) | 50.5 | - | - | - | 77.8 | - | 37.2 | $0.39 | 203K |
| 45 | Grok 4 Fast | 🇺🇸 xAI | 50.4 | 48.5 | 5.3 | - | - | - | - | $0.20 | 2.0M |
| 46 | GPT-5.1 | 🇺🇸 OpenAI | 49.6 | 72.8 | 17.6 | - | 83.5 | 19.8 | 43.8 | $1.25 | 400K |
| 47 | Gemini 3 Flash Preview | 🇺🇸 Google DeepMind | 49.1 | 21.5 | 33.6 | - | 77.6 | - | 53.3 | $0.50 | 1.0M |
| 48 | Qwen3 235B A22B Instruct 2507 | 🇨🇳 Alibaba Qwen (open) | 48.5 | 11.0 | 1.3 | - | - | - | - | $0.07 | 262K |
| 49 | Gemini 2.0 Flash | 🇺🇸 Google DeepMind | 48.0 | - | 1.3 | - | 52.2 | - | 17.3 | $0.10 | 1.0M |
| 50 | Stable Beluga 2 | Stability AI | 47.8 | - | - | 59.1 | - | - | - | N/A | N/A |
| 51 | Claude 3.7 Sonnet | 🇺🇸 Anthropic | 47.7 | 28.6 | 0.9 | - | 73.0 | 3.4 | 35.7 | $3.00 | 200K |
| 52 | Claude Sonnet 4.6 | 🇺🇸 Anthropic | 47.6 | 86.5 | 60.4 | - | 83.2 | - | - | $3.00 | 1.0M |
| 53 | Gemini 1.5 Flash (May 2024) | 🇺🇸 Google DeepMind | 47.4 | - | - | - | 20.5 | - | - | N/A | N/A |
| 54 | gpt-oss-120b | 🇺🇸 OpenAI (open) | 46.9 | - | - | - | 67.7 | - | 6.5 | $0.04 | 131K |
| 55 | Grok 3 Mini | 🇺🇸 xAI | 46.6 | 16.5 | 0.4 | - | 68.3 | - | - | $0.30 | 131K |
| 56 | GPT-3.5 Turbo (older v0613) | 🇺🇸 OpenAI | 45.8 | - | - | 48.8 | 2.9 | - | - | $1.00 | 4K |
| 57 | Mistral Large 2411 | 🇫🇷 Mistral AI (open) | 45.8 | - | - | - | 35.1 | - | - | $2.00 | 131K |
| 58 | Claude Opus 4.5 | 🇺🇸 Anthropic | 45.4 | 80.0 | 37.6 | - | 81.4 | 21.4 | 54.4 | $5.00 | 200K |
| 59 | GPT-5 Nano | 🇺🇸 OpenAI | 45.3 | 20.7 | 2.6 | - | 59.3 | - | - | $0.05 | 400K |
| 60 | R1 | 🇨🇳 DeepSeek (open) | 45.1 | 15.8 | 1.3 | - | 62.3 | - | 17.1 | $0.70 | 64K |
| 61 | Claude Sonnet 4 | 🇺🇸 Anthropic | 44.6 | 40.0 | 5.9 | - | 72.3 | 3.1 | 34.6 | $3.00 | 1.0M |
| 62 | GPT-4.1 Mini | 🇺🇸 OpenAI | 44.5 | 3.5 | 0.1 | - | 54.5 | - | - | $0.40 | 1.0M |
| 63 | Falcon-180B | TII (open) | 44.4 | - | - | 16.1 | - | - | - | N/A | N/A |
| 64 | Qwen2.5 Coder 7B Instruct | 🇨🇳 Alibaba Qwen (open) | 44.4 | - | - | - | - | - | - | $0.03 | 33K |
| 65 | GPT-4.1 | 🇺🇸 OpenAI | 43.3 | 5.5 | 0.4 | - | 55.9 | 0.6 | 12.4 | $2.00 | 1.0M |
| 66 | GPT-5 Pro | 🇺🇸 OpenAI | 43.3 | 70.2 | 18.3 | - | - | 28.2 | 53.9 | $15.00 | 400K |
| 67 | GPT-4o-mini (2024-07-18) | 🇺🇸 OpenAI | 43.2 | - | 0.1 | - | 17.0 | - | - | $0.15 | 128K |
| 68 | Phi 4 | 🇺🇸 Microsoft (open) | 43.2 | - | - | - | 41.4 | - | - | $0.07 | 16K |
| 69 | Llama 2-13B | 🇺🇸 Meta (open) | 42.5 | - | - | 44.3 | 1.8 | - | - | N/A | N/A |
| 70 | Claude 3.5 Sonnet | 🇺🇸 Anthropic | 42.3 | - | - | - | 38.7 | - | 13.0 | N/A | N/A |
| 71 | Gemma 3 27B | 🇺🇸 Google DeepMind (open) | 42.2 | - | - | - | 31.8 | - | - | $0.08 | 131K |
| 72 | Gemma 3 27B (free) | 🇺🇸 Google DeepMind (open) | 42.2 | - | - | - | 31.8 | - | - | Free | 131K |
| 73 | Claude Sonnet 4.5 | 🇺🇸 Anthropic | 42.1 | 63.7 | 13.6 | - | 76.4 | 9.4 | 45.2 | $3.00 | 1.0M |
| 74 | Claude Opus 4 | 🇺🇸 Anthropic | 41.7 | 35.7 | 8.6 | - | 68.3 | 6.2 | 50.6 | $15.00 | 200K |
| 75 | Mistral 7B V0.1 | 🇫🇷 Mistral AI (open) | 41.6 | - | - | 41.5 | - | - | - | N/A | N/A |
| 76 | o1-preview | 🇺🇸 OpenAI | 41.5 | 18.0 | - | - | 33.8 | - | 30.0 | N/A | N/A |
| 77 | Claude Opus 4.1 | 🇺🇸 Anthropic | 41.3 | - | - | - | 69.7 | 7.1 | 52.0 | $15.00 | 200K |
| 78 | Gemini 1.5 Pro (Feb 2024) | 🇺🇸 Google DeepMind | 41.3 | - | 0.8 | 78.7 | 27.8 | - | 12.5 | N/A | N/A |
| 79 | Qwen2-72B | 🇨🇳 Alibaba Qwen (open) | 41.3 | - | - | - | 21.0 | - | - | N/A | N/A |
| 80 | Qwen2.5-Max | 🇨🇳 Alibaba Qwen (open) | 41.0 | - | - | - | 41.5 | - | - | N/A | N/A |
| 81 | Baichuan 2-7B | 🇨🇳 Baichuan | 40.3 | - | - | 22.1 | - | - | - | N/A | N/A |
| 82 | Gemini 2.5 Flash | 🇺🇸 Google DeepMind | 40.0 | 32.3 | 2.5 | - | - | 7.7 | 29.4 | $0.30 | 1.0M |
| 83 | Mistral Medium 3 | 🇫🇷 Mistral AI (open) | 40.0 | - | - | - | 46.0 | - | - | $0.40 | 131K |
| 84 | GPT-4o-mini | 🇺🇸 OpenAI | 39.6 | - | 0.1 | - | 17.0 | - | - | $0.15 | 128K |
| 85 | Mistral Large 2407 | 🇫🇷 Mistral AI (open) | 39.1 | - | - | - | 32.0 | - | 7.0 | $2.00 | 131K |
| 86 | Qwen2.5 Coder 1.5B Instruct | 🇨🇳 Alibaba Qwen (open) | 38.8 | - | - | - | - | - | - | N/A | N/A |
| 87 | Grok 3 | 🇺🇸 xAI | 38.4 | 5.5 | 0.1 | - | 67.7 | - | 23.3 | $3.00 | 131K |
| 88 | o3 Mini | 🇺🇸 OpenAI | 38.4 | 34.5 | 3.0 | - | 69.4 | - | 7.4 | $1.10 | 200K |
| 89 | Llama 3.1 405B | 🇺🇸 Meta (open) | 38.0 | - | - | 77.2 | 34.5 | - | 7.6 | N/A | N/A |
| 90 | Llama 3.1 70B Instruct | 🇺🇸 Meta (open) | 37.8 | - | - | - | 25.6 | - | - | $0.40 | 131K |
| 91 | Gemini 2.0 Flash Thinking (Jan 2025) | 🇺🇸 Google DeepMind | 37.7 | - | - | - | 42.8 | 1.9 | 16.8 | N/A | N/A |
| 92 | GPT-4o (2024-11-20) | 🇺🇸 OpenAI | 37.7 | 4.5 | 0.1 | - | 32.3 | - | 1.4 | $2.50 | 128K |
| 93 | Claude 2 | 🇺🇸 Anthropic | 37.2 | - | - | - | 12.9 | - | - | N/A | N/A |
| 94 | Claude 3.5 Haiku | 🇺🇸 Anthropic | 37.2 | - | - | - | 17.5 | - | - | $0.80 | 200K |
| 95 | Mistral Nemo | 🇫🇷 Mistral AI (open) | 37.2 | - | - | - | 6.5 | - | - | $0.02 | 131K |
| 96 | Claude Haiku 4.5 | 🇺🇸 Anthropic | 37.1 | 47.7 | 4.0 | - | 61.6 | - | - | $1.00 | 200K |
| 97 | Llama 3.2 90B | 🇺🇸 Meta (open) | 36.1 | - | - | - | 21.4 | - | - | N/A | N/A |
| 98 | Gemma 2 9B | 🇺🇸 Google DeepMind (open) | 36.0 | - | - | - | 3.3 | - | - | $0.03 | 8K |
| 99 | GPT-4.5 | 🇺🇸 OpenAI | 35.9 | 10.3 | 0.8 | - | 58.3 | 0.7 | 21.4 | N/A | N/A |
| 100 | GPT-4o (2024-08-06) | 🇺🇸 OpenAI | 35.6 | - | - | - | 32.3 | - | 1.4 | $2.50 | 128K |
| 101 | GPT-4.1 Nano | 🇺🇸 OpenAI | 35.2 | 0.1 | 0.1 | - | 31.9 | - | - | $0.10 | 1.0M |
| 102 | LLaMA-13B | 🇺🇸 Meta (open) | 34.9 | - | - | 17.2 | - | - | - | N/A | N/A |
| 103 | o1-mini | 🇺🇸 OpenAI | 34.9 | 14.0 | 0.8 | - | 49.8 | - | 1.7 | N/A | N/A |
| 104 | XGen-7B | 🇺🇸 Salesforce | 33.9 | - | - | - | - | - | - | N/A | N/A |
| 105 | Claude 3 Opus | 🇺🇸 Anthropic | 33.7 | - | - | - | 29.6 | - | 8.2 | N/A | N/A |
| 106 | Grok-2 (Dec 2024) | 🇺🇸 xAI | 33.2 | - | - | - | 38.4 | - | 7.2 | N/A | N/A |
| 107 | Gemma 2 27B | 🇺🇸 Google DeepMind (open) | 32.9 | - | - | - | 15.3 | - | - | $0.65 | 8K |
| 108 | Llama 3 70B Instruct | 🇺🇸 Meta (open) | 32.4 | - | - | - | 20.8 | - | - | $0.51 | 8K |
| 109 | MPT-30B | 🇺🇸 MosaicML | 31.7 | - | - | 17.3 | - | - | - | N/A | N/A |
| 110 | Yi 6B | 🇨🇳 01.AI (open) | 31.4 | - | - | 29.6 | - | - | - | N/A | N/A |
| 111 | Llama 3 8B Instruct | 🇺🇸 Meta (open) | 30.8 | - | - | - | 1.4 | - | - | $0.03 | 8K |
| 112 | Phi 2 | 🇺🇸 Microsoft (open) | 30.2 | - | - | 45.9 | - | - | - | N/A | N/A |
| 113 | Mistral Large | 🇫🇷 Mistral AI (open) | 30.0 | - | - | - | 18.4 | - | 7.0 | $2.00 | 128K |
| 114 | Dolly 2.0-12b | 🇺🇸 Databricks | 29.2 | - | - | - | - | - | - | N/A | N/A |
| 115 | Gemma 2B | 🇺🇸 Google DeepMind (open) | 29.1 | - | - | 13.6 | - | - | - | N/A | N/A |
| 116 | Llama 3.3 70B Instruct (free) | 🇺🇸 Meta (open) | 29.1 | - | - | - | 29.9 | - | 3.9 | Free | 66K |
| 117 | Claude 3 Haiku | 🇺🇸 Anthropic | 28.7 | - | - | - | 15.1 | - | - | $0.25 | 200K |
| 118 | Claude 3 Sonnet | 🇺🇸 Anthropic | 28.3 | - | - | - | 20.8 | - | - | N/A | N/A |
| 119 | Llama 4 Maverick | 🇺🇸 Meta (open) | 28.0 | 4.4 | 0.1 | - | 56.0 | 0.9 | 13.2 | $0.15 | 1.0M |
| 120 | Llama 3.1 8B Instruct | 🇺🇸 Meta (open) | 27.4 | - | - | - | 1.3 | - | - | $0.02 | 16K |
| 121 | DeepSeek Coder 33B | 🇨🇳 DeepSeek (open) | 25.4 | - | - | - | - | - | - | N/A | N/A |
| 122 | StarCoder 2 15B | BigCode (open) | 24.3 | - | - | - | - | - | - | N/A | N/A |
| 123 | Baichuan1-7B | 🇨🇳 Baichuan | 23.7 | - | - | 10.0 | - | - | - | N/A | N/A |
| 124 | Mixtral 8x22B Instruct | 🇫🇷 Mistral AI (open) | 23.5 | - | - | - | 12.1 | - | - | $2.00 | 66K |
| 125 | Cerebras-GPT-13B | 🇺🇸 Cerebras | 23.4 | - | - | - | - | - | - | N/A | N/A |
| 126 | Gemini 1.0 Pro | 🇺🇸 Google DeepMind | 21.1 | - | - | - | 11.9 | - | - | N/A | N/A |
| 127 | Claude 2.1 | 🇺🇸 Anthropic | 21.0 | - | - | - | 10.6 | - | - | N/A | N/A |
| 128 | INTELLECT-1 | Prime Intellect | 20.2 | - | - | 13.1 | - | - | - | N/A | N/A |
| 129 | Llama 4 Scout | 🇺🇸 Meta (open) | 18.9 | 0.5 | 0.1 | - | 35.8 | - | - | $0.08 | 328K |
| 130 | DeepSeek Coder 6.7B | 🇨🇳 DeepSeek (open) | 16.7 | - | - | - | - | - | - | N/A | N/A |
| 131 | Magistral Small 1.1 | 🇫🇷 Mistral AI | 16.6 | 5.0 | 0.1 | - | 31.2 | - | - | N/A | N/A |
| 132 | Phi-1.5 | 🇺🇸 Microsoft (open) | 16.3 | - | - | - | - | - | - | N/A | N/A |
| 133 | DeepSeek Coder 1.3B | 🇨🇳 DeepSeek (open) | 3.2 | - | - | - | - | - | - | N/A | N/A |