SWE-agent 1.0
by SWE-agent
Best Score: 33.8%
Best Leaderboard: test
Models Used: 1
Open Source: Yes
Score History
| Entry | Score |
|---|---|
| mini-SWE-agent + Gemini 3 Pro | 69.6% |
| mini-SWE-agent + GPT-5.2 Codex | 72.8% |
| mini-SWE-agent + Claude 4.5 Opus (high reasoning) | 76.8% |
| mini-SWE-agent + Gemini 3 Flash (high reasoning) | 75.8% |
| mini-SWE-agent + MiniMax M2.5 (high reasoning) | 75.8% |
| mini-SWE-agent + Claude Opus 4.6 | 75.6% |
| mini-SWE-agent + GLM-5 (high reasoning) | 72.8% |
| mini-SWE-agent + GPT-5.2 (high reasoning) | 72.8% |
| mini-SWE-agent + Claude 4.5 Sonnet (high reasoning) | 71.4% |
| mini-SWE-agent + Kimi K2.5 (high reasoning) | 70.8% |
| mini-SWE-agent + DeepSeek V3.2 (high reasoning) | 70.0% |
| mini-SWE-agent + Claude 4.5 Haiku (high reasoning) | 66.6% |
| mini-SWE-agent + GPT-5 Mini | 56.2% |
| mini-SWE-agent + GPT-5.2 (2025-12-11) (high reasoning) | 71.8% |
| mini-SWE-agent + GPT-5.2 (2025-12-11) | 69.0% |
| mini-SWE-agent + Kimi K2 Thinking | 63.4% |
| mini-SWE-agent + Devstral small (2512) | 56.4% |
| mini-SWE-agent + Devstral (2512) | 53.8% |
| mini-SWE-agent + DeepSeek V3.2 Reasoner | 60.0% |
| mini-SWE-agent + GLM-4.6 (T=1) | 55.4% |
| mini-SWE-agent + Claude 4.5 Opus medium (20251101) | 74.4% |
| mini-SWE-agent + GPT-5.1-codex (medium reasoning) | 66.0% |
| mini-SWE-agent + Minimax M2 | 61.0% |
| mini-SWE-agent + GPT-5.1 (2025-11-13) (medium reasoning) | 66.0% |
| mini-SWE-agent + Gemini 3 Pro Preview (2025-11-18) | 74.2% |
| mini-SWE-agent + Claude 4.5 Sonnet (20250929) | 70.6% |
| mini-SWE-agent + GLM-4.5 (2025-08-22) | 54.2% |
| mini-SWE-agent + GPT-5 (2025-08-07) (medium reasoning) | 65.0% |
| mini-SWE-agent + GPT-5 mini (2025-08-07) (medium reasoning) | 59.8% |
| mini-SWE-agent + Kimi K2 Instruct | 43.8% |
| mini-SWE-agent + GPT-5 nano (2025-08-07) (medium reasoning) | 34.8% |
| mini-SWE-agent + gpt-oss-120b | 26.0% |
| mini-SWE-agent + Qwen2.5-Coder 32B Instruct | 9.0% |
| mini-SWE-agent + Claude 4 Opus (20250514) | 67.6% |
| mini-SWE-agent + Qwen3-Coder 480B/A35B Instruct | 55.4% |
| mini-SWE-agent + Claude 4 Sonnet (20250514) | 64.9% |
| mini-SWE-agent + o3 (2025-04-16) | 58.4% |
| mini-SWE-agent + Gemini 2.5 Pro (2025-05-06) | 53.6% |
| mini-SWE-agent + o4-mini (2025-04-16) | 45.0% |
| mini-SWE-agent + GPT-4.1 (2025-04-14) | 39.6% |
| mini-SWE-agent + Gemini 2.5 Flash (2025-04-17) | 28.7% |
| mini-SWE-agent + Gemini 2.0 flash | 13.5% |
| SWE-agent + DevStral Small 2507 | 38.0% |
| mini-SWE-agent + Claude 3.7 Sonnet (20250219) | 52.8% |
| mini-SWE-agent + GPT-4.1-mini (2025-04-14) | 23.9% |
| mini-SWE-agent + GPT-4o (2024-11-20) | 21.6% |
| mini-SWE-agent + Llama 4 Maverick Instruct | 21.0% |
| mini-SWE-agent + Llama 4 Scout Instruct | 9.1% |
| SWE-agent + Claude 4 Sonnet | 56.7% |
| SWE-agent + Claude 4 Sonnet | 66.6% |
| SWE-agent + SWE-agent-LM-32B | 40.2% |
| SWE-agent 1.0 (Claude 3.7 Sonnet) | 33.8% |
| SWE-agent + Claude 3.7 Sonnet | 48.0% |
| SWE-agent + Claude 3.7 Sonnet w/ Review Heavy | 62.4% |
| SWE-agent Multimodal + GPT 4o (2024-08-06) | 12.2% |
| SWE-agent + Claude Sonnet 3.5 | 12.2% |
| SWE-agent JavaScript + Claude Sonnet 3.5 | 12.0% |
| SWE-agent + GPT 4o (2024-08-06) | 12.0% |
| SWE-agent Multimodal + Claude 3.5 Sonnet | 11.4% |
| SWE-agent JavaScript + GPT 4o (2024-08-06) | 9.3% |
| SWE-agent + GPT 4o (2024-05-13) | 12.0% |
| SWE-agent + GPT 4o (2024-05-13) | 23.2% |
| SWE-agent + GPT 4o (2024-05-13) | 18.3% |
| SWE-agent + Claude 3.5 Sonnet | 18.1% |
| SWE-agent + Claude 3.5 Sonnet | 33.6% |
| SWE-agent + Claude 3.5 Sonnet | 23.0% |
| SWE-agent + GPT 4 (1106) | 12.5% |
| SWE-agent + Claude 3 Opus | 10.5% |
| SWE-agent + GPT 4 (1106) | 22.4% |
| SWE-agent + Claude 3 Opus | 15.8% |
| SWE-agent + GPT 4 (1106) | 18.0% |
| SWE-agent + Claude 3 Opus | 11.7% |
| RAG + SWE-Llama 13B | 0.7% |
| RAG + SWE-Llama 7B | 0.7% |
| RAG + SWE-Llama 7B | 1.4% |
| RAG + SWE-Llama 13B | 1.2% |
| RAG + SWE-Llama 7B | 1.3% |
| RAG + SWE-Llama 13B | 1.0% |