o3 vs gpt-oss-120b
Side by side. Every metric. Every benchmark.
| Metric | o3 | gpt-oss-120b |
|---|---|---|
| Provider | OpenAI | OpenAI |
| Average score | 55.2 | 46.9 |
| Input price | $2.00 | $0.04 |
| Output price | $8.00 | $0.19 |
| Context window | 200K tokens (~100 books) | 131K tokens (~66 books) |
| Released | 2025-04-16 | 2025-08-05 |
| Open source | Proprietary | Open Source |
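
To make the price gap concrete, here is a minimal sketch that estimates the cost of a single request from the listed input and output prices. It assumes the prices are quoted in USD per 1M tokens; the function name `request_cost` and the example workload are hypothetical and only serve as illustration.

```python
# Prices copied from the table above.
# ASSUMPTION: they are quoted in USD per 1,000,000 tokens.
PRICES_USD = {
    "o3": {"input": 2.00, "output": 8.00},
    "gpt-oss-120b": {"input": 0.04, "output": 0.19},
}
TOKENS_PER_PRICE_UNIT = 1_000_000  # assumed billing unit


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request for a model in PRICES_USD."""
    price = PRICES_USD[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
    for model in PRICES_USD:
        print(f"{model}: ${request_cost(model, 10_000, 2_000):.5f}")
```

Under that assumption, the same request costs roughly 46 times more on o3 than on gpt-oss-120b, so the price difference is far larger than the difference in average score.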
Benchmark scores
14 benchmarks · wins: o3 13, gpt-oss-120b 1
| Benchmark | Category | o3 | gpt-oss-120b |
|---|---|---|---|
| Aider polyglot | coding | 81.3 | 41.8 |
| Fiction.LiveBench | knowledge | 88.9 | 44.4 |
| GPQA diamond | knowledge | 75.8 | 67.7 |
| HELM — GPQA | knowledge | 75.3 | 68.4 |
| HELM — IFEval | language | 86.9 | 83.6 |
| HELM — MMLU-Pro | knowledge | 85.9 | 79.5 |
| HELM — Omni-MATH | math | 71.4 | 68.8 |
| HELM — WildBench | reasoning | 86.1 | 84.5 |
| Lech Mazur Writing | knowledge | 83.9 | 77.3 |
| OTIS Mock AIME 2024-2025 | math | 83.9 | 88.9 |
| SimpleBench | reasoning | 43.7 | 6.5 |
| SimpleQA Verified | knowledge | 53.0 | 13.9 |
| SWE-Bench Verified (Bash Only) | coding | 58.4 | 26.0 |
| WeirdML | coding | 52.4 | 48.2 |
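
The 13-to-1 tally reported above the table can be reproduced directly from the scores. The sketch below is illustrative only: the names `SCORES`, `tally_wins`, and `largest_gap` are hypothetical helpers, not part of any tool used by the page, and a "win" is assumed to mean the strictly higher score on a benchmark.

```python
# Scores copied from the table as (o3, gpt-oss-120b) pairs.
# Benchmark names follow the table, with ASCII hyphens in the HELM entries.
SCORES = {
    "Aider polyglot": (81.3, 41.8),
    "Fiction.LiveBench": (88.9, 44.4),
    "GPQA diamond": (75.8, 67.7),
    "HELM - GPQA": (75.3, 68.4),
    "HELM - IFEval": (86.9, 83.6),
    "HELM - MMLU-Pro": (85.9, 79.5),
    "HELM - Omni-MATH": (71.4, 68.8),
    "HELM - WildBench": (86.1, 84.5),
    "Lech Mazur Writing": (83.9, 77.3),
    "OTIS Mock AIME 2024-2025": (83.9, 88.9),
    "SimpleBench": (43.7, 6.5),
    "SimpleQA Verified": (53.0, 13.9),
    "SWE-Bench Verified (Bash Only)": (58.4, 26.0),
    "WeirdML": (52.4, 48.2),
}


def tally_wins(scores: dict[str, tuple[float, float]]) -> tuple[int, int]:
    """Count benchmarks where each model has the strictly higher score."""
    o3_wins = sum(1 for o3, oss in scores.values() if o3 > oss)
    oss_wins = sum(1 for o3, oss in scores.values() if oss > o3)
    return o3_wins, oss_wins


def largest_gap(scores: dict[str, tuple[float, float]]) -> str:
    """Return the benchmark where o3 leads gpt-oss-120b by the most points."""
    return max(scores, key=lambda name: scores[name][0] - scores[name][1])


if __name__ == "__main__":
    print(tally_wins(SCORES))   # (13, 1)
    print(largest_gap(SCORES))  # Fiction.LiveBench
```

The single gpt-oss-120b win is OTIS Mock AIME 2024-2025 (88.9 vs 83.9), and the widest o3 lead is on Fiction.LiveBench (88.9 vs 44.4).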