Gemini 2.5 Pro vs gpt-oss-120b (free)
Side by side · benchmarks, pricing, and signals you can act on.
Winner summary
Gemini 2.5 Pro wins 6 of 10 shared benchmarks. It leads in speed · coding · knowledge.
Category leads
speed · Gemini 2.5 Pro
coding · Gemini 2.5 Pro
math · gpt-oss-120b (free)
knowledge · Gemini 2.5 Pro
language · gpt-oss-120b (free)
Hype vs Reality
Attention vs performance
Gemini 2.5 Pro
#61 by performance · no signal
gpt-oss-120b (free)
#22 by performance · no signal
Vendor risk
Who is behind the model
Google DeepMind
$4.00T · Tier 1
OpenAI
$840.0B · Tier 1
Head to head
10 benchmarks · 2 models
Gemini 2.5 Pro · gpt-oss-120b (free)
Artificial Analysis · Agentic Index
gpt-oss-120b (free) leads by +5.2
Artificial Analysis Agentic Index · a composite score measuring how well a model performs in agentic workflows · multi-step tool use, planning, error recovery, and autonomous task completion. Aggregates results from multiple agentic benchmarks including SWE-bench, tool-use tests, and planning evaluations. The canonical single-number metric for "how good is this model as an agent?"
Gemini 2.5 Pro
32.7
gpt-oss-120b (free)
37.9
Artificial Analysis · Coding Index
Gemini 2.5 Pro leads by +3.3
Artificial Analysis Coding Index · a composite score that aggregates performance across multiple coding benchmarks into a single index. Tracks code generation quality, debugging ability, multi-language competence, and real-world software engineering tasks. Used by Artificial Analysis to rank model coding capability in a normalized, comparable format. Useful for developers choosing between models for coding-heavy workloads.
Gemini 2.5 Pro
31.9
gpt-oss-120b (free)
28.6
Artificial Analysis · Quality Index
Gemini 2.5 Pro leads by +1.4
Gemini 2.5 Pro
34.6
gpt-oss-120b (free)
33.3
Aider Polyglot
Gemini 2.5 Pro leads by +41.3
Aider Polyglot · measures how well AI models can edit code across multiple programming languages using the Aider coding assistant framework.
Gemini 2.5 Pro
83.1
gpt-oss-120b (free)
41.8
OpenCompass · AIME2025
gpt-oss-120b (free) leads by +4.7
Gemini 2.5 Pro
88.7
gpt-oss-120b (free)
93.4
OpenCompass · GPQA-Diamond
Gemini 2.5 Pro leads by +5.8
Gemini 2.5 Pro
84.7
gpt-oss-120b (free)
78.9
OpenCompass · HLE
Gemini 2.5 Pro leads by +2.8
Gemini 2.5 Pro
21.1
gpt-oss-120b (free)
18.3
OpenCompass · IFEval
gpt-oss-120b (free) leads by +0.2
Gemini 2.5 Pro
90.0
gpt-oss-120b (free)
90.2
OpenCompass · LiveCodeBenchV6
gpt-oss-120b (free) leads by +7.1
Gemini 2.5 Pro
71.3
gpt-oss-120b (free)
78.4
OpenCompass · MMLU-Pro
Gemini 2.5 Pro leads by +6.1
Gemini 2.5 Pro
85.8
gpt-oss-120b (free)
79.7
Full benchmark table
| Benchmark | Gemini 2.5 Pro | gpt-oss-120b (free) |
|---|---|---|
| Artificial Analysis · Agentic Index | 32.7 | 37.9 |
| Artificial Analysis · Coding Index | 31.9 | 28.6 |
| Artificial Analysis · Quality Index | 34.6 | 33.3 |
| Aider Polyglot | 83.1 | 41.8 |
| OpenCompass · AIME2025 | 88.7 | 93.4 |
| OpenCompass · GPQA-Diamond | 84.7 | 78.9 |
| OpenCompass · HLE | 21.1 | 18.3 |
| OpenCompass · IFEval | 90.0 | 90.2 |
| OpenCompass · LiveCodeBenchV6 | 71.3 | 78.4 |
| OpenCompass · MMLU-Pro | 85.8 | 79.7 |
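The 6-of-10 tally in the winner summary can be reproduced from the scores in the table. The sketch below is illustrative only: it assumes higher is better on every benchmark, and the names `SCORES` and `tally` are hypothetical, not part of the page. Because it works from the rounded scores as displayed, a computed margin can differ by 0.1 from the lead the page states (the Quality Index, for example, comes out as 1.3 here).

```python
# Scores copied from the benchmark table, as (Gemini 2.5 Pro, gpt-oss-120b (free)).
SCORES = {
    "Artificial Analysis · Agentic Index": (32.7, 37.9),
    "Artificial Analysis · Coding Index": (31.9, 28.6),
    "Artificial Analysis · Quality Index": (34.6, 33.3),
    "Aider Polyglot": (83.1, 41.8),
    "OpenCompass · AIME2025": (88.7, 93.4),
    "OpenCompass · GPQA-Diamond": (84.7, 78.9),
    "OpenCompass · HLE": (21.1, 18.3),
    "OpenCompass · IFEval": (90.0, 90.2),
    "OpenCompass · LiveCodeBenchV6": (71.3, 78.4),
    "OpenCompass · MMLU-Pro": (85.8, 79.7),
}

GEMINI = "Gemini 2.5 Pro"
GPT_OSS = "gpt-oss-120b (free)"


def tally(scores):
    """Count wins per model and record each benchmark's leader and margin.

    Assumes higher scores are better and that no benchmark is tied.
    """
    wins = {GEMINI: 0, GPT_OSS: 0}
    margins = {}
    for name, (gemini, gpt_oss) in scores.items():
        leader = GEMINI if gemini > gpt_oss else GPT_OSS
        wins[leader] += 1
        margins[name] = (leader, round(abs(gemini - gpt_oss), 1))
    return wins, margins


wins, margins = tally(SCORES)
for name, (leader, margin) in margins.items():
    print(f"{name}: {leader} leads by +{margin}")
print(wins)
```

A plain win count treats a 0.2-point lead on IFEval the same as a 41.3-point lead on Aider Polyglot, so the margins are worth reading alongside the tally.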
Pricing · per 1M tokens · projected $/mo at 10M tokens
| Model | Input | Output | Context | Projected $/mo |
|---|---|---|---|---|
| Gemini 2.5 Pro | $1.25 | $10.00 | 1.0M tokens (~524 books) | $34.38 |
| gpt-oss-120b (free) | $0.00 | $0.00 | 131K tokens (~66 books) | — |
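The page does not say how the projected monthly figure is derived. One split that reproduces $34.38 at 10M tokens is 75% input and 25% output: 7.5 × $1.25 + 2.5 × $10.00 = $34.375. The sketch below encodes that guess; the 75/25 split is an assumption inferred from the numbers, and `projected_monthly_cost` is a hypothetical helper, not something the page defines.

```python
def projected_monthly_cost(input_price, output_price,
                           total_tokens_m=10.0, input_share=0.75):
    """Estimate monthly spend in dollars.

    input_price / output_price: dollars per 1M tokens, as listed in the table.
    total_tokens_m: monthly volume in millions of tokens (10M on the page).
    input_share: fraction of tokens that are input. 0.75 is an ASSUMPTION
                 inferred from the table's $34.38 figure, not stated on the page.
    """
    input_m = total_tokens_m * input_share
    output_m = total_tokens_m - input_m
    return input_m * input_price + output_m * output_price


gemini_cost = projected_monthly_cost(1.25, 10.00)
gpt_oss_cost = projected_monthly_cost(0.00, 0.00)
print(f"Gemini 2.5 Pro: ${gemini_cost:.2f}")
print(f"gpt-oss-120b (free): ${gpt_oss_cost:.2f}")
```

Output-heavy workloads shift the estimate sharply for Gemini 2.5 Pro, since output tokens cost eight times as much as input tokens; pass a different `input_share` to see the effect.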