gpt-oss-120b is an open-weight, 117B-parameter Mixture-of-Experts (MoE) language model from OpenAI designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized...
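How an MoE model can hold 117B parameters yet use only about 5.1B per forward pass is easier to see in code. The sketch below shows one common routing scheme: a router scores every expert, only the top-k experts run, and their outputs are mixed with softmax gate weights. It is a toy illustration, not gpt-oss-120b's actual implementation. The expert count, `k`, the router weights, and the names `top_k_route` and `moe_forward` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts; gates are a softmax over only those k logits."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    gates = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, gates))


def moe_forward(x: Vector,
                experts: Sequence[Callable[[Vector], Vector]],
                router_weights: Sequence[Vector],
                k: int) -> Tuple[Vector, List[int]]:
    """Send one token through a sparse MoE layer; only k experts are evaluated."""
    # Router: one dot-product score per expert (hypothetical linear router).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router_weights]
    out = [0.0] * len(x)
    used = []
    for idx, gate in top_k_route(logits, k):
        y = experts[idx](x)  # unselected experts are never computed
        out = [o + gate * yi for o, yi in zip(out, y)]
        used.append(idx)
    return out, used


# Hypothetical toy layer: 8 experts, each scaling its input by a different constant.
experts = [lambda v, s=s: [s * vi for vi in v] for s in range(1, 9)]
router_weights = [[float(e), -float(e)] for e in range(8)]
out, used = moe_forward([1.0, 0.0], experts, router_weights, k=2)
print(used)  # [7, 6]: only 2 of the 8 experts ran

# Share of parameters active per forward pass, using the figures above.
total_params_b = 117.0
active_params_b = 5.1
print(f"active fraction: {active_params_b / total_params_b:.1%}")  # active fraction: 4.4%
```

In a real MoE transformer, parameters outside the expert layers (attention, embeddings) run for every token. The active fraction is therefore not simply k divided by the number of experts.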
Tested on 10 benchmarks, with an average score of 68.7%. Top scores (all OpenCompass evaluations): AIME 2025 (93.4%), IFEval (90.2%), and MMLU-Pro (79.7%).
OpenCompass evaluation of LiveCodeBench v6. Tests code generation on fresh competitive programming problems, reducing the chance that results reflect memorization.
Aider's multi-language code-editing benchmark. Tests editing ability across Python, JavaScript, TypeScript, Java, C++, Go, Rust, and more.
OpenCompass evaluation on AIME 2025 problems. Tests mathematical reasoning on fresh competition problems.
OpenCompass evaluation of MMLU-Pro. A harder version of MMLU that offers ten answer choices per question instead of four.
OpenCompass evaluation of GPQA Diamond. PhD-level science questions from GPQA's hardest subset.
OpenCompass evaluation of Humanity's Last Exam. Expert-level, cross-discipline knowledge test.
- Type: text
- Context: 131K tokens (~66 books)
- Released: Aug 2025
- License: Open Source
- Status: Active
- Cost / Message: ~$0.000