Context: 262K tokens (~131 books)
Input $/1M: $0.40
Output $/1M: $2.00
Type: multimodal
License: Proprietary
Benchmarks: 3 tested
About
MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step...
Tested on 3 benchmarks with a 45.8% average. Top scores: Artificial Analysis — Agentic Index (58.6%), Artificial Analysis — Quality Index (43.4%), Artificial Analysis — Coding Index (35.5%).
Capabilities
speed
76.3
#18 globally
Benchmark Scores
Tested on 3 benchmarks · Ranked across 1 category
[Score distribution chart across all 233 models omitted]
Artificial Analysis — Agentic Index
58.6. Composite score measuring agent capability across tool use and planning tasks.
Artificial Analysis — Quality Index
43.4. Composite quality score combining multiple benchmark results into a single metric.
Artificial Analysis — Coding Index
35.5. Composite coding quality score from multiple code benchmarks.
Excellent (85+) Good (70-85) Average (50-70) Below (<50)
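As a check, the composite average for this model can be recomputed directly as the arithmetic mean of the three Artificial Analysis index scores listed above (a minimal sketch; the equal-weight mean is an assumption about how the site aggregates scores):

```python
# Mean of the three index scores shown on this page.
# Assumes equal weighting across indices.
scores = {
    "Agentic Index": 58.6,
    "Quality Index": 43.4,
    "Coding Index": 35.5,
}

average = sum(scores.values()) / len(scores)
print(round(average, 1))  # → 45.8
```

By the legend above, a 45.8 mean falls in the "Below (<50)" band.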
Links
Research
Documentation
Community
BenchGecko API
mimo-v2-omni
Specifications
- Type: multimodal
- Context: 262K tokens (~131 books)
- Released: Mar 2026
- License: Proprietary
- Status: Active
- Cost / Message: ~$0.003
Frequently Asked Questions
MiMo-V2-Omni is a proprietary multimodal AI model by Xiaomi, released in March 2026. Context window: 262K tokens.