MiMo-V2-Omni

by xiaomi · Released Mar 2026

Multimodal
Context: 262K tokens (~131 books)
Input: $0.40 / 1M tokens
Output: $2.00 / 1M tokens
Type: multimodal
License: Proprietary
Benchmarks: 3 tested
Data updated today
About

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Tested on 3 benchmarks, with an average score of 45.8%. Top scores: Artificial Analysis Agentic Index (58.6%), Artificial Analysis Quality Index (43.4%), Artificial Analysis Coding Index (35.5%).
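
As a quick check, the mean of the three scores listed above can be computed directly. This is a minimal sketch: the `scores` dictionary is an illustrative name, and the values are copied from this page rather than fetched from any API.

```python
from statistics import mean

# Scores as listed on this page (Artificial Analysis indices).
scores = {
    "Agentic Index": 58.6,
    "Quality Index": 43.4,
    "Coding Index": 35.5,
}

# Unweighted arithmetic mean, rounded to one decimal place.
average = round(mean(scores.values()), 1)
print(f"Average across {len(scores)} benchmarks: {average}")
```

An unweighted mean is assumed here; the page does not say how its own average is derived.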

Capabilities
speed: 76.3 (#18 globally)
Benchmark Scores
Tested on 3 benchmarks · Ranked across 1 category
[Chart: Score Distribution (all 233 models), score axis from 0 to 100]
Artificial Analysis — Agentic Index

Composite score measuring agent capability across tool use and planning tasks.

Score: 58.6
Artificial Analysis — Quality Index

Composite quality score combining multiple benchmark results into a single metric.

Score: 43.4
Artificial Analysis — Coding Index

Composite coding quality score from multiple code benchmarks.

Score: 35.5
Legend: Excellent (85+) · Good (70-85) · Average (50-70) · Below (<50)
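
The legend bands can be applied to the scores above with a small helper. This is a sketch under one stated assumption: the page does not say which band a boundary value such as 70 or 85 falls into, so the lower bound of each band is treated as inclusive here. The function name `score_band` is hypothetical.

```python
def score_band(score: float) -> str:
    """Map a 0-100 score to the legend bands shown on this page.

    Assumption: each band's lower bound is inclusive (e.g. 85.0 is
    "Excellent"); the page itself does not specify boundary handling.
    """
    if score >= 85:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Average"
    return "Below"


# Scores copied from the benchmark list on this page.
listed_scores = [
    ("Agentic Index", 58.6),
    ("Quality Index", 43.4),
    ("Coding Index", 35.5),
]

for name, score in listed_scores:
    print(f"{name}: {score} -> {score_band(score)}")
```

Under that assumption, the Agentic Index score falls in the Average band and the other two fall in the Below band.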
BenchGecko API: mimo-v2-omni
Specifications
  • Type: multimodal
  • Context: 262K tokens (~131 books)
  • Released: Mar 2026
  • License: Proprietary
  • Status: Active
  • Cost / Message: ~$0.003
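
A per-message cost follows from the listed per-token prices once a message size is chosen. The sketch below uses the $0.40 and $2.00 per 1M token prices from this page. The token counts are hypothetical, since the page does not state what message size its ~$0.003 figure assumes, and `message_cost` is an illustrative name.

```python
# Prices from this page, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 2.00


def message_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one message at the listed prices."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical message: 2,500 input tokens and 1,000 output tokens.
cost = message_cost(2_500, 1_000)
print(f"${cost:.4f}")  # prints "$0.0030"
```

Because output tokens cost five times as much as input tokens, the output length dominates the cost for most message shapes.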
Available On
xiaomi · $0.40
MiMo-V2-Omni is a proprietary multimodal AI model by xiaomi, released in March 2026. Context window: 262K tokens.