Microsoft Maia 200
Microsoft Maia 200 is the next-gen custom AI chip · succeeds Maia 100 · targets Azure OpenAI inference + training at rack scale.
Basic
Microsoft Maia is Azure's custom AI silicon. Maia 100 shipped in late 2024, serving select Azure OpenAI workloads. Maia 200 (announced at Ignite 2025) doubles HBM capacity and scales to multi-rack interconnects. It targets GPT-4o-class inference and smaller training runs. No public benchmarks exist, but Azure claims per-token cost below H100-backed equivalents such as Bedrock.
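The per-token cost claim above boils down to simple arithmetic: cost per token is hourly accelerator price divided by sustained token throughput. A minimal sketch of that comparison, with all figures being placeholder assumptions rather than published Maia 200 or H100 numbers:

```python
# Illustrative cost-per-token arithmetic. All figures are placeholder
# assumptions, NOT published Maia 200 or H100 pricing/throughput.
def cost_per_million_tokens(hourly_price_usd: float, tokens_per_second: float) -> float:
    """USD per 1M generated tokens for an accelerator at full utilization."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_price_usd / tokens_per_hour * 1_000_000

# Hypothetical numbers purely to show how the comparison works:
h100_cost = cost_per_million_tokens(hourly_price_usd=4.00, tokens_per_second=2500)
maia_cost = cost_per_million_tokens(hourly_price_usd=3.00, tokens_per_second=2500)
print(f"H100-equivalent: ${h100_cost:.2f}/1M tok · Maia: ${maia_cost:.2f}/1M tok")
```

The point of the formula is that a cheaper-to-operate chip wins even at identical throughput; real claims would need actual throughput and pricing data.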
Deep
Maia architecture: custom cores designed with OpenAI feedback, focused on long-context (100K+ token) serving and reasoning workloads. Leaked Maia 200 specs: HBM3e, a 5 nm process, and rack-scale connectivity via a custom Cobalt-like interconnect. The software stack is an evolution of Microsoft's internal AI compiler (likely Triton-based). Primarily deployed inside Azure OpenAI · customers see it only as lower prices on specific endpoints.
Expert
Microsoft's Maia + Cobalt strategy: Maia for AI acceleration, Cobalt for general-purpose ARM compute; both reduce Azure's NVIDIA dependency. Maia 200 reportedly gets priority silicon allocation at TSMC · evidence of Microsoft's commitment. OpenAI's influence on the design means Maia is optimized for transformer inference patterns from the GPT series: large context, long generation, and reasoning. It is not a generalist AI chip like the H200 but a specialized LLM-inference chip.
Depending on why you're here
- Microsoft's 2nd-gen AI ASIC · HBM3e · 5nm
- Designed with OpenAI input · transformer-specialized
- Rack-scale interconnect (Cobalt-adjacent)
- Access via Azure OpenAI only
- Lower cost on specific endpoints
- Not directly programmable
- Microsoft's vertical AI play · reduces NVIDIA exposure
- Strategic lock with OpenAI via hardware co-design
- Azure margin protection
- Microsoft's own AI chip for Azure
- Makes OpenAI models cheaper to run on Azure
- Not sold to customers directly
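Since Maia is reachable only through Azure OpenAI endpoints, access looks identical to any Azure OpenAI call; whether a given deployment lands on Maia silicon is not customer-visible. A minimal sketch of the standard Azure OpenAI REST URL shape (the resource and deployment names below are hypothetical):

```python
# Build an Azure OpenAI chat-completions request URL. The resource and
# deployment names are hypothetical; the URL shape is the standard
# Azure OpenAI REST format (resource host + deployment path + api-version).
def chat_completions_url(resource: str, deployment: str, api_version: str) -> str:
    return (
        f"https://{resource}.openai.azure.com"
        f"/openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )

url = chat_completions_url("my-azure-resource", "gpt-4o", "2024-06-01")
print(url)
```

The deployment name is what Azure routes on; the underlying accelerator (Maia, H100, or otherwise) is abstracted away from the caller.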
Maia 200 is Microsoft's hardware bet on OpenAI · architecture co-design with the GPT series makes it hard for competitors to replicate.