Changelog
Every change in the AI economy, tracked.
March 31, 2026
GPT-5.4 Nano launched on OpenAI
OpenAI released GPT-5.4 Nano, a lightweight model targeting edge deployments with a 128K context window at $0.10/$0.40 per 1M tokens.
GPT-5.4 Mini joins the OpenAI lineup
OpenAI shipped GPT-5.4 Mini alongside the Nano variant: a mid-tier option with stronger reasoning at $0.50/$2.00 per 1M tokens.
March 30, 2026
Claude Opus 4.5 input price dropped to $5.00 per 1M tokens
Anthropic cut Claude Opus 4.5 input pricing from $8.00 to $5.00 per 1M tokens, matching the Opus 4.6 price point.
Mistral Small 4 available via Mistral AI
Mistral AI released Mistral Small 4, a compact model at $0.15/$0.60 per 1M tokens with strong multilingual performance.
March 29, 2026
Gemini 2.5 Pro scores 94.1% on MMLU
Google DeepMind's Gemini 2.5 Pro posted a new MMLU score of 94.1%, moving into second place behind GPT-5.4 Pro.
Grok 4.20 Multi-Agent Beta enters agent rankings
xAI launched the multi-agent variant of Grok 4.20, the first production multi-agent system tracked on BenchGecko.
March 28, 2026
DeepSeek V3.2 output price dropped to $0.38 per 1M tokens
DeepSeek reduced V3.2 output pricing from $0.55 to $0.38 per 1M tokens, undercutting most mid-tier competitors.
7 new MCP servers added, mostly in the dev-tools category
The MCP registry grew by 7 servers this week: 4 in dev-tools, 2 in database, and 1 in cloud. Quality scores range from 55 to 72.
March 27, 2026
Claude Sonnet 4.6 released by Anthropic
Anthropic launched Claude Sonnet 4.6 at $3/$15 per 1M tokens, positioned as the default coding and analysis model in the Claude lineup.
Claude Opus 4.6 released by Anthropic
Anthropic's new flagship, Claude Opus 4.6, ships at $5/$25 per 1M tokens with extended thinking and improved agentic capabilities.
March 26, 2026
OTIS Mock AIME 2024-2025 benchmark added
BenchGecko now tracks the OTIS Mock AIME 2024-2025 benchmark, a math reasoning evaluation with 14 models scored so far.
Claude Opus 4.1 pricing increased to $15/$75 per 1M tokens
Anthropic raised Claude Opus 4.1 pricing from $12/$60 to $15/$75 per 1M tokens, reflecting its position as the legacy premium tier.
March 25, 2026
Grok 4.20 Beta launched by xAI
xAI released Grok 4.20 Beta at $2/$6 per 1M tokens, a major upgrade with improved code generation and multi-step reasoning.
Inception added as a tracked provider
Inception joined BenchGecko with Mercury 2, their first model available via OpenRouter at competitive pricing.
March 24, 2026
DeepSeek R1 0528 posted 87.2% on GPQA Diamond
DeepSeek's R1 0528 model scored 87.2% on the GPQA Diamond benchmark, the highest score among open-weight models.
3 new MCP servers in AI/ML category
Three new AI/ML-focused MCP servers were registered, including integrations for model monitoring and prompt management.
March 23, 2026
GPT-4o Audio Preview marked as deprecated
OpenAI deprecated the GPT-4o Audio Preview endpoint. Existing integrations will continue for 90 days before shutdown.
Mistral Medium 3.1 input price cut to $0.40 per 1M tokens
Mistral AI reduced Medium 3.1 input pricing from $0.60 to $0.40 per 1M tokens; it is now the cheapest medium-tier model from Mistral.
March 22, 2026
DeepSeek V3.2 Speciale released
DeepSeek launched V3.2 Speciale, a fine-tuned variant optimized for long-context tasks at $0.40/$1.20 per 1M tokens.
WeirdML benchmark now tracked on BenchGecko
WeirdML, a creative reasoning benchmark testing unusual pattern matching, is now tracked with 8 models scored.
March 20, 2026
Nemotron 3 Super (120B) launched by NVIDIA
NVIDIA shipped Nemotron 3 Super, a 120B parameter MoE model (12B active) at $0.10/$0.40 per 1M tokens. A free variant is also available.
Gemini 2.5 Flash Lite priced at $0.10/$0.40 per 1M tokens
Google DeepMind confirmed Flash Lite pricing: the cheapest Gemini model to date, targeting high-volume production workloads.
March 18, 2026
Mistral Large 3 2512 released by Mistral AI
Mistral AI shipped their flagship Large 3 model (December 2025 checkpoint) at $0.50/$1.50 per 1M tokens with 128K context.
Grok Code Fast 1 added to agent rankings
xAI's dedicated coding agent, Grok Code Fast 1, entered the SWE-bench leaderboard at $0.20/$1.50 per 1M tokens.
March 16, 2026
Claude Sonnet 4.5 scores 91.7% on MMLU
Anthropic's Claude Sonnet 4.5 achieved 91.7% on MMLU, a strong result for a mid-tier model, surpassing several flagship competitors.
12 new MCP servers added across 5 categories
Largest weekly MCP registry growth: 12 servers covering search, finance, communication, database, and dev-tools categories.
March 14, 2026
GPT-5.4 Pro launched: OpenAI's new flagship
OpenAI released GPT-5.4 Pro, their most capable model yet, targeting enterprise and research use cases at premium pricing.
GPT-5.4 standard tier released by OpenAI
Alongside the Pro variant, OpenAI shipped the standard GPT-5.4, positioned as the successor to GPT-5.3 Chat for general use.
March 12, 2026
Grok 3 Mini marked as deprecated by xAI
xAI deprecated Grok 3 Mini; users are directed to Grok 4.1 Fast as the recommended replacement at $0.20/$0.50 per 1M tokens.
Llama 3.3 Nemotron Super 49B pricing dropped
NVIDIA cut Nemotron Super 49B pricing to $0.10/$0.40 per 1M tokens, making it the cheapest 49B-class model available.
March 10, 2026
Liquid added as a tracked provider
Liquid joined BenchGecko with their LFM2-24B model, a 24B parameter mixture-of-experts architecture.
MiniMax M2.7 released by MiniMax
MiniMax shipped M2.7, a competitive mid-tier model with strong multilingual benchmarks and 128K context.
March 8, 2026
Grok 4 posted 89.4% on GPQA Diamond
xAI's Grok 4 achieved 89.4% on the GPQA Diamond benchmark, a new high for the Grok family.
LAMBADA benchmark scores now tracked
BenchGecko added LAMBADA, a language modeling benchmark measuring word prediction in long-range contexts, with 22 models scored.
March 5, 2026
Gemini 2.5 Flash output price reduced to $2.50 per 1M tokens
Google DeepMind reduced Gemini 2.5 Flash output pricing from $3.50 to $2.50 per 1M tokens, improving its cost-performance ratio.
Mercury 2 launched by Inception
Inception released Mercury 2, their second-generation model with improved reasoning capabilities, available via OpenRouter.
March 3, 2026
Qwen3.5-Flash released by Alibaba Qwen
Alibaba shipped Qwen3.5-Flash, a lightweight model targeting fast inference at competitive pricing for the Asian market.
5 new MCP servers added in finance and auth categories
Five new MCP servers registered this week, with a focus on financial data integrations and authentication providers.