Changelog

Every change in the AI economy, tracked.

OpenAI released GPT-5.4 Nano, a lightweight model targeting edge deployments with a 128K context window at $0.10/$0.40 per 1M tokens.

New Model · High Impact · openai

OpenAI shipped GPT-5.4 Mini alongside the Nano variant: a mid-tier option with stronger reasoning at $0.50/$2.00 per 1M tokens.

New Model · Medium Impact · openai

Anthropic cut Claude Opus 4.5 input pricing from $8.00 to $5.00 per 1M tokens, matching the Opus 4.6 price point.

$8.00/M input → $5.00/M input
Price Drop · High Impact · anthropic

Mistral AI released Mistral Small 4, a compact model at $0.15/$0.60 per 1M tokens with strong multilingual performance.

New Model · Medium Impact · mistral

Google DeepMind's Gemini 2.5 Pro posted a new MMLU score of 94.1%, moving into second place behind GPT-5.4 Pro.

92.8% → 94.1%
Score Update · High Impact · google

xAI launched the multi-agent variant of Grok 4.20, the first production multi-agent system tracked on BenchGecko.

New Agent · High Impact · xai

DeepSeek reduced V3.2 output pricing from $0.55 to $0.38 per 1M tokens, undercutting most mid-tier competitors.

$0.55/M output → $0.38/M output
Price Drop · Medium Impact · deepseek

The MCP registry grew by 7 servers this week: 4 in dev-tools, 2 in database, and 1 in cloud. Quality scores range from 55 to 72.

MCP Server · Low Impact

Anthropic launched Claude Sonnet 4.6 at $3/$15 per 1M tokens, positioned as the default coding and analysis model in the Claude lineup.

New Model · High Impact · anthropic

Anthropic's new flagship, Claude Opus 4.6, ships at $5/$25 per 1M tokens with extended thinking and improved agentic capabilities.

New Model · High Impact · anthropic

BenchGecko now tracks the OTIS Mock AIME 2024-2025 benchmark, a math reasoning evaluation with 14 models scored so far.

New Benchmark · Medium Impact

Anthropic raised Claude Opus 4.1 pricing from $12/$60 to $15/$75 per 1M tokens, reflecting its position as the legacy premium tier.

$12/$60 per 1M → $15/$75 per 1M
Price Increase · Medium Impact · anthropic

xAI released Grok 4.20 Beta at $2/$6 per 1M tokens, a major upgrade with improved code generation and multi-step reasoning.

New Model · High Impact · xai

Inception joined BenchGecko with Mercury 2, their first model available via OpenRouter at competitive pricing.

New Provider · Medium Impact · inception

DeepSeek's R1 0528 model scored 87.2% on the GPQA Diamond benchmark, the highest score among open-weight models.

87.2%
Score Update · Medium Impact · deepseek

Three new AI/ML-focused MCP servers were registered, including integrations for model monitoring and prompt management.

MCP Server · Low Impact

OpenAI deprecated the GPT-4o Audio Preview endpoint. Existing integrations will continue for 90 days before shutdown.

Deprecated · Medium Impact · openai

Mistral AI reduced Medium 3.1 input pricing from $0.60 to $0.40 per 1M tokens, making it the cheapest medium-tier option in Mistral's lineup.

$0.60/M input → $0.40/M input
Price Drop · Low Impact · mistral

DeepSeek launched V3.2 Speciale, a fine-tuned variant optimized for long-context tasks at $0.40/$1.20 per 1M tokens.

New Model · Medium Impact · deepseek

WeirdML, a creative reasoning benchmark testing unusual pattern matching, is now tracked with 8 models scored.

New Benchmark · Low Impact

NVIDIA shipped Nemotron 3 Super, a 120B-parameter MoE model (12B active) at $0.10/$0.40 per 1M tokens. A free variant is also available.

New Model · Medium Impact · nvidia

Google DeepMind confirmed Flash Lite pricing: the cheapest Gemini model to date, targeting high-volume production workloads.

$0.10/$0.40 per 1M
Price Drop · Medium Impact · google

Mistral AI shipped their flagship Large 3 model (December 2025 checkpoint) at $0.50/$1.50 per 1M tokens with 128K context.

New Model · Medium Impact · mistral

xAI's dedicated coding agent, Grok Code Fast 1, entered the SWE-bench leaderboard at $0.20/$1.50 per 1M tokens.

New Agent · Medium Impact · xai

Anthropic's Claude Sonnet 4.5 achieved 91.7% on MMLU, a strong result for a mid-tier model, surpassing several flagship competitors.

91.7%
Score Update · Medium Impact · anthropic

Largest weekly MCP registry growth: 12 servers covering search, finance, communication, database, and dev-tools categories.

MCP Server · Low Impact

OpenAI released GPT-5.4 Pro, their most capable model yet, targeting enterprise and research use cases at premium pricing.

New Model · High Impact · openai

Alongside the Pro variant, OpenAI shipped the standard GPT-5.4, positioned as the successor to GPT-5.3 Chat for general use.

New Model · High Impact · openai

xAI deprecated Grok 3 Mini; users are directed to Grok 4.1 Fast as the recommended replacement at $0.20/$0.50 per 1M tokens.

Deprecated · Low Impact · xai

NVIDIA cut Nemotron Super 49B pricing to $0.10/$0.40 per 1M tokens, making it the cheapest 49B-class model available.

$0.20/$0.80 per 1M → $0.10/$0.40 per 1M
Price Drop · Low Impact · nvidia

Liquid joined BenchGecko with their LFM2-24B model, a 24B-parameter mixture-of-experts architecture.

New Provider · Low Impact · liquid

MiniMax shipped M2.7, a competitive mid-tier model with strong multilingual benchmarks and 128K context.

New Model · Low Impact · minimax

xAI's Grok 4 achieved 89.4% on the GPQA Diamond benchmark, a new high for the Grok family.

89.4%
Score Update · Medium Impact · xai

BenchGecko added LAMBADA, a language modeling benchmark measuring word prediction in long-range contexts, with 22 models scored.

New Benchmark · Low Impact

Google DeepMind reduced Gemini 2.5 Flash output pricing from $3.50 to $2.50 per 1M tokens, improving its cost-performance ratio.

$3.50/M output → $2.50/M output
Price Drop · Medium Impact · google

Inception released Mercury 2, their second-generation model with improved reasoning capabilities, available via OpenRouter.

New Model · Low Impact · inception

Alibaba shipped Qwen3.5-Flash, a lightweight model targeting fast inference at competitive pricing for the Asian market.

New Model · Medium Impact · alibaba

Five new MCP servers registered this week, with a focus on financial data integrations and authentication providers.

MCP Server · Low Impact