Changelog

Every change in the AI economy · under control.

April 7, 2026

Claude Mythos Preview · Anthropic's most capable model arrives

Anthropic officially released Claude Mythos Preview, which tops every major benchmark: 93.9% on SWE-bench Verified, 94.5% on GPQA Diamond, 97.6% on USAMO, and 64.7% on HLE with tools. It offers adaptive thinking at max effort and a context window of up to 1M tokens.

New model · High impact · anthropic

March 31, 2026

GPT-5.4 Nano launched on OpenAI

OpenAI released GPT-5.4 Nano · a lightweight model targeting edge deployments with a 128K context window at $0.10/$0.40 per 1M tokens.

New model · High impact · openai

GPT-5.4 Mini joins the OpenAI lineup

OpenAI shipped GPT-5.4 Mini alongside the Nano variant · a mid-tier option with stronger reasoning at $0.50/$2.00 per 1M tokens.

New model · Medium impact · openai
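Prices in this changelog are quoted as input/output USD per 1M tokens. The sketch below shows what the 5× price gap between GPT-5.4 Nano and Mini means in dollars. It assumes a hypothetical workload of 10,000 requests with 2,000 input and 500 output tokens each. `workload_cost` is a hypothetical helper, not part of any OpenAI SDK.

```python
def workload_cost(requests: int, input_tokens: int, output_tokens: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Total USD cost of a workload, given input/output prices per 1M tokens."""
    total_in = requests * input_tokens
    total_out = requests * output_tokens
    return (total_in * input_price_per_m + total_out * output_price_per_m) / 1_000_000

# Hypothetical workload: 10,000 requests, 2,000 input + 500 output tokens each.
nano = workload_cost(10_000, 2_000, 500, 0.10, 0.40)  # GPT-5.4 Nano pricing
mini = workload_cost(10_000, 2_000, 500, 0.50, 2.00)  # GPT-5.4 Mini pricing

print(f"GPT-5.4 Nano: ${nano:.2f}")  # → GPT-5.4 Nano: $4.00
print(f"GPT-5.4 Mini: ${mini:.2f}")  # → GPT-5.4 Mini: $20.00
```

Both the input and the output price are 5× higher for Mini, so the total cost ratio stays at 5× regardless of the input/output mix.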

March 30, 2026

Claude Opus 4.5 input price dropped · $5.00 per 1M tokens

Anthropic cut Claude Opus 4.5 input pricing from $8.00 to $5.00 per 1M tokens, matching the Opus 4.6 price point.

$8.00/M input → $5.00/M input

Price drop · High impact · anthropic
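One way to read the Opus 4.5 cut is to compute the relative change and its effect on a single request. The 200K-token prompt below is a hypothetical example. Only input pricing is modeled because the entry does not state the output price.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent (negative = price cut)."""
    return (new - old) / old * 100

# Claude Opus 4.5 input price in USD per 1M tokens, before and after the cut.
OLD_INPUT, NEW_INPUT = 8.00, 5.00
change = pct_change(OLD_INPUT, NEW_INPUT)

# Hypothetical request: a 200K-token prompt (input side only).
prompt_tokens = 200_000
old_cost = prompt_tokens * OLD_INPUT / 1_000_000
new_cost = prompt_tokens * NEW_INPUT / 1_000_000

print(f"input price change: {change:.1f}%")                       # → input price change: -37.5%
print(f"200K-token prompt: ${old_cost:.2f} -> ${new_cost:.2f}")  # → 200K-token prompt: $1.60 -> $1.00
```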

Mistral Small 4 available via Mistral AI

Mistral AI released Mistral Small 4 · a compact model at $0.15/$0.60 per 1M tokens with strong multilingual performance.

New model · Medium impact · mistral

March 29, 2026

Gemini 2.5 Pro scores 94.1% on MMLU

Google DeepMind's Gemini 2.5 Pro posted a new MMLU score of 94.1%, moving into second place behind GPT-5.4 Pro.

92.8% → 94.1%

Score update · High impact · google

Grok 4.20 Multi-Agent Beta enters agent rankings

xAI launched the multi-agent variant of Grok 4.20 · the first production multi-agent system tracked on BenchGecko.

New agent · High impact · xai

March 28, 2026

DeepSeek V3.2 output price dropped · $0.38 per 1M tokens

DeepSeek reduced V3.2 output pricing from $0.55 to $0.38 per 1M tokens, undercutting most mid-tier competitors.

$0.55/M output → $0.38/M output

Price drop · Medium impact · deepseek

7 new MCP servers added, mostly in dev-tools

The MCP registry grew by 7 servers this week · 4 in dev-tools, 2 in database, and 1 in cloud. Quality scores range from 55 to 72.

MCP server · Low impact

March 27, 2026

Claude Sonnet 4.6 released by Anthropic

Anthropic launched Claude Sonnet 4.6 at $3/$15 per 1M tokens · positioned as the default coding and analysis model in the Claude lineup.

New model · High impact · anthropic

Claude Opus 4.6 released by Anthropic

Anthropic's new flagship · Claude Opus 4.6 · ships at $5/$25 per 1M tokens with extended thinking and improved agentic capabilities.

New model · High impact · anthropic

March 26, 2026

OTIS Mock AIME 2024-2025 benchmark added

BenchGecko now tracks the OTIS Mock AIME 2024-2025 benchmark · a math reasoning evaluation with 14 models scored so far.

New benchmark · Medium impact

Claude Opus 4.1 pricing increased · $15/$75 per 1M tokens

Anthropic raised Claude Opus 4.1 pricing from $12/$60 to $15/$75 per 1M tokens, reflecting its position as the legacy premium tier.

$12/$60 per 1M → $15/$75 per 1M

Price increase · Medium impact · anthropic

March 25, 2026

Grok 4.20 Beta launched by xAI

xAI released Grok 4.20 Beta at $2/$6 per 1M tokens · a major upgrade with improved code generation and multi-step reasoning.

New model · High impact · xai

Inception added as a tracked provider

Inception joined BenchGecko with Mercury 2 · their first model available via OpenRouter at competitive pricing.

New provider · Medium impact · inception

March 24, 2026

DeepSeek R1 0528 posted 87.2% on GPQA Diamond

DeepSeek's R1 0528 model scored 87.2% on the GPQA Diamond benchmark, the highest score among open-weight models.

87.2%

Score update · Medium impact · deepseek

3 new MCP servers in AI/ML category

Three new AI/ML-focused MCP servers were registered · including integrations for model monitoring and prompt management.

MCP server · Low impact

March 23, 2026

GPT-4o Audio Preview marked as deprecated

OpenAI deprecated the GPT-4o Audio Preview endpoint. Existing integrations will continue for 90 days before shutdown.

Deprecated · Medium impact · openai
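This sketch estimates the GPT-4o Audio Preview shutdown date with Python's `datetime`. It assumes the 90-day window is counted from the date of this entry; OpenAI's own deprecation notice is authoritative. `days_until_shutdown` is a hypothetical helper for migration planning.

```python
from datetime import date, timedelta

# Assumption: the 90-day window starts on the changelog date (March 23, 2026).
announced = date(2026, 3, 23)
shutdown = announced + timedelta(days=90)

def days_until_shutdown(today: date) -> int:
    """Days left before the estimated shutdown (negative once it has passed)."""
    return (shutdown - today).days

print(shutdown.isoformat())                    # → 2026-06-21
print(days_until_shutdown(date(2026, 4, 7)))   # → 75
```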

Mistral Medium 3.1 input price cut to $0.40 per 1M tokens

Mistral AI reduced Medium 3.1 input pricing from $0.60 to $0.40 per 1M tokens · now the cheapest medium-tier model from Mistral.

$0.60/M input → $0.40/M input

Price drop · Low impact · mistral

March 22, 2026

DeepSeek V3.2 Speciale released

DeepSeek launched V3.2 Speciale · a fine-tuned variant optimized for long-context tasks at $0.40/$1.20 per 1M tokens.

New model · Medium impact · deepseek

WeirdML benchmark now tracked on BenchGecko

WeirdML · a creative reasoning benchmark testing unusual pattern matching · is now tracked with 8 models scored.

New benchmark · Low impact

March 20, 2026

Nemotron 3 Super (120B) launched by NVIDIA

NVIDIA shipped Nemotron 3 Super · a 120B parameter MoE model (12B active) at $0.10/$0.40 per 1M tokens. A free variant is also available.

New model · Medium impact · nvidia
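The "120B parameter MoE model (12B active)" figure means only a fraction of the weights run for each token. The sketch below uses a common back-of-the-envelope rule of about 2 FLOPs per active parameter per token for a forward pass. This is an approximation, not an NVIDIA-published figure.

```python
TOTAL_PARAMS = 120e9   # total parameters (from the entry above)
ACTIVE_PARAMS = 12e9   # parameters used per token (from the entry above)

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Rule of thumb: ~2 FLOPs (one multiply + one add) per active parameter per token.
flops_per_token = 2 * ACTIVE_PARAMS
dense_flops_per_token = 2 * TOTAL_PARAMS  # hypothetical dense model of the same size

print(f"active fraction: {active_fraction:.0%}")                         # → active fraction: 10%
print(f"~{flops_per_token / 1e9:.0f} GFLOPs per token")                  # → ~24 GFLOPs per token
print(f"{dense_flops_per_token / flops_per_token:.0f}x less compute than dense")  # → 10x less compute than dense
```

The saving is in compute per token, not memory: all 120B weights still have to be loaded to serve the model.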

Gemini 2.5 Flash Lite priced at $0.10/$0.40 per 1M tokens

Google DeepMind confirmed Flash Lite pricing · the cheapest Gemini model to date, targeting high-volume production workloads.

$0.10/$0.40 per 1M

Price drop · Medium impact · google

March 18, 2026

Mistral Large 3 2512 released by Mistral AI

Mistral AI shipped their flagship Large 3 model (December 2025 checkpoint) at $0.50/$1.50 per 1M tokens with 128K context.

New model · Medium impact · mistral

Grok Code Fast 1 added to agent rankings

xAI's dedicated coding agent · Grok Code Fast 1 · entered the SWE-bench leaderboard at $0.20/$1.50 per 1M tokens.

New agent · Medium impact · xai

March 16, 2026

Claude Sonnet 4.5 scores 91.7% on MMLU

Anthropic's Claude Sonnet 4.5 achieved 91.7% on MMLU · a strong result for a mid-tier model, surpassing several flagship competitors.

91.7%

Score update · Medium impact · anthropic

12 new MCP servers added across 5 categories

Largest weekly MCP registry growth · 12 servers covering search, finance, communication, database, and dev-tools categories.

MCP server · Low impact

March 14, 2026

GPT-5.4 Pro launched · OpenAI's new flagship

OpenAI released GPT-5.4 Pro · their most capable model yet, targeting enterprise and research use cases at premium pricing.

New model · High impact · openai

GPT-5.4 standard tier released by OpenAI

Alongside the Pro variant, OpenAI shipped the standard GPT-5.4 · positioned as the successor to GPT-5.3 Chat for general use.

New model · High impact · openai

March 12, 2026

Grok 3 Mini marked as deprecated by xAI

xAI deprecated Grok 3 Mini · users are directed to Grok 4.1 Fast as the recommended replacement at $0.20/$0.50 per 1M tokens.

Deprecated · Low impact · xai

Llama 3.3 Nemotron Super 49B pricing dropped

NVIDIA cut Nemotron Super 49B pricing to $0.10/$0.40 per 1M tokens · making it the cheapest 49B-class model available.

$0.20/$0.80 per 1M → $0.10/$0.40 per 1M

Price drop · Low impact · nvidia
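Input/output price pairs are easier to compare as a single blended number. The sketch below weights input and output tokens 3:1. This is a common convention but an assumption here, not something this changelog or NVIDIA specifies. `blended_price` is a hypothetical helper.

```python
def blended_price(input_per_m: float, output_per_m: float,
                  input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted-average USD per 1M tokens (the 3:1 input:output default is assumed)."""
    total_weight = input_weight + output_weight
    return (input_per_m * input_weight + output_per_m * output_weight) / total_weight

# Llama 3.3 Nemotron Super 49B, before and after the cut (USD per 1M tokens).
old = blended_price(0.20, 0.80)
new = blended_price(0.10, 0.40)

print(f"blended: ${old:.3f}/M -> ${new:.3f}/M")  # → blended: $0.350/M -> $0.175/M
print(f"ratio: {new / old:.2f}")                 # → ratio: 0.50
```

Both prices were halved, so the blended price halves under any weighting. The weighting only matters when input and output prices move by different amounts.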

March 10, 2026

Liquid added as a tracked provider

Liquid joined BenchGecko with their LFM2-24B model · a 24B parameter mixture-of-experts architecture.

New provider · Low impact · liquid

MiniMax M2.7 released by MiniMax

MiniMax shipped M2.7 · a competitive mid-tier model with strong multilingual benchmarks and 128K context.

New model · Low impact · minimax

March 8, 2026

Grok 4 posted 89.4% on GPQA Diamond

xAI's Grok 4 achieved 89.4% on the GPQA Diamond benchmark · a new high for the Grok family.

89.4%

Score update · Medium impact · xai

LAMBADA benchmark scores now tracked

BenchGecko added LAMBADA · a language modeling benchmark measuring word prediction in long-range contexts · with 22 models scored.

New benchmark · Low impact

March 5, 2026

Gemini 2.5 Flash output price reduced to $2.50 per 1M tokens

Google DeepMind reduced Gemini 2.5 Flash output pricing from $3.50 to $2.50 per 1M tokens, improving its cost-performance ratio.

$3.50/M output → $2.50/M output

Price drop · Medium impact · google

Mercury 2 launched by Inception

Inception released Mercury 2 · their second-generation model with improved reasoning capabilities, available via OpenRouter.

New model · Low impact · inception

March 3, 2026

Qwen3.5-Flash released by Alibaba Qwen

Alibaba shipped Qwen3.5-Flash · a lightweight model targeting fast inference at competitive pricing for the Asian market.

New model · Medium impact · alibaba

5 new MCP servers added · finance and auth categories

Five new MCP servers registered this week, with a focus on financial data integrations and authentication providers.

MCP server · Low impact