Learning path · 3 terms · ~9 min read

Pick an AI Model

Three terms to go from "I need an AI" to "here is the cheapest model that meets my spec."

Start · MMLU
Benchmarks · Chapter 1 of 3 · MMLU

The baseline knowledge benchmark everyone cites.

TL;DR

MMLU is a knowledge benchmark tracked by BenchGecko across every frontier and open-weight model.

All frontier models now score 90%+ on MMLU · a fading differentiator.
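MMLU is scored as plain multiple-choice accuracy: the fraction of questions where the model picks the reference answer. A minimal sketch (the predictions and gold answers below are hypothetical placeholders, not real MMLU data):

```python
# MMLU-style scoring sketch: plain multiple-choice accuracy.
# The predictions and gold answers are hypothetical placeholders.
def mmlu_accuracy(predictions, gold):
    """Fraction of questions where the model picked the gold letter."""
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

preds = ["A", "C", "B", "D"]   # model's chosen options
gold  = ["A", "C", "D", "D"]   # reference answers
print(mmlu_accuracy(preds, gold))  # → 0.75
```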

Read full chapter
Benchmarks · Chapter 2 of 3 · SWE-bench

If your workload is code, this is the one to care about.

TL;DR

A benchmark where models attempt real GitHub issues · judged by whether their patch passes the project's test suite.

SWE-bench is the single most-watched AI benchmark of 2026. Every coding agent release ships a SWE-bench number first.
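The pass/fail judging described above can be sketched in a few lines: apply the model's patch, run the project's tests, and count the issue resolved only if the suite passes. This is an illustrative sketch, not the official SWE-bench harness (which pins environments and runs issue-specific tests):

```python
# Sketch of SWE-bench-style judging: apply the model's patch to the repo,
# run the project's test suite, and pass/fail on the exit code.
# Paths and commands here are illustrative assumptions.
import subprocess

def patch_resolves_issue(repo_dir: str, patch_file: str) -> bool:
    apply = subprocess.run(
        ["git", "apply", patch_file], cwd=repo_dir, capture_output=True
    )
    if apply.returncode != 0:
        return False  # patch doesn't even apply cleanly
    tests = subprocess.run(
        ["python", "-m", "pytest", "-q"], cwd=repo_dir, capture_output=True
    )
    return tests.returncode == 0  # resolved iff the test suite passes
```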

Read full chapter
Concepts · Chapter 3 of 3 · Context window

How much input the model can hold at once.

TL;DR

The max number of tokens · input + output · a model can handle in a single request. Ranges from 32K to 2M in 2026.

Context windows hit diminishing returns past 200K for most workloads. 1M+ is for agents and codebase-scale retrieval, not chat.
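Since the window covers input plus output, a request only fits if the prompt's tokens plus the reply you reserve stay under the limit. A minimal budget check, assuming the rough ~4-characters-per-token heuristic for English text (real counts need the model's own tokenizer):

```python
# Context-window budget check. The ~4 chars/token ratio is a crude
# English-text heuristic; use the model's tokenizer for real counts.
def fits_context(prompt: str, max_output_tokens: int, window: int) -> bool:
    est_prompt_tokens = len(prompt) / 4          # rough estimate
    return est_prompt_tokens + max_output_tokens <= window

# A 32K-window model, with room reserved for a 4K-token reply:
print(fits_context("word " * 1000, 4_096, 32_768))  # → True
```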

Read full chapter
What you learned

By the end you can evaluate a model by benchmark match, price, context window, and speed · and pick the winner for your specific workload.
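That selection step boils down to a filter-then-minimize: keep only the models that meet your benchmark and context-window spec, then take the cheapest. A sketch with hypothetical model names, scores, and prices:

```python
# Pick the cheapest model that meets the spec.
# All names, scores, and prices below are hypothetical.
models = [
    {"name": "model-a", "swe_bench": 0.62, "window": 200_000, "usd_per_mtok": 3.00},
    {"name": "model-b", "swe_bench": 0.55, "window": 1_000_000, "usd_per_mtok": 0.50},
    {"name": "model-c", "swe_bench": 0.60, "window": 128_000, "usd_per_mtok": 1.20},
]

def pick_model(models, min_swe=0.58, min_window=128_000):
    qualified = [m for m in models
                 if m["swe_bench"] >= min_swe and m["window"] >= min_window]
    if not qualified:
        return None
    return min(qualified, key=lambda m: m["usd_per_mtok"])["name"]

print(pick_model(models))  # → model-c
```

model-b is cheapest overall but misses the benchmark bar, which is exactly why the spec comes before the price comparison.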

Keep learning
Next path · 7 terms
The AI Bubble Explained

Seven terms that decode whether AI is overpriced, fairly priced, or criminally underpriced. Read in order.