Temperature
A knob that controls how random or deterministic the AI's output is · 0 = same answer every time, higher = more varied.
Basic
Temperature scales the probability distribution over next tokens before sampling. At temperature 0, the model always picks the highest-probability token (deterministic, boring, consistent). At 1.0, sampling matches the model's learned distribution (balanced). At 2.0, the distribution flattens and output becomes erratic. Most production code uses temperatures between 0.0 and 0.7.
Deep
Math: logits are divided by temperature before softmax, i.e. p_i ∝ exp(logit_i / T). Lower temperature sharpens the distribution; higher temperature flattens it. Temperature 0 isn't technically in the formula (it would divide by zero); it's implemented as argmax (pick the highest logit). Common uses: 0 for structured extraction and factual Q&A; 0.3-0.7 for balanced chat; 0.8-1.2 for creative writing. Temperature and top-p are often used together but shouldn't both be tuned aggressively: pick one.
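The scaling above is easy to see in a few lines of plain Python. This is a minimal sketch of temperature-scaled softmax (the function name and example logits are illustrative, not from any particular library), with temperature 0 handled as argmax:

```python
import math

def sample_distribution(logits, temperature):
    """Turn raw logits into sampling probabilities at a given temperature."""
    if temperature == 0:
        # Greedy decoding: all probability mass on the highest logit.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [lo / temperature for lo in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
for t in (0, 0.5, 1.0, 2.0):
    print(t, [round(p, 3) for p in sample_distribution(logits, t)])
```

Running the sweep shows the effect directly: at 0.5 the top token dominates, at 1.0 the probabilities match the raw softmax, and at 2.0 the three options move toward equal likelihood.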
Expert
Temperature 0 is not guaranteed deterministic: floating-point nondeterminism and implementation-specific tie-breaking mean some "deterministic" calls still vary across infrastructure. True determinism requires a seed parameter (OpenAI supports one, Anthropic doesn't). Temperature affects reasoning quality non-linearly: too low produces brittle reasoning, too high introduces errors. For reasoning models, temperature is usually set in the 0-0.3 range for correctness.
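A hedged sketch of what "temperature 0 plus seed" looks like in practice. The helper below builds request parameters for OpenAI's Chat Completions API; the model name is a placeholder, and OpenAI documents `seed` as best-effort, so you compare `system_fingerprint` across responses to detect backend changes rather than assume bit-identical output:

```python
def deterministic_request(prompt: str) -> dict:
    """Build Chat Completions kwargs aiming for repeatable output.

    Note: `seed` is best-effort determinism, not a guarantee, and the
    model name here is a placeholder assumption, not a recommendation.
    """
    return {
        "model": "gpt-4o-mini",  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,  # greedy decoding
        "seed": 42,        # pins sampling; supported by OpenAI, not Anthropic
    }

# Usage (requires the openai package and an API key):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(**deterministic_request("Extract the date."))
# print(resp.system_fingerprint)  # compare across calls: a changed
#                                 # fingerprint explains changed output
```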
Depending on why you're here
- Logits are scaled: logit_i / T before softmax
- True determinism needs a seed; temperature 0 alone isn't enough
- Reasoning quality peaks at 0.0-0.3
- 0 for structured extraction and factual tasks
- 0.7 for chat: natural variation without chaos
- Don't combine aggressive temperature with aggressive top-p
- Temperature is an API primitive, universal across vendors
- Reasoning model temperature defaults drive reliability perception
- Most API calls use defaults; the pricing parameter matters more to most users
- How creative vs. robotic the AI sounds
- Low = same answer every time, high = wild answers
- Most apps use a low-to-medium setting
Temperature is over-tuned in most production code. Use 0 for structured, 0.7 for chat, and move on.