
Tokens

The fundamental unit that LLMs read and generate · 1 token ≈ 0.75 English words or 4 characters.


Level 1

Every LLM breaks text into tokens before processing. A token is usually a subword fragment, not a whole word · "unbelievable" might be ["un", "believ", "able"]. Pricing, context windows, and speed metrics are all measured in tokens. 1M tokens ≈ 750K English words ≈ seven or eight typical novels.
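The words-to-tokens rules of thumb above translate directly into a back-of-envelope estimator. This is a minimal sketch (the function names are mine, and the ~4-characters-per-token ratio is the approximation from the text, not a real tokenizer count):

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4-characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, dollars_per_million_tokens: float) -> float:
    """Approximate API spend for `text` at a given $/1M-token price."""
    return estimate_tokens(text) / 1_000_000 * dollars_per_million_tokens

print(estimate_tokens("unbelievable"))  # 12 characters → ~3 tokens
```

For real counts, use the provider's own tokenizer; this heuristic is only for quick budgeting.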

Level 2

Tokenization is language-specific. English averages 1.3 tokens per word. Code averages 2.5. Chinese and other non-Latin scripts often need 2-4× more tokens for equivalent content than English, which is why non-English API calls cost more. Tokenizer algorithms vary: GPT models use BPE (Byte Pair Encoding), Llama 2 used SentencePiece, and Claude uses a Claude-specific tokenizer. Vocabulary sizes range from 32K (Llama 2) to 200K+ (GPT-4o, Claude 3+). Larger vocab means fewer tokens per input but larger embedding tables.
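The per-word ratios above are enough for capacity planning. A minimal sketch, assuming the 1.3 and 2.5 tokens-per-word figures from the text (real ratios vary by tokenizer, so treat these as planning assumptions, not measurements):

```python
# Tokens-per-word ratios quoted in the text; hypothetical planning constants.
TOKENS_PER_WORD = {"english": 1.3, "code": 2.5}

def budget_tokens(word_count: int, content: str = "english") -> int:
    """Estimate the token budget for a given word count and content type."""
    return round(word_count * TOKENS_PER_WORD[content])

# The same 1,000 words of code costs nearly twice the tokens of English prose:
print(budget_tokens(1_000))          # → 1300
print(budget_tokens(1_000, "code"))  # → 2500
```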

Level 3

BPE builds its vocabulary by repeatedly merging the most frequent byte pairs until the vocabulary budget is reached, then applies those merges greedily when encoding. SentencePiece operates on raw text without assuming whitespace word boundaries, which makes it language-agnostic. Tiktoken (OpenAI) exposes a fast Rust-backed encoder. Token count = cost: a 10M-output-token/month workload on GPT-5 at $10/M output tokens is $100/month in output billing. Multilingual efficiency is a known challenge: Japanese can cost ~4× more tokens than equivalent English on most providers.
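The merge loop at the heart of BPE fits in a few lines of Python. This toy trainer works on characters rather than raw bytes and breaks frequency ties by first occurrence · both simplifications relative to production tokenizers like tiktoken:

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn merges by repeatedly fusing the most frequent adjacent pair."""
    words = [list(w) for w in corpus.split()]  # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))        # count adjacent symbol pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent pair wins
        merges.append(best)
        words = [apply_merge(w, best) for w in words]
    return merges

def apply_merge(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(pair[0] + pair[1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges in training order to tokenize a new word."""
    tokens = list(word)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens

merges = train_bpe("low low lower lower newest newest newest widest", 10)
print(encode("lowest", merges))  # → ['lowe', 'st']
```

Note how "lowest", a word never seen in training, still splits into learned subwords · exactly why subword tokenizers handle rare and novel words gracefully.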

The takeaway for you
If you are a Researcher
  • BPE and SentencePiece are the common algorithms; tiktoken is the common fast implementation
  • Vocabulary size trades off embedding table size vs tokens per input
  • Non-Latin scripts suffer 2-4× worse tokenization ratios
If you are a Builder
  • Count tokens before sending · use tiktoken or the provider SDK
  • Budget for 1.3 tokens per English word, 2.5 per code word
  • Non-English workloads are more expensive per character
If you are an Investor
  • Token pricing is the core unit economics · every benchmark comparison normalizes to $/M tokens
  • Tokenizer efficiency is a hidden lever in multilingual cost
  • Model providers occasionally revise tokenizers · watch for pricing shifts
If you are a Curious Normie
  • AI reads and writes in tokens, not words
  • 1 token is about 3/4 of a word
  • Longer messages mean more tokens, which is why the bill jumps
Gecko's take

Tokens are the units of AI billing. Understand them or overpay.

Most providers ship an SDK helper. OpenAI's tiktoken is the fastest open implementation. Anthropic offers count_tokens via the API.