
Tokens

The fundamental unit that LLMs read and generate · 1 token ≈ 0.75 English words or 4 characters.


Level 1

Every LLM breaks text into tokens before processing. A token is usually a subword fragment, not a whole word · "unbelievable" might be ["un", "believ", "able"]. Pricing, context windows, and speed metrics are all measured in tokens. 1M tokens ≈ 750K English words ≈ seven or eight typical novels.
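The words-to-tokens rules of thumb above translate directly into a back-of-envelope estimator. This is a minimal sketch (the function names are mine, and the ~4-characters-per-token ratio is the approximation from the text, not a real tokenizer count):

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4-characters-per-token rule of thumb."""
    return max(1, round(len(text) / 4))

def estimate_cost(text: str, dollars_per_million_tokens: float) -> float:
    """Approximate API spend for `text` at a given $/1M-token price."""
    return estimate_tokens(text) / 1_000_000 * dollars_per_million_tokens

print(estimate_tokens("unbelievable"))  # 12 characters → ~3 tokens
```

For real counts, use the provider's own tokenizer; this heuristic is only for quick budgeting.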

Level 2

Tokenization is language-specific. English averages 1.3 tokens per word. Code averages 2.5. Chinese and other non-Latin scripts often need 2-4× more tokens for equivalent content than English, which is why non-English API calls cost more. Tokenizer algorithms vary: GPT models use BPE (Byte Pair Encoding), Llama 2 used SentencePiece, and Claude uses a Claude-specific tokenizer. Vocabulary sizes range from 32K (Llama 2) to 200K+ (GPT-4o, Claude 3+). Larger vocab means fewer tokens per input but larger embedding tables.
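The per-word ratios above are enough for capacity planning. A minimal sketch, assuming the 1.3 and 2.5 tokens-per-word figures from the text (real ratios vary by tokenizer, so treat these as planning assumptions, not measurements):

```python
# Tokens-per-word ratios quoted in the text; hypothetical planning constants.
TOKENS_PER_WORD = {"english": 1.3, "code": 2.5}

def budget_tokens(word_count: int, content: str = "english") -> int:
    """Estimate the token budget for a given word count and content type."""
    return round(word_count * TOKENS_PER_WORD[content])

# The same 1,000 words of code costs nearly twice the tokens of English prose:
print(budget_tokens(1_000))          # → 1300
print(budget_tokens(1_000, "code"))  # → 2500
```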

Level 3

BPE builds its vocabulary by repeatedly merging the most frequent byte pairs until the vocabulary budget is reached, then applies those merges greedily when encoding. SentencePiece operates on raw text without assuming whitespace word boundaries, which makes it language-agnostic. Tiktoken (OpenAI) exposes a fast Rust-backed encoder. Token count = cost: a 10M-output-token/month workload on GPT-5 at $10/M output tokens is $100/month in output billing. Multilingual efficiency is a known challenge: Japanese can cost ~4× more tokens than equivalent English on most providers.
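The merge loop at the heart of BPE fits in a few lines of Python. This toy trainer works on characters rather than raw bytes and breaks frequency ties by first occurrence · both simplifications relative to production tokenizers like tiktoken:

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int) -> list[tuple[str, str]]:
    """Learn merges by repeatedly fusing the most frequent adjacent pair."""
    words = [list(w) for w in corpus.split()]  # start from single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for w in words:
            pairs.update(zip(w, w[1:]))        # count adjacent symbol pairs
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent pair wins
        merges.append(best)
        words = [apply_merge(w, best) for w in words]
    return merges

def apply_merge(tokens: list[str], pair: tuple[str, str]) -> list[str]:
    """Replace every occurrence of `pair` with its concatenation."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(pair[0] + pair[1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def encode(word: str, merges: list[tuple[str, str]]) -> list[str]:
    """Apply learned merges in training order to tokenize a new word."""
    tokens = list(word)
    for pair in merges:
        tokens = apply_merge(tokens, pair)
    return tokens

merges = train_bpe("low low lower lower newest newest newest widest", 10)
print(encode("lowest", merges))  # → ['lowe', 'st']
```

Note how "lowest", a word never seen in training, still splits into learned subwords · exactly why subword tokenizers handle rare and novel words gracefully.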

The takeaway for you
If you are a Researcher
  • BPE and SentencePiece are the common algorithms; tiktoken is the common fast implementation
  • Vocabulary size trades off embedding table size vs tokens per input
  • Non-Latin scripts suffer 2-4× worse tokenization ratios
If you are a Builder
  • Count tokens before sending · use tiktoken or the provider SDK
  • Budget for 1.3 tokens per English word, 2.5 per code word
  • Non-English workloads are more expensive per character
If you are an Investor
  • Token pricing is the core unit economics · every benchmark comparison normalizes to $/M tokens
  • Tokenizer efficiency is a hidden lever in multilingual cost
  • Model providers occasionally revise tokenizers · watch for pricing shifts
If you are a Curious Normie
  • AI reads and writes in tokens, not words
  • 1 token is about 3/4 of a word
  • Longer messages mean more tokens, which is why the bill jumps
Gecko's take

Tokens are the units of AI billing. Understand them or overpay.

Most providers ship an SDK helper. OpenAI's tiktoken is the fastest open implementation. Anthropic offers count_tokens via the API.