
Function Call Billing

When a model emits a function call, the JSON tool-invocation counts as output tokens · cheap tools still carry token cost, and chained tool calls compound fast.


Level 1

Providers bill the JSON representation of function calls as output tokens. A simple tool call like `{"name":"get_weather","args":{"city":"Paris"}}` is ~15 tokens. Multi-tool chains (agent loop: plan → tool call → result → plan again) can easily accumulate 5-20 tool calls, each priced as output. Tool definitions themselves count as input tokens · registering 50 tools on every call bloats the input bill.
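To get a feel for why even a "cheap" tool call costs tokens, here is a minimal sketch that estimates the token footprint of a tool-call JSON using the rough ~4-characters-per-token heuristic (the heuristic and the helper name are illustrative assumptions, not a provider's tokenizer):

```python
import json

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English/JSON.

    Real tokenizers differ; this is only for back-of-envelope math.
    """
    return max(1, len(text) // 4)

call = {"name": "get_weather", "args": {"city": "Paris"}}
call_json = json.dumps(call, separators=(",", ":"))  # compact framing

print(estimate_tokens(call_json))  # ~11 by this heuristic; real tokenizers land nearby
```

Multiply that by 5-20 calls per agent loop and the "free" tool invocations start showing up on the bill.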

Level 2

In 2025 Anthropic shipped token-efficient tool use, which strips redundant JSON framing and cuts tool-call output cost 15-20% versus naive framing. OpenAI supports parallel tool calls, where one assistant turn emits multiple tool invocations in a single response · this reduces per-call overhead. MCP servers add tool definitions via the client's session context · those definitions hit input tokens once per new session. Caching tool definitions (Anthropic cache_control) brings their input cost down to near-zero on subsequent calls.
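The caching pattern can be sketched as a request body with `cache_control` on the last tool definition, which marks the whole tool-definition prefix as cacheable. This is a sketch assuming the field names from Anthropic's prompt-caching API (the model id and tool names are hypothetical · verify field names against current docs):

```python
# Build a large tool registry · the expensive part of every request's input.
tools = [
    {"name": f"tool_{i}", "description": "...", "input_schema": {"type": "object"}}
    for i in range(50)
]

# cache_control on the LAST tool caches everything up to and including it,
# so later calls in the session read the definitions from cache instead of
# re-billing them as fresh input tokens.
tools[-1]["cache_control"] = {"type": "ephemeral"}

request_body = {
    "model": "claude-sonnet-4",  # hypothetical model id
    "max_tokens": 1024,
    "tools": tools,
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
}
```

The design point: the cache breakpoint goes on the last stable element of the prompt prefix, so everything before it is served from cache on repeat calls.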

Level 3

Agent pricing math: a typical loop burns 5-20 tool calls, each roughly 50-200 tokens of output. A 10-step agent run (~1,000 output tokens) at $3/M output = ~$0.003. Looks cheap, but scales with traffic. The real cost is cumulative context: each tool result feeds into the next turn as input, so per-step input grows linearly with steps · and cumulative input cost grows quadratically across the run. Best practice: summarize old tool results, drop unneeded context, and cache long-lived tool definitions.
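The cumulative-context effect can be modeled in a few lines. All the numbers below (token sizes, prices) are illustrative assumptions, not any provider's actual pricing:

```python
def agent_run_cost(steps, result_tokens=200, call_tokens=100,
                   base_context=1000, in_price=3.0, out_price=15.0):
    """Back-of-envelope agent-loop cost (prices in $ per million tokens).

    Each step re-sends the whole conversation so far as input, then emits
    one tool call as output; the call and its result join the context
    that the next step must re-send.
    """
    input_tokens = output_tokens = 0
    context = base_context
    for _ in range(steps):
        input_tokens += context                  # re-send accumulated context
        output_tokens += call_tokens             # tool-call JSON, billed as output
        context += call_tokens + result_tokens   # result feeds the next turn
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

print(f"${agent_run_cost(10):.4f}")  # input cost dominates despite pricier output
```

Note how the input side dwarfs the nominal output cost of the tool calls themselves · that is the "invisible" part of the bill.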

The takeaway for you
If you are a
Researcher
  • Tool calls emit JSON as output tokens
  • Tool definitions count as input tokens (per session)
  • Anthropic + OpenAI optimize framing for cost
If you are a
Builder
  • Cache tool definitions with cache_control
  • Use parallel tool calls where supported
  • Summarize old tool results · context grows fast in agent loops
If you are a
Investor
  • Agent adoption drives token consumption 10-50× vs chat
  • Tool-heavy workloads are among providers' most profitable
  • MCP's success compounds this · more tools = more calls
If you are a
Curious · Normie
  • When AI uses tools, each tool call has a cost
  • Agents that use many tools cost more to run
  • This cost is hidden in your app's AI bill
Gecko's take

Function call billing is invisible until your agent traffic scales. Cache tools, parallelize calls, summarize context · or pay 10× what you need to.

Note · tool results are not billed as output; they are passed back to the model as input tokens in the next turn. Long results = expensive context.
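Since long tool results re-enter the context as input tokens, compacting them before the next turn is the cheapest lever. A minimal sketch (the function name and limit are illustrative; real pipelines often use an LLM summarization pass or structured extraction instead of naive truncation):

```python
def compact_tool_result(result: str, limit: int = 500) -> str:
    """Truncate long tool output before it re-enters the context window.

    A stand-in for real summarization; even naive truncation caps the
    per-step input-token growth in an agent loop.
    """
    if len(result) <= limit:
        return result
    return result[:limit] + f"\n[truncated {len(result) - limit} chars]"
```

Applied at every loop step, this turns unbounded context growth into a bounded per-step increment.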