
Function Call Billing

When a model emits a function call, the JSON tool-invocation counts as output tokens · cheap tools still carry token cost, and chained tool calls compound fast.


Level 1

Providers bill the JSON representation of function calls as output tokens. A simple tool call like `{"name":"get_weather","args":{"city":"Paris"}}` is ~15 tokens. Multi-tool chains (agent loop: plan → tool call → result → plan again) can easily accumulate 5-20 tool calls, each priced as output. Tool definitions themselves count as input tokens · registering 50 tools on every call bloats the input bill.
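To get a feel for why even a "cheap" tool call costs tokens, here is a minimal sketch that estimates the token footprint of a tool-call JSON using the rough ~4-characters-per-token heuristic (the heuristic and the helper name are illustrative assumptions, not a provider's tokenizer):

```python
import json

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English/JSON.

    Real tokenizers differ; this is only for back-of-envelope math.
    """
    return max(1, len(text) // 4)

call = {"name": "get_weather", "args": {"city": "Paris"}}
call_json = json.dumps(call, separators=(",", ":"))  # compact framing

print(estimate_tokens(call_json))  # ~11 by this heuristic; real tokenizers land nearby
```

Multiply that by 5-20 calls per agent loop and the "free" tool invocations start showing up on the bill.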

Level 2

In 2025 Anthropic shipped token-efficient tool use, which strips redundant JSON framing and cuts tool-call output cost 15-20% versus naive framing. OpenAI supports parallel tool calls, where one assistant turn emits multiple tool invocations in a single response · this reduces per-call overhead. MCP servers add tool definitions via the client's session context · those definitions hit input tokens once per new session. Caching tool definitions (Anthropic cache_control) brings their input cost down to near-zero on subsequent calls.
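The caching pattern can be sketched as a request body with `cache_control` on the last tool definition, which marks the whole tool-definition prefix as cacheable. This is a sketch assuming the field names from Anthropic's prompt-caching API (the model id and tool names are hypothetical · verify field names against current docs):

```python
# Build a large tool registry · the expensive part of every request's input.
tools = [
    {"name": f"tool_{i}", "description": "...", "input_schema": {"type": "object"}}
    for i in range(50)
]

# cache_control on the LAST tool caches everything up to and including it,
# so later calls in the session read the definitions from cache instead of
# re-billing them as fresh input tokens.
tools[-1]["cache_control"] = {"type": "ephemeral"}

request_body = {
    "model": "claude-sonnet-4",  # hypothetical model id
    "max_tokens": 1024,
    "tools": tools,
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
}
```

The design point: the cache breakpoint goes on the last stable element of the prompt prefix, so everything before it is served from cache on repeat calls.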

Level 3

Agent pricing math: a typical loop burns 5-20 tool calls, each roughly 50-200 tokens of output. A 10-step agent run (~1,000 output tokens) at $3/M output = ~$0.003. Looks cheap, but scales with traffic. The real cost is cumulative context: each tool result feeds into the next turn as input, so per-step input grows linearly with steps · and cumulative input cost grows quadratically across the run. Best practice: summarize old tool results, drop unneeded context, and cache long-lived tool definitions.
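The cumulative-context effect can be modeled in a few lines. All the numbers below (token sizes, prices) are illustrative assumptions, not any provider's actual pricing:

```python
def agent_run_cost(steps, result_tokens=200, call_tokens=100,
                   base_context=1000, in_price=3.0, out_price=15.0):
    """Back-of-envelope agent-loop cost (prices in $ per million tokens).

    Each step re-sends the whole conversation so far as input, then emits
    one tool call as output; the call and its result join the context
    that the next step must re-send.
    """
    input_tokens = output_tokens = 0
    context = base_context
    for _ in range(steps):
        input_tokens += context                  # re-send accumulated context
        output_tokens += call_tokens             # tool-call JSON, billed as output
        context += call_tokens + result_tokens   # result feeds the next turn
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

print(f"${agent_run_cost(10):.4f}")  # input cost dominates despite pricier output
```

Note how the input side dwarfs the nominal output cost of the tool calls themselves · that is the "invisible" part of the bill.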

The takeaway for you
If you are a
Researcher
  • Tool calls emit JSON as output tokens
  • Tool definitions count as input tokens (per session)
  • Anthropic + OpenAI optimize framing for cost
If you are a
Builder
  • Cache tool definitions with cache_control
  • Use parallel tool calls where supported
  • Summarize old tool results · context grows fast in agent loops
If you are a
Investor
  • Agent adoption drives token consumption 10-50× vs chat
  • Tool-heavy workloads are among providers' most profitable
  • MCP's success compounds this · more tools = more calls
If you are a
Curious · Normie
  • When AI uses tools, each tool call has a cost
  • Agents that use many tools cost more to run
  • This cost is hidden in your app's AI bill
Gecko's take

Function call billing is invisible until your agent traffic scales. Cache tools, parallelize calls, summarize context · or pay 10× what you need to.

Note · tool results are not billed as output; they are passed back to the model as input tokens in the next turn. Long results = expensive context.
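Since long tool results re-enter the context as input tokens, compacting them before the next turn is the cheapest lever. A minimal sketch (the function name and limit are illustrative; real pipelines often use an LLM summarization pass or structured extraction instead of naive truncation):

```python
def compact_tool_result(result: str, limit: int = 500) -> str:
    """Truncate long tool output before it re-enters the context window.

    A stand-in for real summarization; even naive truncation caps the
    per-step input-token growth in an agent loop.
    """
    if len(result) <= limit:
        return result
    return result[:limit] + f"\n[truncated {len(result) - limit} chars]"
```

Applied at every loop step, this turns unbounded context growth into a bounded per-step increment.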