Does every model support tool use?

Every frontier 2026 model yes. Older and smaller models often have weaker tool-calling reliability.

What is the best tool-use benchmark?

Berkeley Function Calling Leaderboard (BFCL) is the most cited. APEX-Agents and SWE-bench (agent variant) for agentic workloads.

ConceptsReading · ~3 min · 83 words deep

Tool Use

When the model calls external functions (search, calculator, DB, API) during a response · the building block for agents.

TL;DR

When the model calls external functions (search, calculator, DB, API) during a response · the building block for agents.

Level 1

Basic

Tool use lets the AI ask the outside world for help. The model generates a structured tool call (e.g., "search(\"AI news 2026\")"), a harness executes the call, and the result comes back into the conversation. Every modern frontier API supports tool use via function calling · OpenAI, Anthropic, Google, Mistral. MCP extends this with a standardized protocol for tool providers.

Level 2

Deep

Flow: model generates a JSON tool call matching a schema → harness validates + executes the call → result is appended to the conversation → model continues. Parallel tool calls (invoke multiple at once) reduce latency. Tool descriptions matter enormously · a well-named and well-described tool is called reliably; a poorly-described one is ignored. Common tools: web search, calculator, database query, code execution, file operations, API calls to external services, MCP-exposed capabilities. Safety: destructive tools (file delete, send email) should require user approval.

Level 3

Expert

Function calling APIs standardized the JSON schema for tools. Parallel tool use reduces round-trips · a frontier model can emit 3-10 tool calls in a single turn. Tool-result interleaving: results are fed back as special message turns. Few-shot examples of good tool use in the system prompt improve calling accuracy 10-30%. Agent frameworks (LangGraph, CrewAI, OpenAI Agent SDK) wrap tool use with orchestration, memory, and error handling. Evaluation: toolbench (Tencent), APEX-Agents (Epoch), BFCL (Berkeley function calling leaderboard).

The takeaway for you

Depending on why you're here

If you are a

Researcher

·JSON schema for tool definitions
·Parallel tool calls reduce agent round-trips 2-5×
·BFCL and APEX-Agents are the standard benchmarks

If you are a

Builder

·Write clear tool descriptions · biggest lever for reliability
·Support parallel calls · agents get faster dramatically
·Gate destructive tools behind user approval · safety critical

If you are a

Investor

·Tool use drives the agent revenue wave · chat plateaus, agents grow
·MCP winning the open-protocol layer
·Framework market consolidating (LangGraph, OpenAI Agent SDK)

If you are a

Curious · Normie

·AI that can use tools · search the web, run code, send email
·The upgrade from "chat" to "assistant"
·Every agent you use internally uses tool calling

Don't mix them up

Often confused with

Tool UsevsFunction calling

Function calling is the API mechanism. Tool use is the conceptual capability. MCP is the open protocol standardizing tool definitions across vendors.

Tool UsevsAgent

Tool use is a single call. An agent uses tools in a multi-step loop with planning and memory.

Gecko's take

Tool use turned LLMs from autocomplete into actors. Everything in 2026 agent land is built on this one capability.

Frequently Asked Questions

Tool use is the capability · one call. An agent chains many tool calls with planning and memory to accomplish a multi-step goal.

Tool Use

Basic

Deep

Expert

Depending on why you're here

Often confused with

Frequently Asked Questions

Related terms

Glossary

Explore live data

Cite or embed