Function calling and tool calling describe the same core behavior: a language model emits a structured request for an external capability, and the runtime executes it. The terms are often used interchangeably, but tool calling is the broader concept. Function calling is a specific form of it, rooted in the analogy of calling a named function in code.
If you are building or using AI agents and these terms are blurring together in documentation, this post untangles them clearly: what each term means, where they came from, how providers use them, and when the distinction actually affects what you build.
The Short Answer
Function calling: the model outputs a structured call to a developer-defined function, including the function name and a matching argument object. The application receives that output, validates it, runs the function, and returns the result.
Tool calling: a superset of function calling. A tool can be a function, but it can also be a web search capability, a code interpreter, a document retriever, an image generator, or a protocol-level server. The model invokes tools the same way it invokes functions, but the things being invoked are not always functions in the traditional sense.
In short: all function calls are tool calls. Not all tool calls are function calls.
Function Calling, Defined
Function calling emerged as a way to let language models request precise, deterministic operations that a model cannot perform on its own: looking up current data, performing arithmetic, writing to a database, or calling a REST endpoint. The model does not execute these operations itself. It decides a function should be called, emits the call in structured form, and waits for the runtime to return a result.
What the model actually emits
When a model decides to call a function, it outputs a structured object containing the function name and the argument values. The format varies by provider, but the structure is consistent: a name and an arguments payload that matches the schema the developer registered. The application receives this object, validates it, calls the real function, and feeds the return value back to the model's context so it can continue reasoning.
What the developer registers
For a model to call a function, the developer supplies a description of that function: its name, what it does, and what parameters it takes. This description is passed to the model at inference time, usually in the system prompt or a dedicated tools parameter. The model uses this description to decide whether and how to invoke the function. A well-written description matters; vague or incomplete descriptions cause the model to invoke functions incorrectly or not at all.
The synchronous assumption
Traditional function calling assumes a synchronous loop: model calls function, runtime executes it, runtime returns result, model continues. This works for fast operations like a database lookup or a calculation. It becomes a constraint when the "function" is a slow external operation, a streaming API, or a long-running job. That constraint is part of why the vocabulary expanded toward the broader term "tool."
Tool Calling, Defined
Tool calling broadens the concept to include any capability a model can invoke at runtime, not just developer-defined functions. The model's mechanism is the same: it outputs a structured invocation. What gets invoked is no longer limited to code functions.
What counts as a tool
A tool, in this framing, is any registered capability a model can request. That includes:
- A named code function in the application (the original function calling case)
- A web search query that returns live results
- A code interpreter that executes snippets in a sandbox
- A document retriever that queries a vector store
- An image generator that produces an image from a prompt
- A protocol server that exposes multiple sub-capabilities under one connection
Each of these is invoked by the model using the same structured-output mechanism, but none of the last four fits neatly into the "call a function, receive a return value" model. Web search returns a list of snippets; a code interpreter may produce output, errors, or side effects; an image generator returns a binary artifact. The tool abstraction accommodates all of these. For a broader overview of how agents use tools to extend what they can do, see AI agent tool use explained.
Tools as an abstraction layer
Thinking of everything as a tool rather than a function forces a useful abstraction. It shifts focus from "what code does this call" to "what capability does the model need." That framing is more useful when designing multi-step agents that combine heterogeneous capabilities, some synchronous, some asynchronous, some local, some remote.
History of the Terms
The terminology did not arrive from a standards body. It evolved independently at different labs, and the inconsistency in the landscape today reflects that origin.
How "function calling" became common
The function calling label gained wide adoption when several major model providers launched the capability under that name. It was a natural analogy: developers already understood functions, and describing the model's behavior as "calling a function" made the concept immediately legible to software engineers. The name stuck, and a large amount of tooling, documentation, and developer muscle memory formed around it.
How "tool calling" and "tool use" emerged
As providers built capabilities that did not fit the function analogy cleanly, including built-in web search, code execution, and multi-modal operations, the vocabulary shifted. Some providers adopted "tool use" or "tool calling" as a more accurate umbrella term, explicitly categorizing functions as one type of tool rather than treating them as synonymous. Other providers kept "function calling" for consistency even after expanding the range of capabilities, which is a major reason the terms remain confused today.
The current state: overlapping vocabulary
Today, you will find different providers using the same underlying mechanism under different names. Reading the documentation for a specific provider or SDK requires attention: "function calling" in one context may mean a narrow code-function invocation, while in another context it may describe the full tool-use mechanism including built-in capabilities. Neither usage is wrong; they just reflect where each provider started and how they evolved the feature.
How the Mechanism Works
Regardless of what the capability is called, the runtime loop is consistent across providers.
The basic loop
The developer registers one or more tools or functions with their schemas. At inference time, these schemas are passed to the model as part of the input. The model reasons about the user's request, decides a tool should be invoked, and outputs a structured invocation object instead of (or before) producing a text response. The application receives the invocation, executes it, and returns the result as a new message in the conversation. The model receives the result and continues: either generating a final response or deciding to invoke another tool.
Parallel and sequential calls
Many model providers support requesting multiple tool calls in a single model turn. A model might decide it needs to look up three different pieces of information before it can answer a question and emit all three invocations at once. The runtime executes them in parallel, collects the results, and returns them together. This matters for agent latency: an agent that makes three sequential calls takes three times as long as one that batches them.
Structured outputs and schemas
The quality of a tool invocation depends on how well the model can match its output to the registered schema. JSON Schema is the most common format for describing tool parameters. A schema defines the argument names, types, descriptions, required fields, and constraints. The model reads this at inference time and must produce arguments that conform to it. This is why tool schema design is a meaningful engineering concern, not just documentation. Poorly structured schemas produce poorly structured invocations. For more on how agents chain multiple tool calls into multi-step workflows, see how to build a multi-step agent workflow.
Where the Terms Diverge
The practical differences between function calling and tool calling emerge at the edges: what you can invoke, what the return value looks like, and how complex the capability can get.
Return value shape
A function call classically returns a typed value: a string, a number, a JSON object. The model receives it and treats it like any other input. A tool call can return something richer or more ambiguous: a list of web snippets with varying relevance, an image, a code execution output with stdout and stderr, or a partial stream. The application layer has to decide how to format that return for the model to consume, and that formatting decision affects downstream reasoning quality.
Side effects and statefulness
Functions in the function-calling model are usually expected to behave like pure-ish operations: inputs in, outputs out. Tools can carry state and side effects. A tool that sends an email or commits to a database does not just return a value; it changes the world. The tool calling framing is more honest about this. Understanding the stateful side of tool-equipped agents connects directly to how agents handle planning versus execution: the plan is stateless; the tool executions are not.
Composability
When tools can themselves invoke other tools, or when one tool call triggers a chain of further calls, the agent is doing something that bears little resemblance to a single function invocation. Multi-agent systems, where one agent spawns or delegates to another, are a natural extension of tool calling. The agent treating another agent as a tool is a common architectural pattern. That pattern does not fit comfortably inside the "function calling" vocabulary. For how this plays out at the system level, see multi-agent systems explained.
Tool Schemas and the Model Context Protocol
One development that has sharpened the tool calling vocabulary is the emergence of protocol-level standards for exposing tools to models at runtime.
What the Model Context Protocol does
The Model Context Protocol (MCP) is an open standard that defines how a host application discovers and connects to tool servers at runtime. Instead of tools being hard-coded into an application, an MCP server exposes a list of tools with their schemas over a standard connection. The model client queries the server, learns what tools are available, and can invoke any of them using the same structured invocation mechanism.
MCP is a discovery layer, not an invocation mechanism
It is important not to conflate MCP with tool calling itself. MCP handles how tools are registered and discovered. Tool calling is how the model invokes them once they are known. You can do tool calling without MCP (by defining tools directly in the application). MCP makes it practical to expose large, dynamic sets of tools without touching application code for each one. The two concepts work together but solve different problems.
Why this matters for agent builders
If you are building on a platform that supports MCP servers, the set of tools available to your agent can grow without rebuilding the agent. New capabilities registered on the MCP server become available at next inference time. This composability is one of the design principles behind platforms like Gravity, where expert-built agents can call the right tools for a task without the user needing to configure each one. On Gravity, the agent decides which tools to invoke; the user just describes what outcome they need.
Practical Implications for Agent Builders
The terminology question becomes concrete when you sit down to design an agent. Here is where the distinction between function calling and tool calling shapes real decisions.
Choosing what to expose as a tool versus inline logic
Not everything should be a tool. Capabilities that are fast, deterministic, and never need external data are often better handled inline. Capabilities that require external I/O, may fail independently, or produce variable output benefit from being exposed as discrete tools. The tool boundary is also a failure boundary: if a tool call fails, the agent can retry or route around it without failing the entire run. Good tool decomposition improves both reliability and debuggability.
Schema quality as a first-class concern
A model can only call a tool correctly if the schema describes it accurately. Name and description matter as much as type definitions. A tool named get_data with a vague description will be invoked less reliably than one named get_customer_order_history with a precise description of what it returns and when to use it. Treat schema design as user research: the model is the user, and the schema is the documentation it reads in real time.
Observability across tool calls
In multi-step agents, a single user request may trigger a dozen tool calls across several model turns. Knowing which tools were called, with what arguments, and what they returned is essential for debugging failures. Designing for observability from the start, logging every invocation and its result, makes the difference between an agent that is maintainable and one that fails opaquely. This connects to the broader topic of context window management, since tool results accumulate in context and eventually hit limits.
Provider differences in practice
If you are reading documentation across multiple providers and seeing "function calling" in one place and "tool calling" in another, assume the mechanism is the same and look at the actual API shape: how tools are registered, how invocations are formatted, and how results are returned. The surface differences are often shallow. Where they are deep, they usually concern parallel calling, streaming, or built-in tools, not the basic structured-invocation loop.
Common Misconceptions
Misconception: "tool calling" is a newer, better version of "function calling"
Not quite. Tool calling is a broader framing, not a replacement. Many providers use the terms in parallel. You do not need to migrate from one to the other; you need to understand which concept applies to what you are building.
Misconception: the model actually runs the function
The model does not execute anything. It produces a structured output that says "invoke this tool with these arguments." The application handles execution. This matters for security: a model that emits a call to a destructive operation can only cause harm if the application follows through. Access controls live in the application layer, not the model. This is the blast-radius question for any agent that can modify data or send messages on behalf of a user.
Misconception: function calling requires the model to know the implementation
The model never sees the function's code. It only sees the schema: the name, the description, and the parameter definitions. The implementation is entirely hidden from the model. This means you can change an implementation without retraining or re-prompting, as long as the schema stays consistent. It also means a well-described schema can make a complex implementation look simple to the model.
Misconception: tool calling is only for external APIs
Tools can wrap anything the application can execute: a local database query, a Python function in the same process, a file read, or an HTTP call to a remote service. The external/internal distinction does not matter to the mechanism. What matters is that the capability is registered with a schema and returns a result the model can use. Understanding what an AI agent is helps clarify why tools are central: agents need ways to act on the world, and tools are how that happens.
Frequently Asked Questions
Are tool calling and function calling the same thing?
They overlap significantly but are not identical. Function calling refers specifically to a model emitting a structured request to invoke a developer-defined function, with arguments the runtime executes. Tool calling is a broader term that covers functions plus any other capability a model can invoke: web search, code interpreters, external APIs, document retrievers, and protocol-level servers. Every function call is a tool call, but not every tool call is a function in the traditional sense.
Why do different AI providers use different terminology?
The terms evolved independently at different labs. Some providers launched their feature under the name function calling, reflecting the analogy to calling a function in code. Others adopted tool calling or tool use to signal a broader scope that includes things like web search or code execution that are not conventional functions. The underlying mechanism, where the model outputs structured data and the runtime acts on it, is consistent across providers even when the label differs.
What is a tool schema in the context of AI agents?
A tool schema is a structured description, usually in JSON Schema format, that tells the model what a tool does, what arguments it accepts, and which arguments are required. The model reads the schema at inference time and decides whether to invoke the tool, then emits the tool name and a matching argument object. The calling application validates and executes the call, then returns the result to the model for further reasoning.
What is the Model Context Protocol (MCP) and how does it relate to tool calling?
The Model Context Protocol is an open standard that defines how a host application exposes tools, prompts, and resources to a model at runtime. MCP is a transport and discovery layer: it lets a model learn what tools are available from a running server without the tools being hard-coded into the application. Tool calling is still how the model invokes those tools; MCP is how the tools are registered and exposed in the first place.
When does the distinction between tool calling and function calling matter in practice?
It matters when you are designing an agent that needs to invoke capabilities that do not fit the traditional function model: web search with raw results, a code sandbox, a document retriever, or a streaming API. In those cases, the term tool calling is the more accurate framing because the invoked capability may not return a simple value. It also matters when reading provider documentation, since providers use the terms inconsistently and misreading them can cause integration errors.