Procurement conversations about AI agents fail when buyer and vendor use the same words to mean different things. This glossary defines 29 terms that show up in agent procurement, organised by category. Each entry includes a one-sentence definition, the term it is most often confused with, and (where useful) a link to a longer explainer in the cluster. The glossary is designed for buyers; the definitions are deliberately operational, not academic.
The vocabulary draws from primary sources where they exist: Anthropic's engineering blog on agentic systems (retrieved 2026-05-07), the GAIA benchmark paper (arXiv:2311.12983), the original ReAct paper (Yao et al., 2022), and the NIST AI Risk Management Framework. Where the field disagrees, the entry uses the operational definition rather than the contested one.
Core terms
Agent
A software system that perceives state, decides on actions, calls tools, and works toward a goal across multiple steps without per-step human approval. Often confused with assistant. See agent vs chatbot vs assistant.
Autonomy
The degree to which a system selects its own next step. Measured on five axes (decision-making, tool use, planning, error recovery, time horizon). Often confused with automation. See autonomous vs assistive AI.
Agency
The capacity to act on the world toward goals. A precondition for autonomy. Often confused with autonomy itself; agency is the property, autonomy is the level.
Agentic
Adjective describing systems that exhibit agency. Anthropic defines agentic systems as those where the LLM dynamically directs its own processes and tool usage (Anthropic, 2024). See agentic AI without jargon.
LLM
Large language model. A neural network that produces text or tokens conditioned on input. The brain inside an agent, not the agent itself. Often confused with agent. See agent vs LLM.
Chatbot
A conversational interface to a model or rule set. Single turn or short multi-turn; rarely takes irreversible actions. Often confused with agent. See agent vs chatbot vs assistant.
Copilot
An assistive AI system that suggests, drafts, or completes actions inside a tool the human is using. Human stays in the loop. Often confused with agent. See agent vs copilot.
Capability terms
Tool use
The agent's ability to call external functions or APIs to act. Most often implemented via function calling. Often confused with function calling itself; tool use is the capability, function calling is the mechanism. See tool use explained.
Function calling
The mechanism by which an LLM emits a structured payload describing the tool to call and the arguments. Standardised by OpenAI in 2023 (OpenAI function calling docs). The plumbing under tool use.
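The shape of that structured payload can be sketched in a few lines. This is an illustrative parser, not any vendor's actual schema; the tool name `get_invoice` and its arguments are hypothetical.

```python
import json

def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse the structured payload an LLM emits when it wants a tool run.

    Hypothetical schema: {"name": ..., "arguments": {...}}. Real providers
    differ in field names, but the idea is the same: a machine-readable
    request, not free text.
    """
    payload = json.loads(raw)
    name = payload["name"]
    args = payload.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return name, args

# The runtime validates the payload before executing anything.
raw = '{"name": "get_invoice", "arguments": {"invoice_id": "INV-1042"}}'
name, args = parse_tool_call(raw)
```

The point for buyers: "function calling" is this parsing-and-dispatch plumbing, nothing more. The judgment about which tool to call is the tool-use capability built on top of it.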
Planning
Producing a sequence of steps that, if executed, achieves the goal. Implicit (the LLM plans token-by-token) or explicit (a planner module emits steps). Often confused with reasoning. See orchestration.
Reasoning
Multi-step inference toward a conclusion. Often implemented via chain-of-thought, ReAct, or tree-of-thought patterns. Often confused with pattern matching. See reasoning vs pattern matching.
Memory
State the agent retains across steps or sessions. Three layers: short-term context, long-term vector store, episodic memory. Often confused with context window. See memory explained.
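The three layers can be made concrete with a toy sketch. The dict standing in for a vector store and the eviction policy are illustrative assumptions, not a production design.

```python
class AgentMemory:
    """Toy sketch of the three memory layers. A real long-term store would
    be a vector database; a dict stands in for it here."""

    def __init__(self, context_limit: int = 4):
        self.context_limit = context_limit
        self.short_term: list[str] = []      # rolling window fed into each LLM call
        self.long_term: dict[str, str] = {}  # stand-in for a vector store
        self.episodes: list[str] = []        # summaries of past sessions

    def observe(self, message: str) -> None:
        self.short_term.append(message)
        # Oldest messages fall out of the context window...
        while len(self.short_term) > self.context_limit:
            evicted = self.short_term.pop(0)
            # ...but can be persisted to long-term memory before they vanish.
            self.long_term[f"note-{len(self.long_term)}"] = evicted

mem = AgentMemory(context_limit=2)
for msg in ["a", "b", "c"]:
    mem.observe(msg)
# short_term now holds only the newest two messages; "a" moved to long-term.
```

This is why "memory" and "context window" are different line items on a quote: the window is a model property you rent, memory is infrastructure the vendor has to build.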
Context window
The maximum number of tokens the LLM can attend to in a single inference call. A property of the model, not the agent. Memory persists; context windows do not.
Retrieval
The process of finding relevant documents or facts and including them in the LLM input. Often paired with generation (RAG). Becomes a tool when the agent decides when and what to retrieve.
Architecture terms
Orchestration
Coordinating multiple steps, tools, and (sometimes) sub-agents toward the goal. The runtime layer above the LLM. See orchestration explained.
Planner-executor-evaluator
A common multi-component pattern: a planner produces steps, an executor runs them, an evaluator checks the result. Useful when the task has clear completion criteria.
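The pattern reduces to three injectable components. The toy planner, executor, and evaluator below are placeholders to show the control flow, not a real implementation.

```python
from typing import Callable

def run_plan(goal: str,
             planner: Callable[[str], list[str]],
             executor: Callable[[str], str],
             evaluator: Callable[[str, list[str]], bool]) -> bool:
    """Sketch of the planner-executor-evaluator loop: plan once,
    execute each step, then check the result against the goal."""
    steps = planner(goal)
    results = [executor(step) for step in steps]
    return evaluator(goal, results)

# Toy components, purely illustrative.
planner = lambda goal: [f"step {i}" for i in range(3)]
executor = lambda step: step + " done"
evaluator = lambda goal, results: all(r.endswith("done") for r in results)

ok = run_plan("file the report", planner, executor, evaluator)
```

The evaluator is the part that makes "clear completion criteria" matter: without a checkable definition of done, the loop has nothing to evaluate.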
ReAct
Reasoning + Acting. A pattern that interleaves reasoning steps with tool calls (Yao et al., ReAct, 2022). The default execution pattern for most agent frameworks in 2026.
Multi-agent
An architecture where multiple specialised agents coordinate to complete a task. Useful for tasks with parallelisable subgoals; expensive to coordinate. Often confused with single-agent systems with multiple tools.
Single-agent
One agent with access to tools. The default. Multi-agent helps only when subgoals genuinely parallelise; for most business tasks, single-agent is correct.
Agentic RAG
Retrieval-augmented generation where retrieval is exposed as a tool the agent calls when needed, rather than a fixed pipeline step. Improves precision; lets the agent skip retrieval on questions that do not need it.
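The difference from pipeline RAG is a single branch: the agent decides whether to retrieve at all. The keyword heuristic below is a crude stand-in for the model's own decision, included only to make the control flow runnable.

```python
def answer(question: str, retrieve, generate) -> str:
    """Sketch of agentic RAG: retrieval is a tool the agent *may* call,
    not a fixed pipeline step that runs on every question."""
    def needs_retrieval(q: str) -> bool:
        # Crude stand-in heuristic; a real agent lets the model decide.
        return "our" in q.split() or "latest" in q.split()

    context = retrieve(question) if needs_retrieval(question) else ""
    return generate(question, context)

# Toy retrieval and generation, purely illustrative.
retrieve = lambda q: "[doc snippet]"
generate = lambda q, ctx: (ctx + " " if ctx else "") + "answer"

a1 = answer("what is 2+2", retrieve, generate)                 # skips retrieval
a2 = answer("what is our refund policy", retrieve, generate)   # retrieves first
```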
Function-calling loop
The runtime cycle: LLM emits a tool call, runtime executes it, runtime returns the result, LLM emits the next call or final answer. Repeated until the agent emits a "done" signal or hits a step limit.
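That cycle fits in a dozen lines. The sketch below uses a scripted stand-in for the LLM so the loop is runnable; the action tuple format and all names are hypothetical, not any framework's API.

```python
def agent_loop(llm, tools: dict, max_steps: int = 10):
    """Sketch of the function-calling loop. `llm` takes the transcript so far
    and returns either ("call", name, args) or ("done", answer)."""
    transcript = []
    for _ in range(max_steps):              # step limit bounds runaway agents
        action = llm(transcript)
        if action[0] == "done":
            return action[1]
        _, name, args = action
        result = tools[name](**args)        # runtime executes the tool call
        transcript.append((name, args, result))  # result fed back to the LLM
    raise RuntimeError("step limit reached without a done signal")

# Scripted stand-in LLM: call the adder once, then finish.
def fake_llm(transcript):
    if not transcript:
        return ("call", "add", {"a": 2, "b": 3})
    return ("done", f"sum is {transcript[-1][2]}")

out = agent_loop(fake_llm, {"add": lambda a, b: a + b})
```

Note where the step limit sits: it is the runtime's job, not the model's. Ask vendors what their loop does when the limit is hit.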
Evaluation terms
GAIA
General AI Assistants benchmark. 466 questions across three difficulty levels. Reports human pass rates above 90 percent and top agents below 50 percent on Level 3 (Mialon et al., 2023).
SWE-bench
Software engineering benchmark using real GitHub issues. Measures whether an agent can produce a patch that resolves a real issue. The leaderboard is the public reference for code-agent capability (swebench.com).
AgentBench
Cross-environment benchmark covering web, code, game, and household tasks (Liu et al., 2023). Useful for comparing across task types, not just code.
Reliability
The probability that an agent completes a task correctly under a defined input distribution and tool environment. The 80-test methodology operationalises this for Gravity. See 80-test methodology.
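"A number, not an adjective" is literal: reliability is a pass rate over repeated trials. A minimal sketch, with illustrative counts (74 passes out of 80 trials is an assumed example, not a Gravity result):

```python
def reliability(outcomes: list[bool]) -> float:
    """Point estimate of reliability: the fraction of test runs that passed
    under a fixed input distribution and tool environment."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)

# 80 trials, 74 passes -> 0.925 (illustrative numbers only)
outcomes = [True] * 74 + [False] * 6
r = reliability(outcomes)
```

A single demo run is a sample of one; it tells you almost nothing about this number.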
Safety and governance
Refusal correctness
The agent refuses tasks it should refuse and does not refuse tasks it should accept. Refusal failures in either direction are common. One of the eight categories in the 80-test methodology.
Hostile input
Prompt injection, jailbreak, or social engineering inside untrusted content the agent reads (emails, web pages, files). Listed in OWASP Top 10 for LLM Applications.
Idempotency
Running the same task twice produces the same effect as running it once. Critical for actions with real-world side effects (payments, emails, writes). One of the eight categories in the 80-test methodology.
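The standard implementation is a client-supplied idempotency key: retrying a request replays the cached result instead of repeating the side effect. A toy sketch, with hypothetical names; real payment APIs use the same idea with more bookkeeping.

```python
class PaymentGateway:
    """Toy sketch of idempotency via a client-supplied key.
    Retrying the same request does not charge twice."""

    def __init__(self):
        self._seen: dict[str, str] = {}  # key -> cached receipt
        self.charges = 0                 # count of actual side effects

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key in self._seen:
            # Replay: return the cached result, run no side effect.
            return self._seen[idempotency_key]
        self.charges += 1                # the side effect happens exactly once
        receipt = f"receipt-{self.charges}"
        self._seen[idempotency_key] = receipt
        return receipt

gw = PaymentGateway()
r1 = gw.charge("task-42", 100)
r2 = gw.charge("task-42", 100)   # agent retries after a timeout
```

Agents retry; networks fail mid-call. If the vendor's tools lack this property, every retry is a potential double payment or duplicate email.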
Blast radius
The maximum scope of damage from a single agent action. Constrained by tool catalogue, permissions, and circuit breakers. NIST AI RMF treats blast radius as a core risk dimension.
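Two of those constraints, the tool allowlist and a circuit breaker, fit in one small sketch. All names and the failure threshold are hypothetical; the point is where the checks live, outside the model.

```python
class ToolGate:
    """Sketch of limiting blast radius: an allowlist bounds what the agent
    can touch, and a circuit breaker halts it after repeated failures."""

    def __init__(self, allowed: set, max_failures: int = 3):
        self.allowed = allowed
        self.max_failures = max_failures
        self.failures = 0

    def call(self, name: str, fn, *args):
        if name not in self.allowed:
            raise PermissionError(f"{name} is outside the agent's tool catalogue")
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit breaker open: too many failures")
        try:
            return fn(*args)
        except Exception:
            self.failures += 1
            raise

gate = ToolGate(allowed={"read_file"})
ok = gate.call("read_file", lambda p: "contents", "/tmp/x")
try:
    gate.call("delete_repo", lambda: None)   # blocked before execution
    blocked = False
except PermissionError:
    blocked = True
```

The enforcement happens before the tool runs, regardless of what the model asked for; that is what makes the damage scope a property you can bound on paper.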
Frequently asked questions
What is an AI agent in simple terms?
An AI agent is a software system that perceives its environment, decides on actions, calls tools to execute those actions, and works toward a goal across multiple steps without per-step human approval. The defining feature is autonomy: choosing the next step based on the goal and current state, not waiting for a human prompt at each junction.
What is the difference between an agent and an LLM?
An LLM is a model that produces text or other tokens in response to input. An agent is a system that uses an LLM (or several) as one component, plus tools, memory, planning, and recovery loops, to achieve a goal. The LLM is a brain; the agent is the body, hands, and calendar around it.
What does agentic mean?
Agentic describes systems that exhibit agency: the capacity to perceive, plan, act, and learn toward goals over time. Anthropic's engineering blog defines agentic systems as those where the LLM dynamically directs its own processes and tool usage. The term describes a gradient, not a binary property; systems are more or less agentic along the five autonomy axes.
What is tool use in AI agents?
Tool use is the agent's ability to call external functions or APIs to act on the world. The agent selects the right tool from a catalogue based on the current step, formats arguments, sends the call, and parses the response. Tool use is what makes the difference between an LLM that talks and an agent that acts.
Where can I find longer explanations of these terms?
Each term in this glossary links to a longer explainer in the Gravity blog cluster. The pillar hub is the planned what-is-an-autonomous-ai-agent post; spokes cover memory, tool use, reasoning, orchestration, and the autonomy spectrum. Each spoke includes worked examples and primary sources.
Three takeaways before you close this tab
- Define the terms before pricing the work. Agent vs LLM alone changes scope by 10x.
- Memory is not context. Memory persists; context windows do not.
- Reliability is a number, not an adjective. If a vendor cannot put a number on it, treat the claim as unverified.
Sources
- Anthropic, "Building Effective Agents", retrieved 2026-05-07, anthropic.com/engineering/building-effective-agents
- Mialon et al., "GAIA: A Benchmark for General AI Assistants", arXiv:2311.12983, 2023, retrieved 2026-05-07, arxiv.org/abs/2311.12983
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models", arXiv:2210.03629, 2022, retrieved 2026-05-07, arxiv.org/abs/2210.03629
- Liu et al., "AgentBench: Evaluating LLMs as Agents", arXiv:2308.03688, 2023, retrieved 2026-05-07, arxiv.org/abs/2308.03688
- NIST, "AI Risk Management Framework", retrieved 2026-05-07, nist.gov/itl/ai-risk-management-framework
- OWASP, "Top 10 for LLM Applications", retrieved 2026-05-07, owasp.org