Procurement conversations about AI agents fail when buyer and vendor use the same words to mean different things. This glossary defines 29 terms that show up in agent procurement, organised by category. Each entry includes a one-sentence definition, the term it is most often confused with, and (where useful) a link to a longer explainer in the cluster. The glossary is designed for buyers; the definitions are deliberately operational, not academic.
The vocabulary draws from primary sources where they exist: Anthropic's engineering blog on agentic systems (retrieved 2026-05-07), the GAIA benchmark paper (arXiv:2311.12983), the original ReAct paper (Yao et al., 2022), and the NIST AI Risk Management Framework. Where the field disagrees, the entry uses the operational definition rather than the contested one.
Core terms
Agent
A software system that perceives state, decides on actions, calls tools, and works toward a goal across multiple steps without per-step human approval. Often confused with assistant. See agent vs chatbot vs assistant.
Autonomy
The degree to which a system selects its own next step. Measured on five axes (decision-making, tool use, planning, error recovery, time horizon). Often confused with automation. See autonomous vs assistive AI.
Agency
The capacity to act on the world toward goals. A precondition for autonomy. Often confused with autonomy itself; agency is the property, autonomy is the level.
Agentic
Adjective describing systems that exhibit agency. Anthropic defines agentic systems as those where the LLM dynamically directs its own processes and tool usage (Anthropic, 2024). See agentic AI without jargon.
LLM
Large language model. A neural network that produces text or tokens conditioned on input. The brain inside an agent, not the agent itself. Often confused with agent. See agent vs LLM.
Chatbot
A conversational interface to a model or rule set. Single turn or short multi-turn; rarely takes irreversible actions. Often confused with agent. See agent vs chatbot vs assistant.
Copilot
An assistive AI system that suggests, drafts, or completes actions inside a tool the human is using. Human stays in the loop. Often confused with agent. See agent vs copilot.
Capability terms
Tool use
The agent's ability to call external functions or APIs to act. Most often implemented via function calling. Often confused with function calling itself; tool use is the capability, function calling is the mechanism. See tool use explained.
Function calling
The mechanism by which an LLM emits a structured payload describing the tool to call and the arguments. Standardised by OpenAI in 2023 (OpenAI function calling docs). The plumbing under tool use.
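The shape of that structured payload can be sketched in a few lines. This is an illustrative parser, not any vendor's actual schema; the tool name `get_invoice` and its arguments are hypothetical.

```python
import json

def parse_tool_call(raw: str) -> tuple[str, dict]:
    """Parse the structured payload an LLM emits when it wants a tool run.

    Hypothetical schema: {"name": ..., "arguments": {...}}. Real providers
    differ in field names, but the idea is the same: a machine-readable
    request, not free text.
    """
    payload = json.loads(raw)
    name = payload["name"]
    args = payload.get("arguments", {})
    if not isinstance(args, dict):
        raise ValueError("arguments must be a JSON object")
    return name, args

# The runtime validates the payload before executing anything.
raw = '{"name": "get_invoice", "arguments": {"invoice_id": "INV-1042"}}'
name, args = parse_tool_call(raw)
```

The point for buyers: "function calling" is this parsing-and-dispatch plumbing, nothing more. The judgment about which tool to call is the tool-use capability built on top of it.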
Planning
Producing a sequence of steps that, if executed, achieves the goal. Implicit (the LLM plans token-by-token) or explicit (a planner module emits steps). Often confused with reasoning. See orchestration.
Reasoning
Multi-step inference toward a conclusion. Often implemented via chain-of-thought, ReAct, or tree-of-thought patterns. Often confused with pattern matching. See reasoning vs pattern matching.
Memory
State the agent retains across steps or sessions. Three layers: short-term context, long-term vector store, episodic memory. Often confused with context window. See memory explained.
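The three layers can be made concrete with a toy sketch. The dict standing in for a vector store and the eviction policy are illustrative assumptions, not a production design.

```python
class AgentMemory:
    """Toy sketch of the three memory layers. A real long-term store would
    be a vector database; a dict stands in for it here."""

    def __init__(self, context_limit: int = 4):
        self.context_limit = context_limit
        self.short_term: list[str] = []      # rolling window fed into each LLM call
        self.long_term: dict[str, str] = {}  # stand-in for a vector store
        self.episodes: list[str] = []        # summaries of past sessions

    def observe(self, message: str) -> None:
        self.short_term.append(message)
        # Oldest messages fall out of the context window...
        while len(self.short_term) > self.context_limit:
            evicted = self.short_term.pop(0)
            # ...but can be persisted to long-term memory before they vanish.
            self.long_term[f"note-{len(self.long_term)}"] = evicted

mem = AgentMemory(context_limit=2)
for msg in ["a", "b", "c"]:
    mem.observe(msg)
# short_term now holds only the newest two messages; "a" moved to long-term.
```

This is why "memory" and "context window" are different line items on a quote: the window is a model property you rent, memory is infrastructure the vendor has to build.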
Context window
The maximum number of tokens the LLM can attend to in a single inference call. A property of the model, not the agent. Memory persists; context windows do not.
Retrieval
The process of finding relevant documents or facts and including them in the LLM input. Often paired with generation (RAG). Becomes a tool when the agent decides when and what to retrieve.
Architecture terms
Orchestration
Coordinating multiple steps, tools, and (sometimes) sub-agents toward the goal. The runtime layer above the LLM. See orchestration explained.
Planner-executor-evaluator
A common multi-component pattern: a planner produces steps, an executor runs them, an evaluator checks the result. Useful when the task has clear completion criteria.
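The pattern reduces to three injectable components. The toy planner, executor, and evaluator below are placeholders to show the control flow, not a real implementation.

```python
from typing import Callable

def run_plan(goal: str,
             planner: Callable[[str], list[str]],
             executor: Callable[[str], str],
             evaluator: Callable[[str, list[str]], bool]) -> bool:
    """Sketch of the planner-executor-evaluator loop: plan once,
    execute each step, then check the result against the goal."""
    steps = planner(goal)
    results = [executor(step) for step in steps]
    return evaluator(goal, results)

# Toy components, purely illustrative.
planner = lambda goal: [f"step {i}" for i in range(3)]
executor = lambda step: step + " done"
evaluator = lambda goal, results: all(r.endswith("done") for r in results)

ok = run_plan("file the report", planner, executor, evaluator)
```

The evaluator is the part that makes "clear completion criteria" matter: without a checkable definition of done, the loop has nothing to evaluate.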
ReAct
Reasoning + Acting. A pattern that interleaves reasoning steps with tool calls (Yao et al., ReAct, 2022). The default execution pattern for most agent frameworks in 2026.
Multi-agent
An architecture where multiple specialised agents coordinate to complete a task. Useful for tasks with parallelisable subgoals; expensive to coordinate. Often confused with single-agent systems with multiple tools.
Single-agent
One agent with access to tools. The default. Multi-agent helps only when subgoals genuinely parallelise; for most business tasks, single-agent is correct.
Agentic RAG
Retrieval-augmented generation where retrieval is exposed as a tool the agent calls when needed, rather than a fixed pipeline step. Improves precision; lets the agent skip retrieval on questions that do not need it.
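The difference from pipeline RAG is a single branch: the agent decides whether to retrieve at all. The keyword heuristic below is a crude stand-in for the model's own decision, included only to make the control flow runnable.

```python
def answer(question: str, retrieve, generate) -> str:
    """Sketch of agentic RAG: retrieval is a tool the agent *may* call,
    not a fixed pipeline step that runs on every question."""
    def needs_retrieval(q: str) -> bool:
        # Crude stand-in heuristic; a real agent lets the model decide.
        return "our" in q.split() or "latest" in q.split()

    context = retrieve(question) if needs_retrieval(question) else ""
    return generate(question, context)

# Toy retrieval and generation, purely illustrative.
retrieve = lambda q: "[doc snippet]"
generate = lambda q, ctx: (ctx + " " if ctx else "") + "answer"

a1 = answer("what is 2+2", retrieve, generate)                 # skips retrieval
a2 = answer("what is our refund policy", retrieve, generate)   # retrieves first
```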
Function-calling loop
The runtime cycle: LLM emits a tool call, runtime executes it, runtime returns the result, LLM emits the next call or final answer. Repeated until the agent emits a "done" signal or hits a step limit.
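That cycle fits in a dozen lines. The sketch below uses a scripted stand-in for the LLM so the loop is runnable; the action tuple format and all names are hypothetical, not any framework's API.

```python
def agent_loop(llm, tools: dict, max_steps: int = 10):
    """Sketch of the function-calling loop. `llm` takes the transcript so far
    and returns either ("call", name, args) or ("done", answer)."""
    transcript = []
    for _ in range(max_steps):              # step limit bounds runaway agents
        action = llm(transcript)
        if action[0] == "done":
            return action[1]
        _, name, args = action
        result = tools[name](**args)        # runtime executes the tool call
        transcript.append((name, args, result))  # result fed back to the LLM
    raise RuntimeError("step limit reached without a done signal")

# Scripted stand-in LLM: call the adder once, then finish.
def fake_llm(transcript):
    if not transcript:
        return ("call", "add", {"a": 2, "b": 3})
    return ("done", f"sum is {transcript[-1][2]}")

out = agent_loop(fake_llm, {"add": lambda a, b: a + b})
```

Note where the step limit sits: it is the runtime's job, not the model's. Ask vendors what their loop does when the limit is hit.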
Evaluation terms
GAIA
General AI Assistants benchmark. 466 questions across three difficulty levels. Reports human pass rates above 90 percent and top agents below 50 percent on Level 3 (Mialon et al., 2023).
SWE-bench
Software engineering benchmark using real GitHub issues. Measures whether an agent can produce a patch that resolves a real issue. The leaderboard is the public reference for code-agent capability (swebench.com).
AgentBench
Cross-environment benchmark covering web, code, game, and household tasks (Liu et al., 2023). Useful for comparing across task types, not just code.
Reliability
The probability that an agent completes a task correctly under a defined input distribution and tool environment. The 80-test methodology operationalises this for Gravity. See 80-test methodology.
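"A number, not an adjective" is literal: reliability is a pass rate over repeated trials. A minimal sketch, with illustrative counts (74 passes out of 80 trials is an assumed example, not a Gravity result):

```python
def reliability(outcomes: list[bool]) -> float:
    """Point estimate of reliability: the fraction of test runs that passed
    under a fixed input distribution and tool environment."""
    if not outcomes:
        raise ValueError("need at least one trial")
    return sum(outcomes) / len(outcomes)

# 80 trials, 74 passes -> 0.925 (illustrative numbers only)
outcomes = [True] * 74 + [False] * 6
r = reliability(outcomes)
```

A single demo run is a sample of one; it tells you almost nothing about this number.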
Safety and governance
Refusal correctness
The agent refuses tasks it should refuse and does not refuse tasks it should accept. Refusal failures in either direction are common. One of the eight categories in the 80-test methodology.
Hostile input
Prompt injection, jailbreak, or social engineering inside untrusted content the agent reads (emails, web pages, files). Listed in OWASP Top 10 for LLM Applications.
Idempotency
Running the same task twice produces the same effect as running it once. Critical for actions with real-world side effects (payments, emails, writes). One of the eight categories in the 80-test methodology.
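The standard implementation is a client-supplied idempotency key: retrying a request replays the cached result instead of repeating the side effect. A toy sketch, with hypothetical names; real payment APIs use the same idea with more bookkeeping.

```python
class PaymentGateway:
    """Toy sketch of idempotency via a client-supplied key.
    Retrying the same request does not charge twice."""

    def __init__(self):
        self._seen: dict[str, str] = {}  # key -> cached receipt
        self.charges = 0                 # count of actual side effects

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key in self._seen:
            # Replay: return the cached result, run no side effect.
            return self._seen[idempotency_key]
        self.charges += 1                # the side effect happens exactly once
        receipt = f"receipt-{self.charges}"
        self._seen[idempotency_key] = receipt
        return receipt

gw = PaymentGateway()
r1 = gw.charge("task-42", 100)
r2 = gw.charge("task-42", 100)   # agent retries after a timeout
```

Agents retry; networks fail mid-call. If the vendor's tools lack this property, every retry is a potential double payment or duplicate email.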
Blast radius
The maximum scope of damage from a single agent action. Constrained by tool catalogue, permissions, and circuit breakers. NIST AI RMF treats blast radius as a core risk dimension.
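Two of those constraints, the tool allowlist and a circuit breaker, fit in one small sketch. All names and the failure threshold are hypothetical; the point is where the checks live, outside the model.

```python
class ToolGate:
    """Sketch of limiting blast radius: an allowlist bounds what the agent
    can touch, and a circuit breaker halts it after repeated failures."""

    def __init__(self, allowed: set, max_failures: int = 3):
        self.allowed = allowed
        self.max_failures = max_failures
        self.failures = 0

    def call(self, name: str, fn, *args):
        if name not in self.allowed:
            raise PermissionError(f"{name} is outside the agent's tool catalogue")
        if self.failures >= self.max_failures:
            raise RuntimeError("circuit breaker open: too many failures")
        try:
            return fn(*args)
        except Exception:
            self.failures += 1
            raise

gate = ToolGate(allowed={"read_file"})
ok = gate.call("read_file", lambda p: "contents", "/tmp/x")
try:
    gate.call("delete_repo", lambda: None)   # blocked before execution
    blocked = False
except PermissionError:
    blocked = True
```

The enforcement happens before the tool runs, regardless of what the model asked for; that is what makes the damage scope a property you can bound on paper.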
Frequently asked questions
What is an AI agent in simple terms?
An AI agent is a software system that perceives its environment, decides on actions, calls tools to execute those actions, and works toward a goal across multiple steps without per-step human approval. The defining feature is autonomy: choosing the next step based on the goal and current state, not waiting for a human prompt at each junction.
What is the difference between an agent and an LLM?
An LLM is a model that produces text or other tokens in response to input. An agent is a system that uses an LLM (or several) as one component, plus tools, memory, planning, and recovery loops, to achieve a goal. The LLM is a brain; the agent is the body, hands, and calendar around it.
What does agentic mean?
Agentic describes systems that exhibit agency: the capacity to perceive, plan, act, and learn toward goals over time. Anthropic's engineering blog defines agentic systems as those where the LLM dynamically directs its own processes and tool usage. The term describes a gradient, not a binary property; systems are more or less agentic along the five autonomy axes.
What is tool use in AI agents?
Tool use is the agent's ability to call external functions or APIs to act on the world. The agent selects the right tool from a catalogue based on the current step, formats arguments, sends the call, and parses the response. Tool use is what makes the difference between an LLM that talks and an agent that acts.
Where can I find longer explanations of these terms?
Each term in this glossary links to a longer explainer in the Gravity blog cluster. The pillar hub is the planned what-is-an-autonomous-ai-agent post; spokes cover memory, tool use, reasoning, orchestration, and the autonomy spectrum. Each spoke includes worked examples and primary sources.
Three takeaways before you close this tab
- Define the terms before pricing the work. Agent vs LLM alone changes scope by 10x.
- Memory is not context. Memory persists; context windows do not.
- Reliability is a number, not an adjective. If a vendor cannot put a number on it, treat the claim as unverified.
Sources
- Anthropic, "Building Effective Agents", retrieved 2026-05-07, anthropic.com/engineering/building-effective-agents
- Mialon et al., "GAIA: A Benchmark for General AI Assistants", arXiv:2311.12983, 2023, retrieved 2026-05-07, arxiv.org/abs/2311.12983
- Yao et al., "ReAct: Synergizing Reasoning and Acting in Language Models", arXiv:2210.03629, 2022, retrieved 2026-05-07, arxiv.org/abs/2210.03629
- Liu et al., "AgentBench: Evaluating LLMs as Agents", arXiv:2308.03688, 2023, retrieved 2026-05-07, arxiv.org/abs/2308.03688
- NIST, "AI Risk Management Framework", retrieved 2026-05-07, nist.gov/itl/ai-risk-management-framework
- OWASP, "Top 10 for LLM Applications", retrieved 2026-05-07, owasp.org