Glossary

AI agent terms, in plain English.

Every term in this glossary is a concept you'll encounter when working with AI agents in 2026. Definitions are short, opinionated where useful, and written by humans, not generated. Link directly to any term by anchor.


A

Agent

An AI system that takes a goal in natural language and figures out how to accomplish it on its own. Unlike a chatbot, which answers one question at a time, an agent loops: it plans, picks a tool, runs it, reads the result, then decides what to do next. The defining feature is autonomy across multiple steps.

Agent orchestration

The control layer that decides which agent runs, in what order, with which inputs, and how their outputs combine. Orchestration is what turns a pile of single-purpose agents into something that can finish a real job. Think of it as the conductor; the agents are the section players.

Agentic AI

AI that acts on a goal rather than answering a prompt. The word agentic is doing real work here: it means the system initiates, decides, and adapts. A chatbot is reactive; agentic AI is proactive within a defined scope. Most production agentic systems in 2026 are narrow, not general.

Agentic RAG

RAG where the agent decides what to retrieve, when to retrieve it, and whether the retrieval was good enough. Classic RAG runs one search before the answer. Agentic RAG can run zero, one, or fifteen searches, reformulate queries, follow citations, and abandon a dead end before responding.

AI agent marketplace

A platform where independent builders publish AI agents and users run them on demand. The marketplace handles discovery, execution, billing, and quality control. The category exists because most people don't want to build agents; they want results. Gravity is the AI agent marketplace.

Autonomous agent

An agent that runs without a human pressing go for every step. Autonomy lives on a spectrum: fully supervised, partially supervised, scheduled, fully autonomous. In production, fully autonomous is rare; most useful agents are partially supervised with hard guardrails and a kill switch.

B

Benchmark

A standardized test for comparing AI agents on the same task set. The benchmarks that matter in 2026: GAIA (general agent ability across reasoning, tool use, and multimodal), SWE-bench (software engineering on real GitHub issues), AgentBench (multi-domain operating environments), BFCL (Berkeley Function Calling Leaderboard, structured tool use), and ToolBench (large-scale API use). Treat benchmark scores as a rough signal, not ground truth; real-world reliability is what ships.

C

Capability

A discrete thing an agent can do well: send an email, query a database, summarize a PDF. Agents are built by composing capabilities. On a marketplace, capability is the unit of quality: each one is tested in isolation before the agent ships. Capability counts are meaningless without quality scores.

Chain-of-thought

Asking a model to write out its reasoning step by step before producing an answer. It tends to improve accuracy on multi-step problems such as math and logic, though the size of the gain varies widely by model and task. The newer reasoning models do this internally without being asked. Visible chain-of-thought is also useful for debugging agent failures.

Compound intelligence

The capability gain you get when multiple specialized agents coordinate, rather than one big agent doing everything. A planner, a researcher, and a writer working together often beat a single agent prompted to do all three. The term is loose: use it when you mean coordinated specialization with shared state.

Context window

The maximum amount of text a model can read in one shot, measured in tokens. In 2026, most frontier models handle 200K to 2M tokens. Bigger isn't always better: agents that stuff the window with junk produce worse output than agents that retrieve narrowly. Context is a budget, not a bag.

Credit

The unit of metered usage on Gravity. One US dollar equals 1,000 credits. A run costs as many credits as the work it does: no subscription, no minimum. Credits never expire while your account is active. The model is built so a small task costs cents and a large task is still predictable up front.
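Because one dollar is 1,000 credits, one credit is a tenth of a cent. The conversion is plain arithmetic; this minimal Python sketch uses `Decimal` so sub-cent amounts don't pick up floating-point error. The function names are illustrative, not part of any Gravity API.

```python
from decimal import Decimal

CREDITS_PER_USD = 1000  # from the definition above: $1 = 1,000 credits


def credits_to_usd(credits: int) -> Decimal:
    """Convert a credit amount to US dollars (1 credit = $0.001)."""
    return Decimal(credits) / CREDITS_PER_USD


def usd_to_credits(usd: Decimal) -> Decimal:
    """Convert a dollar amount to credits."""
    return usd * CREDITS_PER_USD


# A run that consumes 45 credits costs 4.5 cents.
print(credits_to_usd(45))               # 0.045
print(usd_to_credits(Decimal("2.50")))  # 2500.00
```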

D

Distribution

How users find and choose your agent. On a marketplace, distribution is search ranking, recommendations, and category placement. On Gravity, distribution is decided by quality score alone. No paid placement, no boosted listings, ever. Builders compete on results, not budget.

Deployment

Putting an agent into production where real users can run it on real data. Deployment is the moment the agent stops being a prototype and starts owing the user reliability. On Gravity, deployment of a matched agent takes 60 seconds because the agent is already vetted and hot; only your config is new.

E

Embedding

A list of numbers that represents a piece of text in a way machines can compare. Two embeddings that are close in vector space mean two texts that are close in meaning. Embeddings power semantic search, retrieval, deduplication, and most RAG pipelines. They're the math underneath "find me something like this."
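Closeness between embeddings is usually measured with cosine similarity, which compares the direction of two vectors. A minimal sketch in plain Python; the three-dimensional vectors are made up for illustration, while real embedding models output hundreds or thousands of numbers per text.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Ranges from -1 to 1; higher means the texts point the same way in meaning."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (hypothetical values).
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.15, 0.05]
invoice = [0.0, 0.2, 0.95]

print(cosine_similarity(cat, kitten))   # close to 1
print(cosine_similarity(cat, invoice))  # close to 0
```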

Evaluator

A system that scores agent output for quality, safety, or task completion. Can be a rule check, a smaller model, a human review, or a frontier model judging a weaker one. Evaluators are how you tell a good agent from a confident one. Without an evaluator, you're shipping vibes.
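The simplest evaluator is a rule check. This Python sketch scores a summary against three hypothetical rules: a word limit, a banned-phrase list, and at least one bracketed citation such as `[q3-report]`. The rules and the citation format are assumptions; a real evaluator encodes whatever your task demands and hands harder judgments to a model or a human.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Verdict:
    passed: bool
    reasons: List[str]  # why the output failed; empty when it passed


CITATION = re.compile(r"\[[^\[\]]+\]")  # assumed citation format: [source-id]


def evaluate_summary(output: str, max_words: int = 150,
                     banned: Tuple[str, ...] = ("as an ai",)) -> Verdict:
    reasons: List[str] = []
    words = len(output.split())
    if words == 0:
        reasons.append("empty output")
    elif words > max_words:
        reasons.append(f"too long: {words} words > {max_words}")
    lowered = output.lower()
    for phrase in banned:
        if phrase.lower() in lowered:
            reasons.append(f"banned phrase: {phrase!r}")
    if not CITATION.search(output):
        reasons.append("no citation found")
    return Verdict(passed=not reasons, reasons=reasons)


print(evaluate_summary("Revenue grew 12% in Q3 [q3-report]."))
print(evaluate_summary("As an AI, I think revenue went up."))
```

Cheap rule checks like these run on every output; the expensive model-as-judge pass can then be reserved for outputs that survive them.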

F

Failure mode

A specific way an agent breaks. The common ones in 2026: hallucination, infinite loops, tool selection errors, premature stopping, refusal to use a tool, and silent partial completion. Every production agent should have a named list of its known failure modes and a tested response for each.

Function calling

The model API feature that lets a language model invoke a developer-defined function with structured arguments. Function calling is how an LLM goes from talking about an action to taking one. It is the substrate every agent framework sits on. Get this right and your agent can use tools; get it wrong and nothing works.
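On the developer side, function calling usually works like this: the model returns a tool name plus JSON arguments, and your code validates them and runs the matching function. A minimal Python sketch, assuming a payload shaped like `{"name": ..., "arguments": "<JSON string>"}`; the exact shape differs between model providers, and `get_weather` is a hypothetical tool.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str, unit: str = "celsius") -> Dict[str, Any]:
    """Hypothetical tool; a real one would call a weather service."""
    return {"city": city, "temperature": 21, "unit": unit}


# Registry of functions the model is allowed to call, keyed by tool name.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch(tool_call: Dict[str, str]) -> str:
    """Validate a model's tool call, run it, and return a JSON result for the model."""
    name = tool_call.get("name", "")
    if name not in TOOLS:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(tool_call.get("arguments", "{}"))
    except json.JSONDecodeError as exc:
        return json.dumps({"error": f"arguments are not valid JSON: {exc}"})
    if not isinstance(args, dict):
        return json.dumps({"error": "arguments must be a JSON object"})
    try:
        result = TOOLS[name](**args)
    except TypeError as exc:  # missing or unexpected parameters
        return json.dumps({"error": f"bad arguments for {name}: {exc}"})
    return json.dumps(result)


print(dispatch({"name": "get_weather", "arguments": '{"city": "Oslo"}'}))
print(dispatch({"name": "get_weather", "arguments": '{"town": "Oslo"}'}))
```

Errors go back to the model as JSON instead of crashing the run, so the agent can read the message and retry with corrected arguments.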

Fine-tuning

Continuing the training of a base model on a smaller, curated dataset to specialize it. Useful when prompting alone can't get the behavior you need: tone, format, niche domain. In 2026 most teams reach for retrieval first, prompts second, fine-tuning third, because fine-tuning is expensive to maintain.

G

Guardrails

Hard constraints that prevent an agent from doing damaging things even if the model wants to. Guardrails are not prompts; they're code that intercepts and blocks. Examples: spending caps, write-permission scopes, content filters, allowed-domain lists. A prompt asks nicely. A guardrail says no.
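A guardrail lives in code the model cannot talk its way past. A minimal Python sketch of a spending cap, assuming amounts in integer cents; `SpendingGuardrail` and the `pay` callback are hypothetical names standing in for your real payment call.

```python
from typing import Callable


class SpendingCapExceeded(Exception):
    """Raised when a charge would push total spend over the cap."""


class SpendingGuardrail:
    def __init__(self, cap_cents: int):
        self.cap_cents = cap_cents
        self.spent_cents = 0

    def charge(self, amount_cents: int, pay: Callable[[int], None]) -> None:
        # The check runs before the payment call, whatever the model asked for.
        if amount_cents <= 0:
            raise ValueError("charge must be positive")
        if self.spent_cents + amount_cents > self.cap_cents:
            raise SpendingCapExceeded(
                f"{amount_cents} cents would exceed the cap "
                f"({self.spent_cents}/{self.cap_cents} already spent)"
            )
        pay(amount_cents)
        self.spent_cents += amount_cents


guard = SpendingGuardrail(cap_cents=5_000)  # $50 cap
guard.charge(3_000, pay=lambda cents: print(f"paid {cents} cents"))
try:
    guard.charge(2_500, pay=lambda cents: print(f"paid {cents} cents"))
except SpendingCapExceeded as exc:
    print("blocked:", exc)
```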

Grounding

Tying an agent's output to a verifiable source. A grounded answer points at the document, row, or URL that backs it. Grounding is the first defense against hallucination and the only way to make agents auditable. If your agent can't cite, it's guessing.

H

Hallucination

When a model outputs something confidently false. In agents, hallucinations are especially dangerous because the agent might act on the lie: send the wrong email, call the wrong API, cite a paper that doesn't exist. Grounding, evaluators, and refusal policies are the standard countermeasures.

Handoff

When one agent passes control of a task to another agent in a multi-agent system. A clean handoff transfers the goal, the relevant context, and the constraints, not the entire conversation history. Bad handoffs are where most multi-agent systems lose accuracy. Treat the interface between agents like an API contract.
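Treating a handoff as an API contract can be as simple as a typed record that carries only the goal, the facts the next agent needs, and the constraints. A Python sketch; the field names and the `make_handoff` helper are illustrative, not taken from any particular framework.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Handoff:
    goal: str
    context: Dict[str, str]           # only the facts the receiver needs
    constraints: Tuple[str, ...] = ()

    def __post_init__(self) -> None:
        if not self.goal.strip():
            raise ValueError("a handoff needs a non-empty goal")


def make_handoff(goal: str, state: Dict[str, str],
                 relevant_keys: Iterable[str],
                 constraints: Iterable[str] = ()) -> Handoff:
    """Copy just the relevant keys from the sender's state, not its whole history."""
    context = {k: state[k] for k in relevant_keys if k in state}
    return Handoff(goal=goal, context=context, constraints=tuple(constraints))


planner_state = {
    "customer": "Acme Corp",
    "tone": "formal",
    "transcript": "(40 turns of planning chatter)",
}
handoff = make_handoff("Draft a renewal email", planner_state,
                       relevant_keys=["customer", "tone"],
                       constraints=["under 150 words", "no discounts"])
print(handoff)
```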

Human-in-the-loop

A workflow where the agent does most of the work but pauses for human approval at chosen points. Human-in-the-loop is the realistic mode for high-stakes tasks: sending money, signing contracts, posting in your voice. The design question is which steps to gate, not whether to gate any.
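In code, a human-in-the-loop gate is a check that runs before the action does. A Python sketch, assuming a hypothetical set of gated action names and an `ask_human` callback that returns True for approval; in practice that callback might post to a chat channel or an approval queue and wait.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical action names that always need a human's approval.
GATED_ACTIONS = {"send_payment", "sign_contract", "publish_post"}


@dataclass
class Action:
    name: str
    payload: Dict[str, Any] = field(default_factory=dict)


def execute(action: Action,
            run: Callable[[Action], str],
            ask_human: Callable[[Action], bool]) -> str:
    """Run the action, pausing for approval only on gated steps."""
    if action.name in GATED_ACTIONS and not ask_human(action):
        return f"skipped {action.name}: human declined"
    return run(action)


def run(action: Action) -> str:
    return f"ran {action.name}"


def deny(action: Action) -> bool:
    return False


print(execute(Action("summarize_inbox"), run, ask_human=deny))
print(execute(Action("send_payment", {"amount_cents": 12_000}), run, ask_human=deny))
```

Keeping the gated set explicit makes the design question from the definition (which steps to gate) a reviewable line of code rather than a prompt instruction.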

I

Inference cost

What it costs to run a model once, usually billed per million input and output tokens. For agents, inference cost adds up fast because every step is a call, and each call typically re-sends the growing context. A ten-step agent on a frontier model can cost many times what a single call would, far more than the user thinks they're paying for. Cost-aware orchestration is now a core skill.
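The compounding is easy to see with arithmetic. This Python sketch assumes each step re-sends the whole context so far plus a fixed chunk of new tool output; the token counts and the $3 / $15 per-million prices are made-up placeholders, not any provider's real rates.

```python
def agent_run_cost(steps: int,
                   base_prompt_tokens: int,
                   tokens_added_per_step: int,
                   output_tokens_per_step: int,
                   input_price_per_m: float,
                   output_price_per_m: float) -> float:
    """Dollar cost of a run where the context grows every step."""
    total = 0.0
    context = base_prompt_tokens
    for _ in range(steps):
        total += context * input_price_per_m / 1_000_000
        total += output_tokens_per_step * output_price_per_m / 1_000_000
        # Tool results and the model's own output are appended for the next step.
        context += tokens_added_per_step
    return total


params = dict(base_prompt_tokens=2_000, tokens_added_per_step=1_500,
              output_tokens_per_step=500,
              input_price_per_m=3.0, output_price_per_m=15.0)
single = agent_run_cost(steps=1, **params)
ten = agent_run_cost(steps=10, **params)
print(f"1 step: ${single:.4f}  10 steps: ${ten:.4f}  ratio: {ten / single:.1f}x")
```

With these placeholder numbers the ten-step run costs 25 times the single call, not 10 times, because every later step pays again for all the context that came before it.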

Internal linking

Linking from one page on your site to another. In SEO, internal linking signals topic relationships to search engines and distributes ranking weight. A glossary like this one earns its keep by linking out to deep-dive articles, and getting linked back from them, building a cluster around AI agent terms.

L

LLM

Large language model: a neural network trained on enormous text corpora to predict the next token. LLMs are the engine inside almost every agent in 2026, but an LLM alone is not an agent. The agent is what wraps the model with tools, memory, a loop, and a goal.

LangChain

An open-source framework for composing LLM applications and agents. LangChain provides abstractions for chains, tools, memory, and agents. It's widely used as a prototyping layer; many production teams strip back to lighter primitives once they know what they need. Pick it for speed, replace it for reliability.

M

Memory

What an agent remembers across runs. Short-term memory is the current conversation. Long-term memory is what the agent has learned about you: your preferences, your contacts, your writing voice. Useful agents need both. Memory without forgetting becomes noise; forgetting without memory means starting over every time.
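One common way to split the two kinds of memory: a bounded buffer of recent turns, where old turns fall off automatically, plus an explicit store of durable facts that can also be deleted. A minimal Python sketch; `AgentMemory` and its methods are hypothetical, and real systems usually back the long-term store with a database or vector index.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class AgentMemory:
    def __init__(self, max_turns: int = 20):
        # Short-term: the last N conversation turns; older turns are dropped.
        self.short_term: Deque[Tuple[str, str]] = deque(maxlen=max_turns)
        # Long-term: durable facts about the user, keyed by topic.
        self.long_term: Dict[str, str] = {}

    def add_turn(self, role: str, text: str) -> None:
        self.short_term.append((role, text))

    def remember(self, key: str, fact: str) -> None:
        self.long_term[key] = fact  # a newer fact overwrites the stale one

    def forget(self, key: str) -> None:
        self.long_term.pop(key, None)

    def build_context(self) -> str:
        facts = "\n".join(f"- {k}: {v}" for k, v in sorted(self.long_term.items()))
        turns = "\n".join(f"{role}: {text}" for role, text in self.short_term)
        return f"Known about the user:\n{facts}\n\nRecent conversation:\n{turns}"


memory = AgentMemory(max_turns=2)
memory.remember("sign-off", "Cheers, Sam")
memory.add_turn("user", "Draft a note to Priya.")
memory.add_turn("agent", "Done. Anything else?")
memory.add_turn("user", "Make it shorter.")  # pushes out the oldest turn
print(memory.build_context())
```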

Multi-agent system

Two or more agents that coordinate to finish a task. Common patterns: planner plus workers, debater pairs, supervisor plus specialists. Multi-agent systems win when the subtasks are genuinely different. They lose when the coordination overhead exceeds the specialization benefit, which is more often than the demos suggest.

Marketplace ranking

How a marketplace orders the agents shown to a user. On Gravity, ranking is quality-only: pass rate on the 80+ tests per capability, real-world run success, refund rate, and freshness. No paid placement, no boosted listings, ever. A builder's only path to the top is shipping a better agent.

MCP

Model Context Protocol: an open standard for connecting AI assistants to external tools and data sources. MCP standardizes the function-calling interface so the same tool can plug into many agents without custom glue. In 2026 it's becoming the default way to expose a service to AI clients.
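MCP messages are JSON-RPC 2.0. As we understand the specification, a client discovers a server's tools with `tools/list` and invokes one with `tools/call`, passing the tool's `name` and `arguments`. This Python sketch only builds those request messages; the tool name `search_docs` and its arguments are hypothetical, and the transport (stdio or HTTP) and the initialization handshake are left out. Check the current spec before relying on exact field names.

```python
import itertools
import json
from typing import Any, Dict, Optional

_request_ids = itertools.count(1)  # JSON-RPC requests need a unique id


def mcp_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request of the kind an MCP client sends."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids),
                               "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


list_tools = mcp_request("tools/list")
call_tool = mcp_request("tools/call", {
    "name": "search_docs",                   # hypothetical tool
    "arguments": {"query": "refund policy"},
})
print(list_tools)
print(call_tool)
```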

N

Natural language interface

An interface where you type or speak in normal English instead of clicking through menus. The interface for AI agents is converging on natural language because that's the format goals naturally take. Good natural language interfaces accept ambiguity and ask one clarifying question, not five.

O

Orchestration

The logic that runs the whole show: which step happens next, what to retry, when to ask the human, when to give up. Orchestration is what separates a demo from a product. Most agent failures in 2026 are orchestration failures, not model failures. The model was fine; the loop was wrong.

Observability

The instrumentation that lets you see what an agent did and why. Production agents need step-level logs, tool-call traces, prompt and response capture, token counts, and outcome tags. Without observability, debugging an agent is guessing. With it, you can answer why did this run fail without rerunning it.
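Step-level traces can start as one structured record per tool call. A minimal Python sketch: `Tracer` and `StepTrace` are hypothetical names, and the token counts are passed in by the caller on the assumption that they come from your model API's usage data. Each record is written whether the call succeeds or fails, and failures are re-raised rather than swallowed.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, List


@dataclass
class StepTrace:
    run_id: str
    step: int
    tool: str
    arguments: Dict[str, Any]
    output_preview: str
    input_tokens: int
    output_tokens: int
    duration_ms: float
    outcome: str  # "ok" or "error"


class Tracer:
    def __init__(self) -> None:
        self.run_id = uuid.uuid4().hex
        self.steps: List[StepTrace] = []

    def record(self, tool: str, arguments: Dict[str, Any], fn: Callable[..., Any],
               input_tokens: int = 0, output_tokens: int = 0) -> Any:
        """Run one tool call and log a trace row whether it succeeds or fails."""
        start = time.perf_counter()
        outcome, preview = "ok", ""
        try:
            result = fn(**arguments)
            preview = str(result)[:200]
            return result
        except Exception as exc:
            outcome, preview = "error", f"{type(exc).__name__}: {exc}"
            raise  # record the failure, don't hide it
        finally:
            self.steps.append(StepTrace(
                run_id=self.run_id, step=len(self.steps) + 1, tool=tool,
                arguments=arguments, output_preview=preview,
                input_tokens=input_tokens, output_tokens=output_tokens,
                duration_ms=(time.perf_counter() - start) * 1000,
                outcome=outcome))

    def to_jsonl(self) -> str:
        """One JSON object per line, ready to ship to a log store."""
        return "\n".join(json.dumps(asdict(s)) for s in self.steps)


tracer = Tracer()
tracer.record("add", {"a": 2, "b": 3}, lambda a, b: a + b)
try:
    tracer.record("divide", {"a": 1, "b": 0}, lambda a, b: a / b)
except ZeroDivisionError:
    pass
print(tracer.to_jsonl())
```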

P

Pillar page

A long, comprehensive page that anchors a topic cluster on your site. The pillar covers the topic broadly; the spokes go deep on subtopics and link back. This glossary functions as a definitional pillar for AI agent terminology. The structure helps search engines understand what your site is the authority on.

Prompt engineering

The craft of writing instructions that get reliable behavior out of a model. Less glamorous than the title suggests and mostly empirical: write, run, look at outputs, revise. In agent systems, prompts live everywhere (the system prompt, the tool descriptions, the evaluator rubric), and tiny wording changes can move outcomes a lot.

Pricing

How an AI agent costs money. Two dominant models in 2026: per-run (pay only when the agent works) and subscription (pay monthly regardless). Per-run aligns incentives because the builder only earns when they deliver. Subscription smooths revenue but punishes light users. Gravity is per-run, no subscription.

Q

Quality score

A composite metric that summarizes how good an agent actually is in production. On Gravity, the quality score blends test pass rate, run success, latency, refund rate, and user re-runs. It's the only thing that decides ranking. A high quality score is the marketplace's version of a track record.
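A minimal Python sketch of one way such a blend could work, not Gravity's actual formula: normalize each signal to the 0 to 1 range, then take a weighted sum. The weights, the 30-second latency target, and the reading of re-runs as repeat usage (a good sign) are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical weights; they sum to 1 so the score stays between 0 and 1.
WEIGHTS = {"tests": 0.30, "success": 0.30, "latency": 0.10,
           "refunds": 0.20, "reruns": 0.10}
TARGET_LATENCY_S = 30.0  # hypothetical: at or under this earns full latency credit


@dataclass
class AgentStats:
    test_pass_rate: float    # 0..1
    run_success_rate: float  # 0..1
    median_latency_s: float  # seconds, > 0
    refund_rate: float       # 0..1, lower is better
    rerun_rate: float        # 0..1, assumed: share of users who come back


def quality_score(s: AgentStats) -> float:
    for rate in (s.test_pass_rate, s.run_success_rate, s.refund_rate, s.rerun_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError(f"rate out of range: {rate}")
    if s.median_latency_s <= 0:
        raise ValueError("latency must be positive")
    components = {
        "tests": s.test_pass_rate,
        "success": s.run_success_rate,
        # Full credit up to the target, then shrinks in proportion to slowness.
        "latency": min(1.0, TARGET_LATENCY_S / s.median_latency_s),
        "refunds": 1.0 - s.refund_rate,
        "reruns": s.rerun_rate,
    }
    return sum(WEIGHTS[name] * value for name, value in components.items())


solid = AgentStats(0.97, 0.93, 12.0, 0.02, 0.40)
flashy = AgentStats(0.70, 0.65, 8.0, 0.15, 0.55)
print(f"solid: {quality_score(solid):.3f}  flashy: {quality_score(flashy):.3f}")
```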

R

RAG

Retrieval-augmented generation: fetch relevant documents, paste them into the prompt, then let the model answer. RAG is how you give a model knowledge it wasn't trained on, like your company wiki or yesterday's news. The bottleneck is almost always retrieval quality, not the model. Garbage in, confident garbage out.
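The "paste them into the prompt" step, sketched in Python. It assumes retrieval has already returned scored chunks (from a vector search, say), keeps the top few within a rough character budget, and tags each with its source ID so the answer can be grounded. `Chunk` and `build_rag_prompt` are hypothetical names, and the instruction wording is only an example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    source_id: str
    text: str
    score: float  # retrieval relevance, higher is better


def build_rag_prompt(question: str, chunks: List[Chunk],
                     k: int = 3, max_chars: int = 2000) -> str:
    """Assemble a prompt from the best-scoring chunks, each labeled with its source."""
    top = sorted(chunks, key=lambda c: c.score, reverse=True)[:k]
    parts: List[str] = []
    used = 0
    for c in top:
        block = f"[{c.source_id}] {c.text}"
        if used + len(block) > max_chars:  # characters as a crude stand-in for tokens
            break
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer using only the sources below and cite source IDs in brackets. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\nQuestion: {question}"
    )


chunks = [
    Chunk("wiki/refunds", "Refunds are issued within 14 days.", 0.91),
    Chunk("wiki/shipping", "Orders ship in 2 business days.", 0.42),
    Chunk("news/2026-03", "The office moved to Lisbon.", 0.10),
]
print(build_rag_prompt("How long do refunds take?", chunks, k=2))
```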

Reasoning model

A model trained or tuned to think longer before answering, using extended internal chain-of-thought. Reasoning models trade latency and cost for accuracy on hard problems: math, code, planning. For agents, the right pattern is often a reasoning model for the planner and a fast model for the workers.

Recommendation engine

The system that suggests which agents a user should try based on intent, history, and quality signals. A good recommendation engine on an agent marketplace replaces the search box: you don't browse, you type a goal and the right agent surfaces. Most users never want to scroll a catalog. They want one good answer.

Refusal policy

The set of things an agent will not do, even if asked. Refusal policy covers safety (no malicious tasks), scope (no actions outside the agent's purpose), and authority (no spending above a cap, no destructive writes without confirmation). Good refusal policies are specific and tested; vague ones get jailbroken.

S

Stopping condition

The rule that tells an agent it's done. Stopping conditions look obvious until you write them: goal reached, step limit hit, budget exhausted, repeated failures, human override. Most agent loops that go off the rails have a missing or wrong stopping condition. Always set a hard ceiling.
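The conditions listed above map directly onto an agent loop. A minimal Python sketch, assuming a hypothetical `step` callback that performs one plan-act-observe cycle and reports whether it finished, what it cost, and whether it failed; the default limits are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class StepResult:
    done: bool                   # did this step reach the goal?
    cost: float                  # dollars (or credits) spent on this step
    error: Optional[str] = None  # set when the step failed


def run_agent(step: Callable[[int], StepResult],
              max_steps: int = 20,
              budget: float = 1.0,
              max_consecutive_failures: int = 3,
              human_stop: Callable[[], bool] = lambda: False) -> str:
    """Run steps until a stopping condition fires; return which one it was."""
    spent = 0.0
    failures = 0
    for i in range(max_steps):  # hard ceiling: the loop can never run forever
        if human_stop():
            return "human override"
        result = step(i)
        spent += result.cost
        if result.error is not None:
            failures += 1
            if failures >= max_consecutive_failures:
                return "repeated failures"
        else:
            failures = 0
        if result.done:
            return "goal reached"
        if spent >= budget:
            return "budget exhausted"
    return "step limit hit"


print(run_agent(lambda i: StepResult(done=(i == 2), cost=0.05)))  # goal reached
```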

System prompt

The standing instructions an agent receives at the start of every run, before any user input. The system prompt defines role, tone, scope, constraints, and the tools the agent can use. It's the most leveraged few hundred words in the whole stack. Change a system prompt and you change the agent's personality.

T

Tool use

An agent calling external functions, APIs, browsers, databases, or shells to get things done. Tool use is what makes an agent useful in the world; without it, the model can only talk. The hard problem isn't calling tools; it's picking the right one, with the right arguments, at the right time, and handling the response.

Token

The unit a language model reads and writes. A token is roughly three-quarters of an English word; with common tokenizers the word "agent" is a single token, while "internationalization" splits into several. Tokens are how context windows, billing, and rate limits are measured. When someone says it costs "five dollars per million tokens," that's the unit.

Topic cluster

A group of pages on your site that cover one topic from multiple angles, linked together. The pillar page covers the topic broadly; the cluster pages go deep on subtopics. Topic clusters are how modern SEO is structured because they show search engines you have real depth, not just one hot page.

Trust model

The explicit rules about what an agent is allowed to do in your accounts, with whose approval, and with what visibility. Good trust models default to least privilege, log every action, and require explicit consent for irreversible operations. On a marketplace, the trust model is part of the product, not an afterthought.

V

Vector database

A database optimized for storing embeddings and finding the ones nearest to a query. Vector databases power semantic search and most RAG pipelines. Common options in 2026 include pgvector (the vector extension for Postgres), Pinecone, Weaviate, Qdrant, and vector extensions for SQLite. The default answer is now pgvector unless you have a reason to pick something else.
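At its core, a vector database answers one question: which stored vectors are nearest to this one? The brute-force Python sketch below scans every vector, which is fine for a few thousand items; production vector databases, pgvector included, add approximate indexes such as HNSW to stay fast at millions. `InMemoryVectorIndex` is a hypothetical name and the vectors are toy data.

```python
import heapq
import math
from typing import List, Optional, Sequence, Tuple


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


class InMemoryVectorIndex:
    """Exact nearest-neighbour search by cosine similarity (brute force)."""

    def __init__(self) -> None:
        self.dim: Optional[int] = None
        self._items: List[Tuple[str, Sequence[float], float]] = []

    def add(self, item_id: str, vector: Sequence[float]) -> None:
        if self.dim is None:
            self.dim = len(vector)
        if len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        norm = _norm(vector)
        if norm == 0:
            raise ValueError("cannot index a zero vector")
        self._items.append((item_id, vector, norm))  # cache the norm once

    def query(self, vector: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        if self.dim is not None and len(vector) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vector)}")
        q_norm = _norm(vector)
        if q_norm == 0:
            raise ValueError("query vector is zero")
        scored = (
            (sum(a * b for a, b in zip(vector, v)) / (q_norm * n), item_id)
            for item_id, v, n in self._items
        )
        # nlargest keeps only the k best matches instead of sorting everything.
        return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]


index = InMemoryVectorIndex()
index.add("doc-cats", [0.9, 0.1, 0.0])
index.add("doc-dogs", [0.7, 0.6, 0.1])
index.add("doc-invoices", [0.0, 0.2, 0.95])
print(index.query([0.88, 0.12, 0.02], k=2))
```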

W

Workflow automation

Tools where you build the automation step by step: trigger, action, branch, retry. Zapier, Make, and n8n are the classics. The difference from an agent is that workflow tools require you to design the logic; an agent figures the logic out from a goal. Workflows are explicit. Agents are inferential.

Agent vs chatbot

The distinction that confuses everyone. A chatbot replies to one message at a time inside a conversation. An agent takes a goal and acts across multiple steps, tools, and sessions to accomplish it. Every agent contains a chatbot. Not every chatbot is an agent. If it can't use tools, it's a chatbot.