AI Agent Reflection and Self-Correction Explained

Agent reflection is the process by which an AI agent evaluates its own output, identifies problems, and revises the result before returning it. It is how agents catch their own mistakes without requiring a human in the loop for every task.

This post explains how reflection works at a technical level: the structure of the reflect-critique-revise loop, where it fits in relation to planning and tool use, the latency and cost trade-offs, and how builders incorporate it into production agents. If you are evaluating agents or building them, this is one of the core reliability mechanisms to understand.

What Is Agent Reflection?

An AI agent without reflection produces an output and returns it. That output may be correct; it may also contain errors, omissions, format violations, or factual mistakes. A single forward pass through the model does not guarantee correctness, and the agent has no way to know whether the output meets the task requirements unless it checks.

Reflection adds a checking step. After generating an initial output, the agent evaluates that output against the task requirements: did it answer the right question, follow the required format, avoid known error types, cite sources where required, and complete all required steps? If the check finds problems, the agent produces a revised output. If the output passes the check, it is returned.

The term "self-correction" is often used interchangeably. Both describe the same mechanism: the agent assessing its own work and revising it before delivery. For a broader grounding in how agents work, the what is an AI agent guide covers the foundational concepts.

Reflection is not just reprompting

Simply asking the model to "check your work" at the end of a prompt is a loose form of reflection, but structured agent reflection is more precise. The critique step applies defined evaluation criteria, not a vague instruction to recheck. The criteria might be a rubric, a format specification, a list of required fields, a set of factual constraints, or a policy. The output of the critique is a structured assessment, not another freeform answer. That structure is what makes the revision meaningful rather than just a second attempt.

The Reflect-Critique-Revise Loop

The loop has three stages, and they can run once or iterate.

Stage 1: Generate

The agent produces an initial candidate output for the task. This is the standard forward pass: the agent reasons about the input and produces a response. No reflection has happened yet. The output might be a research summary, a code function, a drafted email, a classification decision, or a structured data extraction.

Stage 2: Critique

The agent (or a separate critic component, sometimes a different model or a different prompt context) evaluates the candidate output against defined criteria. The critique can check for:

Correctness: does the answer match the facts in the source material?
Completeness: did the agent address all required fields or questions?
Format compliance: does the output follow the required structure, length, or schema?
Instruction adherence: did the agent follow all constraints in the task specification?
Internal consistency: are there contradictions between different parts of the output?
Known failure modes: does the output exhibit patterns the builder has flagged as problematic for this task type?

The critique produces a structured result: either "output passes" or "output fails, here is why." A useful critique is specific about what failed and why, because that specificity is what the revision step uses to produce a better output.

Stage 3: Revise

If the critique found failures, the agent revises the output using the critique as explicit context. The revision is not a fresh attempt from scratch; it incorporates the specific problems the critique identified. A well-designed revision step produces a better output because it is addressing a concrete list of identified problems, not just trying again.

After revision, the loop can terminate or repeat. Most implementations set a maximum iteration count (typically two to four passes) to bound the cost and latency. If the output still fails after the maximum iterations, the agent can return the best available output with a flag indicating it did not fully pass the self-check, or it can escalate to a human reviewer.

Concrete Examples

Abstract descriptions of reflection loops are easier to evaluate with concrete examples. Here are three task types where reflection produces a measurable improvement.

Code generation with test verification

An agent tasked with writing a Python function generates the function, then runs its own unit tests against it. If any test fails, the agent reads the failure message, identifies the bug, and revises the code. This is reflection with tool use: the execution environment provides the critique signal rather than a language model critique prompt. The agent does not need to reason about whether the code is correct in the abstract; it runs the code and reads the result. The revision step targets the specific failure the test returned.

Research summary with source grounding

An agent asked to summarize research on a topic generates a draft summary, then checks each factual claim against the source documents it retrieved. For each claim it cannot ground in a source, the critique step flags it as unverified. The revision step removes or qualifies the unverified claims. The output the user receives contains only claims the agent could verify in its retrieved sources, without any human review step.

Structured data extraction with schema validation

An agent extracting fields from an unstructured document generates a JSON object with the extracted values, then validates that object against the required schema. If required fields are missing or field values are in the wrong format, the critique identifies the specific violations. The revision step re-reads the document focusing on the missing fields and produces a corrected extraction. The output always conforms to the schema.

Why Reflection Improves Reliability

Single-pass language model outputs have a characteristic failure pattern: the model is confident whether or not the output is correct. It does not report uncertainty proportional to actual error rate. Reflection breaks this pattern by making the agent's confidence conditional on a separate checking step.

The key reliability gains come from two mechanisms. First, errors that are invisible in the generation context become visible in the evaluation context. A critique prompt that asks "does this output contain all required fields" will catch a missing field that the generation step overlooked because the field was not salient during generation. Second, the revision step has the critique's specific findings as context, which means it is solving a narrower, better-specified problem than the original generation. Narrower problems produce fewer errors.

This connects to the broader challenge of why agents need to be designed for reliability rather than just capability. A chatbot that produces a plausible but incorrect answer is acceptable when a human is reading and correcting in real time. An autonomous agent acting on its output without human review needs to be right more often, and reflection is one of the primary techniques for achieving that.

Failure mode reduction

Reflection is particularly effective against predictable failure modes: the specific error patterns a given agent type tends to produce for a given task type. A builder who knows that a research agent tends to include unverified statistics can write a critique step that specifically checks for unverified statistics. That targeted check catches the failure mode much more reliably than a generic "check your work" instruction. Over time, builders accumulate a library of failure modes for their agent types and encode them as explicit critique criteria.

Trade-offs: Latency and Cost

Reflection is not free. Each critique pass and each revision requires additional model inference. For a single reflection iteration, the total token cost roughly doubles. For three iterations, it can triple or quadruple. Latency increases proportionally. These costs are real and need to be weighed against the reliability improvement for each task type.

When the trade-off is worth it

The trade-off favors reflection when the cost of an incorrect output is high: a code function that will be deployed, a legal document that will be sent to a client, a factual claim that will be published, a data extraction that will feed a database. In these cases, additional inference cost is small relative to the cost of catching an error before it reaches production.

The trade-off is less favorable for latency-sensitive tasks where a user is waiting for a real-time response, or for high-volume tasks where each run's cost matters at scale. In these cases, builders apply reflection selectively: only on high-stakes output steps rather than every step, or with a single reflection pass rather than multiple iterations.

Techniques for managing reflection cost

Several design patterns reduce the cost of reflection without eliminating the reliability benefit. Using a smaller, faster model for the critique step and reserving the larger model for revision is one approach. Applying reflection only to steps whose output feeds into downstream agent actions is another; intermediate steps that do not directly affect the final output may not need reflection. Caching critique criteria so they do not need to be computed fresh for every run also reduces overhead at scale.

Reflection Versus Planning

Reflection and planning are often confused because both involve the agent reasoning about its own process. The distinction is about timing.

Planning happens before execution. The agent reads the task, reasons about what steps are needed, and produces a plan before taking any action. Good planning reduces the chance of taking wrong steps in the first place. The agent planning versus execution guide covers this in detail.

Reflection happens after a step produces output. It evaluates what was produced and decides whether it is good enough. Reflection catches errors that planning could not prevent: cases where the plan was correct but the execution of a specific step produced a flawed output.

Both are components of a reliable agent, and they operate at different points in the loop. A well-designed agent plans before acting and reflects after each substantive output step. The combination produces substantially better reliability than either mechanism alone.

Reflection and Tool Use

Reflection interacts with tool use in important ways. When an agent has access to tools (search, code execution, databases, APIs), those tools can serve as the critique signal rather than requiring a language model critique step.

An agent that can run code does not need to reason about whether its code is correct; it runs it and checks the output. An agent that can query a database can verify whether the IDs it extracted from a document actually exist in the database. An agent with access to a search tool can check whether a claimed fact appears in retrieved sources. In all these cases, the tool provides an objective verification signal that is more reliable than a language model assessing its own output.

This connection between reflection and tool use is one reason why agent tool use matters so much for reliability, not just capability. Tools do not just extend what the agent can do; they provide external ground truth that makes reflection more objective. For context on how these components fit together in a complete agent architecture, the agent memory guide covers how agents retain context across reflection passes and multi-step workflows.

How Gravity Agents Use Self-Checking

On Gravity, reflection is implemented by the expert builders who design each agent, not by users. When you describe a task and an agent runs it, the self-checking behavior is already embedded in the agent's design. You receive verified output rather than first-draft output.

A research agent on Gravity checks its claims against retrieved sources before returning the summary. A document extraction agent validates its output against the required schema. A drafting agent checks its output against specified constraints (tone, length, required elements) before delivering the draft. The user does not configure any of this; the builder designed it and Gravity tested it before the agent went live.

This is one of the practical reasons expert-built agents consistently outperform prompt-and-go approaches: the builder has encoded the specific failure modes for that task type into the critique step, and iterated on the reflection design based on testing against real task instances. Users benefit from that accumulated reliability work without needing to understand the implementation. The multi-step agent workflow guide shows how these individual reflection steps compose into larger task pipelines.

Building Effective Reflection Into Agents

For builders or technical buyers designing their own agents, several principles make reflection more effective in practice.

Define critique criteria explicitly

Vague critique prompts ("check if this is good") produce vague critiques that do not drive reliable revisions. Specific criteria ("check that all five required fields are present, that every number in the summary appears verbatim in the source documents, and that the output is under 300 words") produce specific critiques that drive targeted revisions. Time spent defining precise criteria is recovered in reliability.

Separate the critique context from the generation context

The critique is most effective when it runs in a fresh context that does not carry the assumptions and framing from the generation step. The generation step may have taken shortcuts or made assumptions that the critique should catch; if the critique runs in the same context, it is likely to inherit those same assumptions. A separate prompt or a separate model call for the critique provides more independent evaluation.

Use tools for objective verification wherever possible

Language model self-evaluation has limits. A model can fail to notice its own errors, especially errors of omission. Where tools can provide objective verification (code execution, schema validation, database lookup, source retrieval and comparison), use them as the critique signal. Reserve language model critique for aspects of the output that cannot be checked by a tool: tone, appropriateness, logical coherence, and qualitative fit to requirements.

Set iteration bounds and handle failure gracefully

Unbounded reflection loops can become expensive and slow. Set a maximum iteration count appropriate for the task's cost-sensitivity and latency requirements. Define what the agent should do when the maximum is reached without a passing output: return the best available output with a flag, escalate to human review, or fail with a clear error. Silent delivery of an output that failed its own self-check is worse than explicit escalation.

For a broader view of how reflection fits into agent architecture, the Gravity glossary covers agent design terms including reflection, planning, memory, and tool use with consistent definitions.

Frequently Asked Questions

What is reflection in an AI agent?

Reflection is the process by which an AI agent evaluates its own output against a set of criteria before returning that output to the user or passing it to the next step. The agent generates a candidate answer, then runs a separate critique step that checks for errors, gaps, or failures to follow instructions, and produces a revised version if the critique finds problems.

How does the reflect-critique-revise loop work?

The loop has three stages. First, the agent generates an initial output. Second, the agent (or a separate critic component) evaluates that output against defined criteria: correctness, completeness, format compliance, factual accuracy, or task requirements. Third, if the critique identifies failures, the agent produces a revised output. The loop can run once or repeat until the output meets the defined threshold or a maximum iteration count is reached.

What is the difference between agent reflection and agent planning?

Planning happens before execution: the agent reasons about what steps to take. Reflection happens after a step produces output: the agent evaluates the output and decides whether to accept it or revise it. Both improve reliability but at different points in the agent loop. Planning reduces the chance of taking wrong steps; reflection catches errors in the output of steps that were taken.

Does reflection add cost or latency to an agent?

Yes. Each reflection pass requires additional model inference, which adds both time and cost. For tasks where output quality matters more than response speed (legal drafts, code generation, factual research), the tradeoff is usually worth it. For latency-sensitive tasks, designers limit the number of reflection passes or apply reflection only to high-stakes output steps rather than every step.

How does Gravity use reflection in its agents?

Gravity's expert-built agents include self-checking steps calibrated to each task type. A research agent verifies claims before assembling the final report. A document agent checks format compliance before returning the output. Users do not configure this; the builder designs the reflection logic and Gravity tests it before the agent goes live. The result is that users receive verified output rather than first-draft output.