An agent that fails small is a fixable problem. An agent that fails catastrophically is a customer-trust event, a financial loss, or both. The difference is blast-radius control: the set of limits that bound what the agent can do when something goes wrong. The four limits below are the minimum every agent should have. Each handles a different failure mode that the previous limit cannot catch.
The framework comes from operational systems engineering, not from AI specifically. The same reasoning that makes database access controls and rate limits standard for production systems applies to agents, with the addition that agents make decisions the human did not anticipate. NIST's AI Risk Management Framework treats blast-radius limits as a baseline control (NIST, "AI RMF 1.0").
What blast radius means
Blast radius is the size of the worst-case impact when something goes wrong. The model is borrowed from explosives engineering: how far the damage extends from the origin point. For agents, blast radius is measured along three axes: privacy (what can the agent expose?), state (what can the agent change?), and money (what can the agent spend or move?).
A read-only agent has small blast radius (mostly privacy). A send-email agent has larger blast radius (reputation and customer trust). A database-write agent has larger still (data integrity). A money-moving agent has the largest (financial loss). Limits should match the blast radius. The cost framework in economics of bootstrapped AI agents applies here: the cost of the limits is small; the cost of skipping them is whatever the worst case turns out to be.
Limit 1: Action allowlist
The action allowlist names every action the agent is permitted to take. Actions not on the list are refused at the integration layer, before the agent's reasoning even reaches the tool. The allowlist is small, explicit, and human-readable.
For an inbox triage agent, the allowlist might be: read messages, add labels from a fixed set of five labels, archive messages, save drafts. That is the entire list. The agent cannot send, cannot delete, cannot forward, cannot modify settings. Each addition to the list is a deliberate decision.
Why an allowlist instead of a blocklist? Allowlists fail safe. New actions added to the platform are blocked by default. New tool versions with new capabilities are blocked by default. Blocklists fail unsafe: anything not explicitly blocked is permitted, which means platform updates can silently expand the agent's capabilities. The tool-use model covers this in more depth.
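A minimal sketch of what allowlist enforcement can look like at the integration layer. The action names, the fixed label set, and the ActionDenied exception are illustrative, not part of any particular framework; the point is that the check runs before any tool call is dispatched.

```python
# Hypothetical integration-layer allowlist for the inbox triage example.
# Anything not explicitly listed is refused, including actions added by
# future platform or tool updates.
ALLOWED_ACTIONS = frozenset({
    "read_message",
    "add_label",        # labels themselves come from a fixed set of five
    "archive_message",
    "save_draft",
})

ALLOWED_LABELS = frozenset({"invoice", "newsletter", "personal", "urgent", "receipt"})


class ActionDenied(Exception):
    """Raised when the agent requests an action that is not on the allowlist."""


def _execute(action: str, **params):
    # Placeholder for the real tool integration.
    print(f"executing {action} with {params}")


def dispatch(action: str, **params):
    if action not in ALLOWED_ACTIONS:
        # Fail safe: refuse before the agent's reasoning ever reaches the tool.
        raise ActionDenied(f"action {action!r} is not on the allowlist")
    if action == "add_label" and params.get("label") not in ALLOWED_LABELS:
        raise ActionDenied(f"label {params.get('label')!r} is not in the fixed label set")
    return _execute(action, **params)
```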
Limit 2: Rate limit per action class
Rate limits cap actions per minute and per day. The most useful limit category is "actions of the same class." An agent that runs label-add 10,000 times in a minute is in a loop. The rate limit catches this before the loop completes.
For most personal agents, a rate of one action per minute and ten actions per hour is generous and catches loops. For business-grade agents, the rate matches the actual work volume: a triage agent processing inbox volume of 100 messages per day can have a rate of 200 per day (2x headroom) without any legitimate run hitting the limit.
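As a sketch, here is one way a per-action-class rate limit could be implemented with a sliding window, using the personal-agent defaults above (one action per minute, ten per hour). The function and limit names are assumptions for illustration; tune the numbers to the agent's real work volume.

```python
import time
from collections import defaultdict, deque

# Per-action-class sliding-window rate limiter. Illustrative defaults:
# 1 action per minute, 10 per hour, per action class.
LIMITS = {"per_minute": 1, "per_hour": 10}
WINDOWS = {"per_minute": 60, "per_hour": 3600}   # seconds

_history: dict[str, deque] = defaultdict(deque)


class RateLimitExceeded(Exception):
    pass


def check_rate(action_class: str, now: float | None = None) -> None:
    now = time.monotonic() if now is None else now
    events = _history[action_class]
    # Drop events older than the largest window so the deque stays small.
    while events and now - events[0] > WINDOWS["per_hour"]:
        events.popleft()
    for name, limit in LIMITS.items():
        window = WINDOWS[name]
        recent = sum(1 for t in events if now - t <= window)
        if recent >= limit:
            # A loop running the same action class thousands of times
            # stops here, long before it completes.
            raise RateLimitExceeded(f"{action_class}: {name} limit of {limit} reached")
    events.append(now)
```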
Limit 3: Maximum cost per task
Token costs are the operational expense of agents. An agent that loops or that recursively expands a problem can run up unexpected costs. The cost cap stops this at a known ceiling.
For a recurring agent, set the cap at 3-5x the expected cost of a normal run. For a one-off task, set the cap at the maximum amount you would pay for the task to be done by a human. The reasoning is straightforward: if the agent is going to cost more than a human, you would rather it stop and ask. The cost framework covered in AI agent cost models details how to estimate the normal-run cost.
Cost caps interact with rate limits. A high rate limit with a low cost cap means the agent stops on cost, not rate. A low rate limit with a high cost cap means the agent stops on rate. Set both, and the agent stops on whichever bites first.
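A sketch of a per-task cost cap, assuming the integration layer can see token counts for each model call. The per-token prices below are placeholders, not real rates; substitute the pricing for whatever model the agent actually runs on.

```python
# Per-task cost cap. Prices per million tokens are placeholders only.
PRICE_PER_MTOK = {"input": 3.00, "output": 15.00}   # USD, illustrative


class CostCapExceeded(Exception):
    pass


class TaskBudget:
    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd     # e.g. 3-5x the expected cost of a normal run
        self.spent_usd = 0.0

    def record(self, input_tokens: int, output_tokens: int) -> None:
        """Call after each model invocation within the task."""
        self.spent_usd += (
            input_tokens / 1_000_000 * PRICE_PER_MTOK["input"]
            + output_tokens / 1_000_000 * PRICE_PER_MTOK["output"]
        )
        if self.spent_usd > self.cap_usd:
            # Stop at a known ceiling instead of letting a loop run up costs.
            raise CostCapExceeded(
                f"task spent ${self.spent_usd:.2f}, cap is ${self.cap_usd:.2f}"
            )
```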
Limit 4: Reversibility check before destructive actions
Some actions are reversible: rename a file, change a label, move an item. Some are compensable: cancel an order, void a payment, retract a message (with limited window). Some are irreversible: send an email externally, delete a record from a system that has no undo, transfer money to an external account.
The reversibility check classifies each action before it executes. Reversible actions execute. Compensable actions execute with a recorded compensating action ready. Irreversible actions require explicit confirmation, even when the agent is running unattended for everything else. The reversibility framework is detailed in how to roll back an AI agent's action (forthcoming).
The mental model: every action is a bet. Small bets you can win and lose freely; bets you cannot afford to lose require a check before they are placed.
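A sketch of a reversibility check, assuming each allowlisted action has been classified ahead of time. The classification map, the confirm callback, and the compensating-action recorder are illustrative; unknown actions are treated as irreversible on the principle of failing safe.

```python
from enum import Enum
from typing import Callable


class Reversibility(Enum):
    REVERSIBLE = "reversible"      # e.g. rename a file, change a label, move an item
    COMPENSABLE = "compensable"    # e.g. cancel an order, void a payment
    IRREVERSIBLE = "irreversible"  # e.g. external send, hard delete, money transfer


# Illustrative classification; in practice this map lives alongside the allowlist.
CLASSIFICATION = {
    "add_label": Reversibility.REVERSIBLE,
    "archive_message": Reversibility.REVERSIBLE,
    "cancel_order": Reversibility.COMPENSABLE,
    "send_email": Reversibility.IRREVERSIBLE,
}


def record_compensating_action(action: str) -> None:
    # Placeholder: log the undo step before the compensable action executes.
    print(f"recorded compensating action for {action}")


def guard(action: str, confirm: Callable[[str], bool]) -> None:
    # Unknown actions default to the worst case.
    kind = CLASSIFICATION.get(action, Reversibility.IRREVERSIBLE)
    if kind is Reversibility.COMPENSABLE:
        record_compensating_action(action)
    elif kind is Reversibility.IRREVERSIBLE:
        # Explicit confirmation, even when everything else runs unattended.
        if not confirm(action):
            raise PermissionError(f"{action!r} not confirmed; refusing irreversible action")
```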
How limits compose
The four limits compose as a defence-in-depth stack. The action allowlist is the outermost layer: actions not on the list are refused before the agent's reasoning runs. The rate limit is next: even allowed actions stop after the configured per-minute or per-day cap. The cost cap is next: even rate-limited actions stop when the per-task budget is exhausted. The reversibility check is innermost: irreversible actions still require confirmation.
Each layer catches what the previous layer missed. The allowlist catches the agent attempting an unintended action class. The rate limit catches loops within an allowed action class. The cost cap catches token-cost runaway within a rate-limited window. The reversibility check catches the long-tail case where all the previous limits passed but the action is one you cannot afford to undo.
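Putting the layers together, here is a sketch of a guarded dispatch that runs the four checks in order. It reuses the hypothetical helpers from the earlier sketches (ALLOWED_ACTIONS, check_rate, TaskBudget, guard) and is an illustration of the ordering, not a drop-in implementation.

```python
# How the layers stack, using the hypothetical helpers sketched above.
# Order matters: each check only sees requests that survived the previous one.
def guarded_dispatch(action: str, budget: TaskBudget, confirm, **params):
    if action not in ALLOWED_ACTIONS:            # 1. allowlist (outermost)
        raise ActionDenied(action)
    check_rate(action)                           # 2. per-class rate limit
    if budget.spent_usd >= budget.cap_usd:       # 3. per-task cost cap
        raise CostCapExceeded("budget exhausted before dispatch")
    guard(action, confirm)                       # 4. reversibility check (innermost)
    return _execute(action, **params)
```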
The 8 categories of AI agent failure modes map cleanly onto these layers. Refusal correctness lives in the allowlist. Resource exhaustion lives in the rate limit and cost cap. Destructive output lives in the reversibility check.
Frequently asked questions
What is blast radius for an AI agent?
Blast radius is the size of the worst-case impact when the agent gets something wrong. Read-only agents have small blast radius (mostly privacy). Send-email agents have larger blast radius (reputation, customer trust). Database-write agents have larger still (data integrity). Money-moving agents have the largest (financial loss). Limits should match the blast radius.
What are the four limits every AI agent needs?
Action allowlist (only listed actions are permitted), rate limit per action class (caps actions per minute or per day), maximum cost per task (caps spend per invocation), and reversibility check before destructive actions (the agent confirms an action is reversible or refuses). Each limit handles a different failure mode.
Should I rate limit a personal AI agent?
Yes. A rate limit of one action per minute and ten actions per hour is generous for most personal automation and catches the failure mode where the agent enters a loop and tries to run the same action thousands of times. The cost of the limit is low; the cost of an unlimited agent in a loop is high.
What is a reversibility check?
Before executing an action, the agent (or the integration layer) classifies it as reversible, compensable, or irreversible. Reversible actions execute. Compensable actions execute with a recorded compensating action ready. Irreversible actions require explicit human confirmation, even when the agent is otherwise running unattended.
How do agent limits compose?
The action allowlist filters first: only listed actions are permitted. The rate limit filters next: even allowed actions stop after N per minute. The cost cap filters next: even rate-limited actions stop when the per-task budget is exhausted. The reversibility check filters last: irreversible actions still require confirmation. Each layer catches what the previous layer missed.
Three takeaways before you close this tab
- Blast radius scales with action type. Read < write < send < money. Limits should match.
- Allowlist + rate + cost + reversibility. Four layers; each catches what the previous missed.
- Limits cost almost nothing; their absence costs whatever the worst case is.
Sources
- NIST, "AI Risk Management Framework 1.0", 2023, retrieved 2026-05-07, nist.gov/itl/ai-risk-management-framework
- OWASP, "Top 10 for Large Language Model Applications", 2024, retrieved 2026-05-07, owasp.org/www-project-top-10-for-large-language-model-applications
- Anthropic, "Building Effective Agents", retrieved 2026-05-07, anthropic.com/engineering/building-effective-agents
- Aryan Agarwal, "Gravity blast-radius spec", internal v1, May 2026