An AI agent without a spending cap is an open tab on a model provider. Most of the time the bill is small. The expensive day is the one where the agent loops on a malformed input, or chains a search tool with itself a thousand times, or quietly retries on a transient error until a credit-card alert wakes someone up. Spending caps are the safety net that turns "expensive day" into "agent paused, owner notified, cost contained".

This guide covers three caps and one rule. The caps are per-run, per-day, per-month. The rule is hard stop on the small caps, alert on the big one. Together they prevent the runaway scenarios that account for most "agent went wrong" cost incidents.

Three caps, not one

Why three? Because the three failure shapes are different and a single cap cannot catch all of them. A single run loops and spends without bound: that is the per-run cap's job. A transient error cascades retries across a day of otherwise-normal runs: that is the per-day cap's job. Spend drifts quietly upward over weeks: that is the per-month cap's job.

A single cap at any one of these levels misses the others. A per-run cap alone would miss a thousand normal-cost runs that should not have happened. A per-month cap alone would miss a $500 spike at 2 a.m. on a Tuesday.

The per-run cap

The per-run cap is the most important. It is the only cap that prevents a single runaway invocation from doing real damage.

To size it: run the agent ten times in dry-run mode on representative inputs. Record cost per run. Take the maximum. Multiply by three. The 3x multiplier leaves headroom for an oversized but legitimate input while still stopping a loop. Re-tune quarterly.

If your agent platform expresses cost in tokens rather than dollars, do the same exercise in tokens. The principle is identical: enough headroom for a genuinely large input, not enough headroom for a loop.
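The arithmetic fits in a few lines. A minimal sketch, with hypothetical dry-run figures standing in for the ten you recorded:

```python
# Hypothetical per-run costs in dollars from ten dry runs on
# representative inputs.
dry_run_costs = [0.04, 0.06, 0.05, 0.11, 0.05, 0.07, 0.04, 0.09, 0.05, 0.06]

# Take the maximum observed cost and leave 3x headroom: enough for an
# oversized but legitimate input, not enough for a loop.
per_run_cap = 3 * max(dry_run_costs)
print(f"per-run cap: ${per_run_cap:.2f}")  # per-run cap: $0.33
```

The same calculation works unchanged if your costs are in tokens.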

For the longer cost-modelling discussion, see "How to estimate agent cost before deploying" and "AI agent cost models explained".

The per-day cap

The per-day cap catches the chained-mistake scenario: the agent is fine for the first ten runs, then a transient API error triggers retries on every subsequent run and the cost compounds. The per-day cap fires somewhere mid-cascade and pauses the agent.

Size it at 3x typical daily spend. If the agent normally runs 100 times a day at $0.10 each, the typical day is $10 and the cap is $30. That leaves room for a busy day or a one-off spike; it does not leave room for an open-ended loop.

The per-day cap should be a hard stop. When it fires, the agent halts. The owner reviews. No quiet "soft cap" that warns and continues, because the most expensive bug is the one that decided the warning was fine.
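As a sketch of what the hard stop looks like in code (the names `DailyBudget` and `CapExceeded` are illustrative, not any platform's API):

```python
from datetime import date

class CapExceeded(Exception):
    """Raised when a hard cap fires: the run halts and the owner reviews."""

class DailyBudget:
    def __init__(self, cap_usd: float):
        self.cap_usd = cap_usd
        self.day = date.today()
        self.spent = 0.0

    def charge(self, cost_usd: float) -> None:
        # Reset the counter when the calendar day rolls over.
        if date.today() != self.day:
            self.day, self.spent = date.today(), 0.0
        self.spent += cost_usd
        if self.spent > self.cap_usd:
            # Hard stop: no warn-and-continue, no silent fallback.
            raise CapExceeded(
                f"daily spend ${self.spent:.2f} exceeds cap ${self.cap_usd:.2f}"
            )

# 100 runs a day at ~$0.10 is a $10 typical day, so the cap is $30.
budget = DailyBudget(cap_usd=30.0)
```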

The per-month cap

The per-month cap is the budget signal. It does not stop the agent; it tells the owner that monthly spend is on track to exceed budget. Set it at expected spend + 25%, and have it alert the owner the day it trips.

Per-month is alert-only because a hard stop here would kill production over a slow drift that is probably caused by usage growth or a price change. Both are decisions to be made by a human, not the cap. The alert is what gets the human in the room.
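A minimal sketch of the alert-only check, assuming `notify` is whatever channel reaches the owner (email, pager, chat):

```python
def check_monthly_spend(spent_usd: float, expected_usd: float, notify) -> None:
    # Alert-only: the threshold is expected spend plus 25% headroom.
    threshold = expected_usd * 1.25
    if spent_usd > threshold:
        # Notify, don't halt: drift is a human decision, not the cap's.
        notify(
            f"monthly spend ${spent_usd:.2f} is past the "
            f"${threshold:.2f} budget threshold"
        )

# Example: $400 expected spend alerts once the month crosses $500.
check_monthly_spend(512.30, 400.00, notify=print)
```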

  Per-run cap (hard): 3x typical run cost. Catches: in-loop runaway.
  Per-day cap (hard): 3x typical daily total. Catches: cascading retries.
  Per-month cap (alert): budget + 25%. Catches: drift.
  When a cap fires: halt or alert. Owner reviews. No silent fallback.
Three caps catch three different failure shapes. Hard stop on the small ones. Alert on the big one.

Hard stop vs alert

The principle: caps that protect against unbounded cost are hard stops; caps that surface drift are alerts. Getting them backwards creates the worst case: a hard cap set so high it never fires, or an alert so easy to dismiss that the actual incident slides past.

Hard stop means: the agent terminates the current run and refuses subsequent runs until the owner unblocks it. Alert means: the owner gets a notification with the current spend and a link to review. Both are necessary; neither replaces the other.

Some platforms offer "graceful degradation" where the agent falls back to a smaller model or a shorter prompt at the cap. Disable it. A degraded agent is a worse agent than no agent, and the silent fallback obscures the cost incident the cap exists to flag.
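What "refuses subsequent runs" might look like, as an illustrative sketch rather than any platform's actual mechanism:

```python
class CapExceeded(Exception):
    pass

class AgentGuard:
    """Hard-stop semantics: kill the current run, refuse new ones."""

    def __init__(self) -> None:
        self.blocked = False

    def start_run(self) -> None:
        if self.blocked:
            # Deterministic refusal, identical on every attempt,
            # until a human clears the block.
            raise CapExceeded("agent paused by spending cap; owner review required")

    def on_cap_fired(self) -> None:
        self.blocked = True  # persists across runs

    def owner_unblock(self) -> None:
        # The only way back is an explicit human decision. No auto-resume.
        self.blocked = False
```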

What to do when a cap fires

The cap fires. Now what?

  1. Read the trace. Was this a legitimate input that grew unexpectedly, a loop, or a tool call that retried? Good monitoring tooling shows the shape immediately.
  2. Decide: fix or accept. If the input grew legitimately, raise the cap. If the agent looped, fix the prompt or the tool description.
  3. Resume only after the change is staged. Don't unblock the agent and hope the same input does not arrive again.
  4. Log the incident. Three of these in a quarter is a signal that the agent or its tools need a deeper review.

For the broader incident response framing, see "AI agent failure modes".

Token caps vs spending caps in detail

Token caps and spending caps are related but not redundant. A token cap on the model call ("max 8000 output tokens") prevents a single response from blowing through the context budget. A spending cap on the run prevents an agent from chaining many cheap calls into one expensive run.

Concretely: an agent with a 4000-token model cap can still spend $20 on a single run by calling the model fifty times in a loop. The token cap holds; the run still runs away. The spending cap is the catcher.

In the other direction: a generous spending cap of $5 per run will not catch a single 100,000-token output that costs $4. The spending cap holds, but the response is unwieldy, hard to inspect, and slow to render. The token cap is the catcher.

Set both. Token caps prevent oversized contexts. Spending caps prevent runaway loops and expensive tool combinations. The gap between them is where most cost incidents live, and closing the gap is cheaper than discovering it on a Tuesday morning credit-card alert.
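A sketch of the two caps working together inside one run; `task` and `call_model` are hypothetical stand-ins for your task loop and model client, and `call_model` is assumed to return the text plus the tokens used:

```python
class CapExceeded(Exception):
    pass

# Hypothetical limits: a per-call token cap and a per-run spending cap.
MAX_OUTPUT_TOKENS = 4000
PER_RUN_CAP_USD = 0.50

def run_agent(task, call_model, price_per_token_usd: float) -> list:
    """One run: many model calls, each token-capped, their sum spend-capped."""
    spent = 0.0
    transcript = []
    while not task.done():
        # Token cap: bounds any single response.
        reply, tokens_used = call_model(
            task.next_prompt(), max_tokens=MAX_OUTPUT_TOKENS
        )
        spent += tokens_used * price_per_token_usd
        # Spending cap: bounds the loop the token cap cannot see.
        if spent > PER_RUN_CAP_USD:
            raise CapExceeded(
                f"run spend ${spent:.2f} exceeds ${PER_RUN_CAP_USD:.2f} cap"
            )
        transcript.append(reply)
        task.advance(reply)
    return transcript
```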

Frequently asked questions

What spending caps should I set on an AI agent?

Three caps minimum: per-run cost (stops a single bad input from spending unbounded tokens), per-day cost (catches chained mistakes overnight), and per-month cost (catches slow drift). Set per-run at 3x your typical run cost, per-day at 3x typical daily spend, per-month at expected spend plus 25%.

Should an AI agent stop hard at the cap or just alert me?

Hard stop on per-run, hard stop on per-day, alert on per-month. Per-run and per-day caps are the safety net; if either fires, the agent should refuse to continue until you investigate. Per-month is a budget signal that shouldn't kill production, but it should reach the owner the same hour it fires.

How do I estimate the right per-run cost cap?

Run the agent ten times in dry-run on representative inputs and record the cost of each run. Take the maximum and multiply by three. That cap leaves headroom for an oversized input but stops the agent if it loops or hits a runaway tool call. Re-tune the cap quarterly as the agent evolves.

What happens when an AI agent hits its spending cap?

The current run terminates with a deterministic error and no destructive action is committed. The agent is paused until the owner reviews. Subsequent runs queue or fail with the same error, depending on the platform. The cap should never quietly degrade the agent into a smaller model or a shorter prompt; degraded behaviour is worse than no behaviour.

Are token caps the same as spending caps?

Related but not the same. Token caps limit context and output length per call. Spending caps limit cost across the run, including tool-call expenses (search APIs, vector lookups, function-execution time). Set both. Token caps prevent oversized contexts; spending caps prevent runaway loops and expensive tool combinations.
