AI Agent Cost Forecasting: Model Spend Early

To forecast AI agent costs before you scale, build a unit-economics model first: estimate what a single task costs as model tokens plus tool calls plus retries, then multiply that cost per task by the run volume you expect over the period. Run the model three ways (best, expected, and worst case), test how sensitive the total is to the biggest driver, and wrap budgets and alerts around it as guardrails. That sequence turns a vague worry into a defensible number and stops a pilot that looked cheap from becoming a surprise at scale.

The mistake worth avoiding is forecasting at the wrong altitude. A monthly bill is not a forecast; it is a result. A forecast starts from the cost of one run and the number of runs you plan to do, because those are the two levers you actually control. Get the unit cost right and the rest is arithmetic.

[Figure: cost per task, built from model tokens, tool calls, and retries, then multiplied by expected run volume to forecast total spend. Caption: Forecasting starts at the unit: cost per task, then volume.]

Why forecast before you scale

Agent costs scale in a way that punishes teams who skip the forecast. A pilot running ten tasks a day for a handful of users feels free. The same agent at ten thousand tasks a day is a budget line: volume is up a thousand-fold, and total spend is that volume multiplied by whatever your per-task cost happens to be. If that per-task cost is higher than you assumed, the error is multiplied by every run, so a gap too small to notice in the pilot becomes a large one at scale. Forecasting is how you find the gap while it is cheap to fix, before you have committed headcount and customer promises to a number you never modeled.
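
To make the scaling point concrete, here is a minimal sketch of how an error in the per-task estimate grows with volume. The two per-task costs are hypothetical placeholders, not measured figures, and the function name `monthly_spend` is introduced only for illustration.

```python
def monthly_spend(cost_per_task: float, tasks_per_day: int, days: int = 30) -> float:
    """Total spend for a period: unit cost times volume times days."""
    return cost_per_task * tasks_per_day * days


# Hypothetical figures: what you assumed a task costs vs. what it turns out to cost.
assumed_cost = 0.02  # USD per task (assumption)
actual_cost = 0.05   # USD per task (assumption)

# The same estimation error, at pilot volume and at production volume.
pilot_gap = monthly_spend(actual_cost, 10) - monthly_spend(assumed_cost, 10)
scale_gap = monthly_spend(actual_cost, 10_000) - monthly_spend(assumed_cost, 10_000)

print(f"Gap at 10 tasks/day:     ${pilot_gap:,.2f} per month")
print(f"Gap at 10,000 tasks/day: ${scale_gap:,.2f} per month")
```

With these placeholder numbers the error is a few dollars a month in the pilot and a few thousand at production volume, which is why it tends to go unnoticed until the agent scales.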

This is a known discipline, not a novelty. The FinOps Foundation built an entire practice around forecasting and governing variable cloud spend, and agent spend behaves the same way: usage-based, lumpy, and easy to underestimate until the bill arrives. Gartner and other analysts have repeatedly flagged that organizations underestimate the run-cost of AI systems because the pilot is not representative of production load. A forecast is the cheapest insurance against that pattern. It does not have to be precise to the cent; it has to be honest about the order of magnitude and the range.

The cost drivers

Before you can model anything, name what moves the number. Agent cost decomposes cleanly into a per-run figure multiplied by run volume, and the per-run figure has a small number of inputs.

Runs per period. How many times the agent executes in a day, week, or month. This is the multiplier on everything else, so it is usually the dominant driver and the one most likely to surprise you as adoption grows.
Model tokens per run. Every run consumes input and output tokens, priced per token by the model provider. How per-token pricing works is documented in official model-provider pricing pages, such as the Anthropic pricing docs: you pay a rate for input tokens and a higher rate for output tokens, so both the size of the prompt and the length of the response matter.
Context size. The bigger the context you feed each run (retrieved documents, history, instructions), the more input tokens you pay for. A retrieval-heavy agent can cost far more per run than a lean one doing the same task with a tighter context.
Model choice. A larger, more capable model costs more per token than a smaller one. Choosing the right model for the job is the single biggest lever on per-task cost, and it is the subject of ongoing cost optimization work once you are live.
Tool calls. Each external tool or API call the agent makes can carry its own cost and adds tokens for the call and its result. An agent that makes many tool calls per task costs more than one that makes few.
Retries. When a step fails or a result fails validation, the agent retries, and every retry costs again. A high retry rate quietly inflates the per-task figure, which is why it belongs in the model rather than being treated as an edge case.
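
As a rough illustration of the token and context drivers above, the sketch below prices one run with separate input and output rates and compares a lean context with a retrieval-heavy one. The rates and token counts are hypothetical placeholders; substitute the figures from your provider's pricing page and your own measurements.

```python
def token_cost(input_tokens: int, output_tokens: int,
               input_rate_per_mtok: float, output_rate_per_mtok: float) -> float:
    """Token cost of one run in USD, with rates quoted per million tokens."""
    total = input_tokens * input_rate_per_mtok + output_tokens * output_rate_per_mtok
    return total / 1_000_000


# Hypothetical rates (USD per million tokens); real rates vary by provider and model.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00

# Same task and same response length, but very different context sizes (assumed values).
lean_run = token_cost(2_000, 500, INPUT_RATE, OUTPUT_RATE)
heavy_run = token_cost(40_000, 500, INPUT_RATE, OUTPUT_RATE)

print(f"Lean context:            ${lean_run:.4f} per run")
print(f"Retrieval-heavy context: ${heavy_run:.4f} per run")
```

Swapping in a different model's rates in the same function shows the model-choice lever in the same way.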

Build a cost-per-task model

Put the drivers into a simple equation. The cost of one task is the sum of its model token cost, its tool-call cost, and the overhead from retries:

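Cost per task = [(input tokens × input rate) + (output tokens × output rate) + (tool calls × cost per call)] × (1 + retry rate)

The bracketed term is the cost of one attempt. The retry rate is the share of attempts that have to be repeated, and each repeat is charged at the same per-attempt cost.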
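
The equation can be sketched as a small function. This is one reading of it, assuming input and output tokens are priced at separate per-token rates and that the retry rate adds that fraction of one extra attempt at the same cost; the parameter names and example values are hypothetical.

```python
def cost_per_task(input_tokens: float, output_tokens: float,
                  input_rate: float, output_rate: float,
                  tool_calls: float, cost_per_call: float,
                  retry_rate: float) -> float:
    """Expected cost of one task in USD. Rates are per token, retry_rate is 0..1."""
    tokens = input_tokens * input_rate + output_tokens * output_rate
    tools = tool_calls * cost_per_call
    per_attempt = tokens + tools
    # Retries repeat the attempt, so they scale the whole per-attempt cost.
    return per_attempt * (1 + retry_rate)


# Hypothetical averages from a sample of runs.
example = cost_per_task(
    input_tokens=6_000, output_tokens=800,
    input_rate=3e-6, output_rate=15e-6,   # assumed USD per token
    tool_calls=3, cost_per_call=0.002,    # assumed USD per tool call
    retry_rate=0.10,                      # assumed: 10% of attempts are retried
)
print(f"Cost per task: ${example:.4f}")
```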

Estimate each input from a representative sample of real runs, not a single best-case run. Execute the agent on a spread of typical tasks, measure the average tokens, the average tool calls, and the share of runs that retry, then plug those averages in. The output is one number: your cost per task. For a fuller method, including how to gather those measurements cleanly, our guide on estimating agent cost before deploying walks the same path step by step.
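
One way to turn a sample of runs into those averages is sketched below. The record fields (`input_tokens`, `output_tokens`, `tool_calls`, `retried`) and the five sample rows are made up for illustration; your logging will have its own shape.

```python
from statistics import mean


def summarize_runs(runs: list[dict]) -> dict:
    """Average the measured drivers across a sample of runs."""
    return {
        "avg_input_tokens": mean([r["input_tokens"] for r in runs]),
        "avg_output_tokens": mean([r["output_tokens"] for r in runs]),
        "avg_tool_calls": mean([r["tool_calls"] for r in runs]),
        # Share of runs that needed at least one retry.
        "retry_rate": sum(1 for r in runs if r["retried"]) / len(runs),
    }


# Hypothetical sample; in practice use a spread of typical tasks, not five rows.
sample_runs = [
    {"input_tokens": 5_000, "output_tokens": 600, "tool_calls": 2, "retried": False},
    {"input_tokens": 6_000, "output_tokens": 800, "tool_calls": 3, "retried": False},
    {"input_tokens": 7_000, "output_tokens": 1_000, "tool_calls": 4, "retried": True},
    {"input_tokens": 5_500, "output_tokens": 700, "tool_calls": 3, "retried": False},
    {"input_tokens": 6_500, "output_tokens": 900, "tool_calls": 3, "retried": False},
]

summary = summarize_runs(sample_runs)
print(summary)
```

A plain mean is the simplest choice here. If a few runs are far more expensive than the rest, it may be worth looking at the distribution as well as the average before settling on a per-task figure.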

With a cost per task in hand, the forecast is multiplication. Take the per-task figure and multiply by expected runs for the period:

Input	Example value	Notes
Cost per task	Modeled from a sample	Tokens + tool calls + retries
Runs per day	Expected volume	The dominant multiplier
Days per period	30 for a month	Match your budget cycle
Forecast spend	Cost per task × runs/day × days	The headline number
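
The table reduces to a single multiplication. A minimal sketch, with a hypothetical per-task cost and run volume standing in for your own figures:

```python
def forecast_spend(cost_per_task: float, runs_per_day: float,
                   days_per_period: int = 30) -> float:
    """Forecast spend = cost per task x runs per day x days in the period."""
    return cost_per_task * runs_per_day * days_per_period


# Hypothetical inputs: $0.04 per task, 2,500 runs a day, a 30-day month.
monthly_forecast = forecast_spend(cost_per_task=0.04, runs_per_day=2_500)
print(f"Forecast monthly spend: ${monthly_forecast:,.2f}")
```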

This per-task lens is also what makes spend comparable. Once you know the cost of a task, you can benchmark it against alternatives, the subject of cost-per-task benchmarks, and you can attribute spend to the workflows that drive it, which is where cost attribution picks up. The unit model is the foundation everything else stands on.

Scenarios and sensitivity

A single number is a false comfort. Reality lands in a range, so model the range explicitly with three cases. The best case assumes low volume, lean context, and few retries. The expected case uses your honest central estimates. The worst case assumes adoption runs ahead of plan, contexts grow, and retries climb. Budget to the worst plausible case, not the hopeful one, because the two errors are not symmetric: if actual spend comes in above the budget you have a blown budget, while if it comes in below you simply have money left over.
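
The three cases can be laid out side by side with very little code. In this sketch the `Scenario` class and every number in it are hypothetical; the point is only that each case carries its own per-task cost and volume.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    cost_per_task: float  # USD, already including tokens, tool calls, and retries
    runs_per_day: int

    def monthly_spend(self, days: int = 30) -> float:
        return self.cost_per_task * self.runs_per_day * days


# Hypothetical cases: the worst case raises both unit cost and volume.
scenarios = [
    Scenario("best", cost_per_task=0.03, runs_per_day=1_000),
    Scenario("expected", cost_per_task=0.04, runs_per_day=2_500),
    Scenario("worst", cost_per_task=0.06, runs_per_day=6_000),
]

for s in scenarios:
    print(f"{s.name:>8}: ${s.monthly_spend():,.2f} per month")

# The figure to plan against is the worst plausible case.
worst_case_spend = max(s.monthly_spend() for s in scenarios)
```

Because the worst case moves several inputs at once, its total can sit well above the expected case even when each individual input only shifts moderately.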

Then run a sensitivity check: vary one input at a time and watch how much the total moves. This tells you which driver you are most exposed to. For most agents it is run volume, since it multiplies everything, followed by context size or retry rate depending on the workflow. Knowing the dominant driver focuses your attention. If volume is the lever, your forecast is really a bet on adoption, and you should track adoption closely. If retries dominate, fixing reliability is the cheapest way to cut the forecast. This is also where forecasting connects to the bigger financial picture: weighing the forecast against the value the agent produces is cost versus ROI, and folding in maintenance, integration, and oversight beyond raw run cost is total cost of ownership. The run-cost forecast is one input to both, not the whole story.
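
A one-at-a-time sensitivity check can be sketched as follows: raise each input by the same percentage, recompute the total, and rank the inputs by how far the total moved. The baseline values, the 20% bump, and the parameter names are all assumptions for illustration.

```python
def monthly_total(p: dict) -> float:
    """Monthly spend from the unit model: per-attempt cost, retries, then volume."""
    per_attempt = p["token_cost"] + p["tool_calls"] * p["cost_per_call"]
    return per_attempt * (1 + p["retry_rate"]) * p["runs_per_day"] * p["days"]


def sensitivity(params: dict, bump: float = 0.20, skip: tuple = ("days",)) -> dict:
    """Relative change in the total when each input alone is raised by `bump`."""
    base = monthly_total(params)
    impact = {}
    for key in params:
        if key in skip:
            continue
        varied = dict(params)              # copy so only one input changes
        varied[key] = params[key] * (1 + bump)
        impact[key] = (monthly_total(varied) - base) / base
    # Largest effect first.
    return dict(sorted(impact.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical baseline (expected case).
baseline = {
    "token_cost": 0.03, "tool_calls": 3, "cost_per_call": 0.002,
    "retry_rate": 0.10, "runs_per_day": 2_500, "days": 30,
}

impact = sensitivity(baseline)
for name, change in impact.items():
    print(f"{name:>14}: {change:+.1%}")
```

With this baseline, volume moves the total one for one, while the retry rate barely registers because it starts low. A workflow with a much higher retry rate or heavier tool use would rank differently, which is the reason to run the check on your own numbers.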

Budgets and alerts as guardrails

A forecast that nobody enforces is a wish. Turn the number into guardrails. Set a working budget at, or a little above, your expected case, and configure alerts that fire as spend approaches it, well before it is breached. Add a hard cap, sized to your worst plausible case, for runaway scenarios so a bug or a traffic spike cannot run unbounded. These controls are what convert a planning estimate into operational safety, and they hand off directly to the live disciplines that keep spend in line.
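
In code, the guardrails amount to comparing spend to date against a few thresholds. The sketch below is illustrative only: the 80% alert threshold, the 1.5x hard cap, and the status names are assumptions, and a real setup would read spend from your billing or usage data.

```python
def guardrail_status(spend_to_date: float, budget: float,
                     alert_fraction: float = 0.8,
                     hard_cap_multiple: float = 1.5) -> str:
    """Classify current spend against the budget, an early alert, and a hard cap."""
    if spend_to_date >= budget * hard_cap_multiple:
        return "hard_cap"      # stop or pause the agent
    if spend_to_date >= budget:
        return "over_budget"   # budget breached, cap not yet reached
    if spend_to_date >= budget * alert_fraction:
        return "alert"         # approaching the budget, warn early
    return "ok"


# Hypothetical monthly budget of $3,000.
for spend in (1_000, 2_400, 3_000, 4_500):
    print(f"${spend:,}: {guardrail_status(spend, budget=3_000)}")
```

Checking the most severe threshold first keeps the function simple: each spend level maps to exactly one status.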

Forecasting sits at the front of a small family of cost practices, and it helps to keep them distinct. Forecasting, this post, is forward-looking: you model spend before you scale. Cost control is the operational job of keeping live spend inside those bounds day to day. Cost anomaly detection is reactive, catching unexpected spend spikes after they start so you can respond fast. The three work together: the forecast sets the expectation, control holds the line, and anomaly detection catches what slips through. Build the forecast first, because the other two need a baseline to measure against, and revisit it as real usage data replaces your initial estimates. A forecast is a living model, not a one-time spreadsheet.

How Gravity handles cost forecasting

Gravity is an AI agent platform with pricing that makes forecasting unusually simple. You describe the outcome you want in plain words, an expert-built agent runs it and hands back the finished result in about 60 seconds, and you pay per use: one dollar equals 1,000 credits, and you only pay when the agent runs. There is no idle infrastructure bill humming in the background, so your forecast reduces to two numbers, credits per run and expected runs.

That structure removes most of the modeling burden. Because Gravity runs and maintains the agent and carries the cost of the underlying models and tools, you do not have to forecast token rates, retry overhead, or context growth yourself; that complexity is on the platform's side, expressed to you as a credit cost per run. Your forecast is credits per run times the volume you expect over the period, run through the same best, expected, and worst cases. Pay-per-use also means the worst-case risk is bounded by what you choose to run rather than a fixed commitment you owe regardless of usage.
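
With credit-based pricing the forecast is short enough to fit in a few lines. The conversion of 1,000 credits per dollar comes from the pricing described above; the credits-per-run figure and the run volumes are hypothetical placeholders, not actual Gravity prices.

```python
CREDITS_PER_DOLLAR = 1_000  # one dollar equals 1,000 credits


def credit_forecast_usd(credits_per_run: float, runs_per_period: int) -> float:
    """Forecast spend in USD: credits per run x runs, converted to dollars."""
    return credits_per_run * runs_per_period / CREDITS_PER_DOLLAR


# Hypothetical: an agent that costs 50 credits per run, under three volume cases.
CREDITS_PER_RUN = 50
volume_cases = {"best": 500, "expected": 2_000, "worst": 6_000}

credit_forecasts = {
    name: credit_forecast_usd(CREDITS_PER_RUN, runs)
    for name, runs in volume_cases.items()
}
for name, usd in credit_forecasts.items():
    print(f"{name:>8}: ${usd:,.2f} per month")
```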

New to the platform? Setting up your first AI agent walks through going from a plain-language description to a running workflow, and the glossary and what is an AI agent explain why agents are billed by what they do rather than by the hour. Forecast at the unit level, scale by volume, set guardrails, and the spend stops being a mystery before you ever commit to scaling.

FAQ

How do you forecast AI agent costs?

Build a unit-economics model first: estimate the cost of one task as model tokens plus tool calls plus retries. Then multiply that cost per task by your expected run volume for the period. Run best, expected, and worst cases, test sensitivity on the biggest driver, and set a budget and alerts as guardrails so reality cannot quietly outrun the plan.

What drives the cost of an AI agent?

Cost per run comes from model token usage, the number of tool calls, and retries, multiplied by how many runs you do. Model choice and context size move token cost the most: a larger model or a bigger context per run raises the per-task figure. Volume then scales whatever that per-task number is, which is why forecasting starts at the unit level.

How is cost forecasting different from cost control?

Forecasting is forward-looking: you model what spend will be before you scale, so you can plan and budget. Cost control is the operational discipline of keeping live spend within those bounds. Anomaly detection is reactive, catching spikes after they start. This post is the forecast; the model you build here feeds the control and detection work that follows.

Why run best, expected, and worst-case scenarios?

A single point estimate hides risk. Best, expected, and worst cases bracket the range, so you size budgets to the worst plausible outcome rather than the hopeful one. The spread also shows which input you are most exposed to, usually volume or retries, which tells you where a small change in reality moves the total the most.

How does Gravity pricing map to a cost forecast?

Gravity is pay per use: one dollar equals 1,000 credits, and you pay only when an agent runs. That turns a forecast into simple arithmetic, namely credits per run times expected runs. There is no idle infrastructure cost, so the variable in your model is run volume rather than a fixed monthly platform bill you pay regardless of usage.