ROI calculations for AI agents fall into two categories: defensible numbers backed by measurement, and made-up numbers backed by vendor claims. The CFO can tell the difference. This guide is the defensible version: the formula, the inputs, the baseline measurement, the scenario analysis, and how to present it. It is a companion to the guides on TCO, cost vs ROI, and the executive business case.

The ROI formula

The standard ROI formula adapted for agents.

ROI (%) = ((Annual Value − Annual Cost) / Annual Cost) × 100
Payback (months) = Annual Cost / (Annual Value / 12)

Both numbers go in the business case. ROI is the headline; payback is the gut-check on how quickly the bet pays for itself.
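As a minimal sketch, the two formulas above translate directly into Python. The function names are illustrative, not from any library; the inputs are the Year 1 value and cost figures from the worked example later in this guide.

```python
def roi_percent(annual_value: float, annual_cost: float) -> float:
    """ROI (%) = ((Annual Value - Annual Cost) / Annual Cost) * 100."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive")
    return (annual_value - annual_cost) / annual_cost * 100


def payback_months(annual_value: float, annual_cost: float) -> float:
    """Payback (months) = Annual Cost / (Annual Value / 12)."""
    if annual_value <= 0:
        raise ValueError("annual_value must be positive")
    return annual_cost / (annual_value / 12)


if __name__ == "__main__":
    # Figures from the worked example in this guide.
    value, cost = 693_600, 148_333
    print(f"ROI: {roi_percent(value, cost):.0f}%")                 # ROI: 368%
    print(f"Payback: {payback_months(value, cost):.1f} months")   # Payback: 2.6 months
```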

For longer horizons, layer in NPV with a discount rate. The CFO will have a corporate hurdle rate (typically 8 to 15 percent for software investments) you can use. A 3-year NPV-positive case at the hurdle rate is the standard bar.
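A 3-year NPV check can be sketched as follows. The function name, the split into an upfront outlay plus year-end net cash flows, and the example figures are all assumptions for illustration; substitute your own cash flows and the hurdle rate your CFO supplies.

```python
def npv(rate: float, initial_outlay: float, annual_net_flows: list[float]) -> float:
    """Net present value: outlay at t=0, then net cash flows at the end of each year."""
    discounted = sum(
        flow / (1 + rate) ** year
        for year, flow in enumerate(annual_net_flows, start=1)
    )
    return discounted - initial_outlay


if __name__ == "__main__":
    # Hypothetical inputs: $100k upfront, $60k net benefit per year for 3 years.
    hurdle_rate = 0.10  # assumed hurdle rate, inside the 8 to 15 percent range above
    result = npv(hurdle_rate, 100_000, [60_000, 60_000, 60_000])
    print(f"3-year NPV at {hurdle_rate:.0%}: ${result:,.0f}")
    print("Clears the bar" if result > 0 else "Does not clear the bar")
```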

Value inputs

Three categories of value. Measure each independently; double-counting is the most common source of ROI inflation.

Time saved. Hours per task × tasks per period × loaded labor rate. The loaded rate includes salary, benefits, employer taxes, and overhead. McKinsey's labor productivity work uses a loaded cost roughly 1.3 to 1.5 times base salary for knowledge workers (McKinsey Future of Work, 2024). Subtract the residual human time still required for review or exception handling after the agent runs.
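A minimal sketch of the time-saved calculation, netting out the residual human time. The function name and all example figures are hypothetical; the loaded hourly rate is taken as an input rather than derived from salary.

```python
def time_saved_value(
    baseline_hours_per_task: float,
    residual_hours_per_task: float,
    tasks_per_year: float,
    loaded_hourly_rate: float,
) -> float:
    """Annual value of time saved, after subtracting review and exception time."""
    if residual_hours_per_task > baseline_hours_per_task:
        raise ValueError("residual time exceeds baseline; the agent saves nothing")
    net_hours_per_task = baseline_hours_per_task - residual_hours_per_task
    return net_hours_per_task * tasks_per_year * loaded_hourly_rate


if __name__ == "__main__":
    # Hypothetical: 15 min baseline, 3 min residual review, 20,000 tasks/yr, $60/h loaded.
    value = time_saved_value(0.25, 0.05, 20_000, 60.0)
    print(f"Time-saved value: ${value:,.0f} per year")
```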

Error reduction. Baseline error rate × baseline volume × cost per error, minus agent error rate × agent volume × cost per error. Cost per error includes rework, customer remediation, and downstream impact. Many ROI calculators stop at rework time; the downstream impact is often where the real money is.
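The error-reduction formula above, sketched in Python. Function names and example figures are hypothetical; cost per error is assumed here to be a single blended figure that already includes rework, remediation, and downstream impact.

```python
def annual_error_cost(error_rate: float, volume: float, cost_per_error: float) -> float:
    """Error rate x volume x cost per error."""
    return error_rate * volume * cost_per_error


def error_reduction_value(
    baseline_rate: float,
    baseline_volume: float,
    agent_rate: float,
    agent_volume: float,
    cost_per_error: float,
) -> float:
    """Baseline error cost minus error cost with the agent in place."""
    before = annual_error_cost(baseline_rate, baseline_volume, cost_per_error)
    after = annual_error_cost(agent_rate, agent_volume, cost_per_error)
    return before - after


if __name__ == "__main__":
    # Hypothetical: errors fall from 8% to 3% on 20,000 tasks at $45 per error.
    value = error_reduction_value(0.08, 20_000, 0.03, 20_000, 45.0)
    print(f"Error-reduction value: ${value:,.0f} per year")
```

A negative result means the agent makes more costly errors than the baseline, which belongs in the business case as a cost.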

Revenue lift. Incremental revenue attributable to the agent. Examples: deals progressed faster (close-rate × deal-size × incremental velocity), churn avoided (at-risk accounts × churn rate × ARR), lead conversion improvement. Attribution is hard; conservative is better than aspirational.

Cost inputs

Five cost components. Most calculators count only the first two.

  1. Platform license. The fee paid to the agent platform vendor.
  2. Model usage. Token costs if not bundled in the platform fee. Watch the actual tokens per minute (TPM) consumed at your volume, not the list rate.
  3. Integration build. One-time engineering cost to connect the agent to your systems. Usually 2 to 8 weeks of engineering time per integration.
  4. Maintenance. Ongoing engineering for prompt tuning, eval, integration drift, model upgrades. Budget 10 to 20 percent of integration build per year.
  5. Change management. Training, process redesign, handling resistance. Often the largest hidden cost in regulated or unionized environments.
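The five components can be rolled up as in the sketch below. The class and field names are hypothetical, as are the figures. One loud assumption: integration build and change management are expensed fully in Year 1 here; amortizing one-time costs over several years is an equally valid convention and will change the Year 1 number.

```python
from dataclasses import dataclass


@dataclass
class AgentCosts:
    platform_license: float   # annual vendor fee
    model_usage: float        # annual token cost, if not bundled
    integration_build: float  # one-time engineering cost
    maintenance_rate: float   # fraction of integration build per year (0.10 to 0.20)
    change_management: float  # assumed one-time, incurred in Year 1

    def maintenance(self) -> float:
        return self.integration_build * self.maintenance_rate

    def year_one(self) -> float:
        """All five components, with one-time costs expensed in Year 1 (assumption)."""
        return (
            self.platform_license
            + self.model_usage
            + self.integration_build
            + self.maintenance()
            + self.change_management
        )

    def steady_state(self) -> float:
        """Recurring annual cost once one-time items are behind you."""
        return self.platform_license + self.model_usage + self.maintenance()


if __name__ == "__main__":
    costs = AgentCosts(60_000, 24_000, 80_000, 0.15, 30_000)  # hypothetical figures
    print(f"Year 1: ${costs.year_one():,.0f}")
    print(f"Steady state: ${costs.steady_state():,.0f}")
```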

Measuring the baseline

The baseline is the most important input and the one most often guessed. Three measurements.

Time per task. Have 3 to 5 people perform the target task and time them with a stopwatch. Repeat across 10 to 30 task instances to capture variance. The median is the baseline; the standard deviation is the uncertainty.
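Summarizing the stopwatch timings takes only the standard library. This sketch reports the median, the mean, and the sample standard deviation side by side, so a single slow outlier is visible rather than silently absorbed. The function name and the sample timings are made up for illustration.

```python
import statistics


def summarize_timings(minutes: list[float]) -> dict[str, float]:
    """Central value and spread for a set of timed task runs."""
    if len(minutes) < 2:
        raise ValueError("need at least two timings to estimate spread")
    return {
        "median": statistics.median(minutes),
        "mean": statistics.mean(minutes),
        "stdev": statistics.stdev(minutes),  # sample standard deviation
    }


if __name__ == "__main__":
    # Hypothetical timings in minutes; note the single 30-minute outlier.
    timings = [12, 15, 11, 14, 30, 13, 12, 16, 14, 13]
    summary = summarize_timings(timings)
    print(f"median={summary['median']}, mean={summary['mean']}, "
          f"stdev={summary['stdev']:.1f}")
```

In this made-up sample the median is 13.5 minutes and the mean is 15, which is why the choice of central value matters when a few runs are unusually slow.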

Frequency. Pull actual volume from your systems for the last 3 months. Do not estimate. The estimated number is usually 30 to 50 percent off the actual.

Error rate. Sample 50 to 100 historical task instances; classify each as correct, error, or ambiguous. The error rate is errors / total. Categorize errors by cost class because "small error" and "policy violation" have very different downstream costs.
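A sketch of the error-rate tally, with errors also counted by cost class. The record layout (a label plus an optional cost class) and the class names are assumptions for illustration. Ambiguous instances stay in the denominator, matching "errors / total" above.

```python
from collections import Counter
from typing import Optional

# Each record: (label, cost_class). cost_class is None unless label == "error".
Record = tuple[str, Optional[str]]


def error_rate(records: list[Record]) -> float:
    """Errors divided by all sampled instances, ambiguous ones included."""
    if not records:
        raise ValueError("no records sampled")
    errors = sum(1 for label, _ in records if label == "error")
    return errors / len(records)


def errors_by_cost_class(records: list[Record]) -> Counter:
    """Count of errors in each cost class."""
    return Counter(cost for label, cost in records if label == "error")


if __name__ == "__main__":
    # Hypothetical sample of 50 historical task instances.
    sample: list[Record] = (
        [("correct", None)] * 42
        + [("error", "small")] * 3
        + [("error", "policy_violation")] * 2
        + [("ambiguous", None)] * 3
    )
    print(f"Error rate: {error_rate(sample):.1%}")
    print(dict(errors_by_cost_class(sample)))
```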

Scenario analysis

Three scenarios for every ROI calculation.

Forrester's Total Economic Impact methodology, which most enterprise software business cases use as a reference, formalizes this scenario approach with explicit risk-adjustment of inputs (Forrester TEI, 2024). The low case is the one to present alongside the expected; it is the defensible floor.
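A scenario run can be sketched as below. Everything here is an assumption for illustration: the scenario names, the parameter values, the annual cost, and the simple value model, in which only the autonomous share of tasks yields savings and a residual fraction of the baseline time is still spent on review. This is not Forrester's risk-adjustment procedure.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    autonomous_share: float   # fraction of tasks the agent handles end to end
    residual_fraction: float  # fraction of baseline time still spent on review


def scenario_value(s: Scenario, full_task_value: float) -> float:
    """Annual value under the assumed model: share x (1 - residual) x full value."""
    return full_task_value * s.autonomous_share * (1 - s.residual_fraction)


def scenario_roi(s: Scenario, full_task_value: float, annual_cost: float) -> float:
    value = scenario_value(s, full_task_value)
    return (value - annual_cost) / annual_cost * 100


if __name__ == "__main__":
    # Hypothetical: 20,000 tasks x 0.25 h x $60/h = $300,000 of baseline labor.
    full_value = 20_000 * 0.25 * 60.0
    cost = 100_000.0
    scenarios = [
        Scenario("low", 0.50, 0.30),
        Scenario("expected", 0.70, 0.20),
        Scenario("high", 0.85, 0.10),
    ]
    for s in scenarios:
        print(f"{s.name:>8}: ROI {scenario_roi(s, full_value, cost):.0f}%")
```

Keeping the scenarios as data makes it easy to show the CFO exactly which inputs move between the low and expected cases.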

A worked example

Use case: agent that triages customer support tickets and drafts a response.

Baseline.

Agent expected scenario.

Annual value calculation.

Annual cost.

Result. Year 1 ROI = ($693,600 − $148,333) / $148,333 × 100 = 368 percent. Payback = $148,333 / ($693,600 / 12) = 2.6 months.

The low scenario (50% autonomous, 30% residual time) drops ROI to roughly 180 percent and lengthens payback to about 5 months. Still positive; the case is defensible.

Defending the number

What the CFO will ask.

Common mistakes

Counting full task time as saved. Forgetting the review and escalation residual. Typical inflation 20 to 40 percent.

Loaded rate too low. Using base salary instead of fully loaded cost. This understates value rather than inflating it, typically by 30 to 50 percent (which is fine for the conservative case but should be acknowledged).

Ignoring change management. Easily 10 to 20 percent of the total Year 1 cost. Skipping it produces an ROI number that does not survive contact with rollout.

Stacking value categories that overlap. Counting "time saved" and "revenue lift from faster handling" both at full value when one drives the other. Pick one or split.

Optimistic baseline. The baseline error rate or time-per-task is set at a number that flatters the agent. Real measurement, not vibes.

FAQ

How do you calculate ROI for an AI agent?
Annualized value minus annualized cost, divided by annualized cost. Value comes from time saved, error reduction, and revenue lift. Cost includes platform, model usage, integration, maintenance, and change management.
What time horizon should I use?
Twelve months for the business case headline. Show payback within 12 months and NPV over 36 months for the full life.
What is a reasonable ROI to target?
300 percent first-year ROI is achievable for well-scoped use cases. Below 100 percent suggests misfit; above 1000 percent suggests an optimistic input.
How do I quantify time savings honestly?
Measure baseline with a stopwatch on 10 to 30 runs. Net out residual review and exception time when an agent does the task.
Should I include opportunity cost?
Yes when reallocating freed time produces measurable value. Skip it when the freed time has no measurable downstream use.
What is the typical payback period for AI agents?
Three to nine months for well-scoped use cases. Shorter suggests cost is undercounted; longer suggests scope or fit is wrong.
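The rule-of-thumb ranges in this FAQ can be turned into a quick sanity check on a finished calculation. The function name is hypothetical. The thresholds are the ones stated above, and a flag is a prompt to recheck inputs, not a verdict.

```python
def sanity_flags(roi_pct: float, payback_months: float) -> list[str]:
    """Flag results that fall outside the rule-of-thumb ranges in this guide."""
    flags = []
    if roi_pct < 100:
        flags.append("ROI below 100%: possible misfit")
    elif roi_pct > 1000:
        flags.append("ROI above 1000%: look for an optimistic input")
    if payback_months < 3:
        flags.append("Payback under 3 months: cost may be undercounted")
    elif payback_months > 9:
        flags.append("Payback over 9 months: scope or fit may be wrong")
    return flags


if __name__ == "__main__":
    for roi, payback in [(300, 6), (50, 14), (1200, 2)]:
        issues = sanity_flags(roi, payback) or ["within typical ranges"]
        print(f"ROI {roi}%, payback {payback} mo: {'; '.join(issues)}")
```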

Sources