What this agent does

An expense-categorisation agent does the same job a bookkeeper does in QuickBooks or Xero, but for every transaction the company posts, not just the ones that get to month-end. It reads the corporate card feed, reads the OCR'd receipts attached to expense reports, and assigns each transaction to a category in the chart of accounts.

It does not write to the general ledger on day one. It posts to a staging table that the bookkeeper reviews. Once accuracy is calibrated, transactions matching a high-confidence rule auto-post with an audit-trail entry, and the rest still go to the reviewer queue. This is the same pattern used by Brex, Ramp, and the Xero machine-learning categoriser, but tuned for a specific company's chart of accounts.

For a related general pattern, see AI agent for inbox triage. For the cluster context, see what an AI agent can actually do.

Sources of expense data

Three streams feed the agent.

Corporate card transactions. Pulled via the issuer's API (Brex, Ramp, Mercury, Stripe Issuing). Each transaction carries merchant name, MCC (Merchant Category Code, the four-digit ISO 18245 code the card network assigns), amount, currency, and posting date. The MCC is the single highest-signal feature for first-pass classification.

Receipt images. Photographed or emailed receipts are run through whichever OCR service the platform integrates (Google Cloud Document AI's expense parser, Amazon Textract's AnalyzeExpense, or the image-input capability of the agent's underlying model). The OCR output gives line items, taxes, and merchant address.

Submitted expense reports. A user submits a report with a category they have picked from a UI dropdown. This category is a label, not a fact. The agent uses it as a hint, not as ground truth, because users habitually pick "office supplies" for anything they cannot otherwise classify.

The agent reconciles the three streams. A submitted report should match a card transaction within a 7-day window if it is a card-based expense. If it does not, the agent flags the report and asks for the supporting card transaction or marks it as reimbursable cash.
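A minimal sketch of the matching step, under stated assumptions: amounts must match exactly, merchants are compared case-insensitively, and the 7-day window is read as seven days either side of the spend date. The type and function names are illustrative and do not come from any issuer's API. Real card feeds usually need fuzzier merchant matching than this.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal
from typing import List, Optional

MATCH_WINDOW_DAYS = 7  # window stated in the text; the +/- reading is an assumption


@dataclass(frozen=True)
class CardTransaction:
    txn_id: str
    merchant: str
    amount: Decimal
    posted_on: date


@dataclass(frozen=True)
class ReportLine:
    report_id: str
    merchant: str
    amount: Decimal
    spent_on: date


def match_report_line(
    line: ReportLine, transactions: List[CardTransaction]
) -> Optional[CardTransaction]:
    """Return the closest-dated card transaction for a report line, or None.

    None means the report should be flagged: either the supporting card
    transaction is missing or the expense is reimbursable cash.
    """
    wanted = line.merchant.strip().lower()
    candidates = [
        t
        for t in transactions
        if t.amount == line.amount
        and t.merchant.strip().lower() == wanted
        and abs((t.posted_on - line.spent_on).days) <= MATCH_WINDOW_DAYS
    ]
    if not candidates:
        return None
    # Prefer the transaction posted closest to the reported spend date.
    return min(candidates, key=lambda t: abs((t.posted_on - line.spent_on).days))
```

A `None` result is not an error. It is the signal to ask the submitter for the card transaction or to mark the line as reimbursable cash.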

Fixed chart of accounts

The chart of accounts is the closed label set the classifier uses. It is the same chart the company's accountant uses to file taxes and produce financial statements.

For a US-incorporated SaaS company filing under US GAAP, the chart typically contains 40 to 80 leaf accounts spread across five top-level groups: Assets, Liabilities, Equity, Revenue, and Expenses. Expense categorisation only touches the Expenses subtree, but it has to be exact: classifying a software subscription as "Office Supplies" instead of "Software & SaaS" makes the SaaS line on the income statement wrong.

For an Indian private limited company filing under Ind-AS, the chart aligns to Schedule III of the Companies Act and is materially different from a US GAAP chart. The agent has to be configured per jurisdiction, not generically.

Free-text categorisation (letting the user or the classifier invent a category) breaks at month-end. A bookkeeper closing the books cannot reconcile "office misc," "office supplies," and "office stationery" without manual merge. Use the chart that exists. Do not invent a parallel taxonomy.

How the classifier works

The classifier is a chain of three steps. The first two are deterministic. The third is the language-model fallback.

Step 1: MCC rule lookup. A merchant-category-code-to-account map handles the bulk of unambiguous merchants. MCC 5812 (eating places and restaurants) maps to "Meals & Entertainment." MCC 5732 (electronics stores) maps to "Office Equipment" or "Computer Hardware," depending on the chart. In our testing, roughly 60 to 70% of corporate card transactions are resolved at this step.

Step 2: Merchant memory. A per-organisation cache that maps merchant strings to accounts, learned from confirmed categorisations. The first time someone marks a "Datadog" charge as "Software & SaaS," the agent remembers it. The next "Datadog" charge gets the same category without asking. This step resolves another 15 to 20% of transactions.

Step 3: Language-model classifier. For the remainder, the agent calls a constrained-output language model. The chart of accounts is the allowed label set, and the inputs are the merchant, MCC, OCR line items, and the user's hint. The output is a category plus a confidence score. Anything below the configured threshold (we default to 0.85) gets flagged to the bookkeeper.
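The three steps can be sketched as a single function. This is an illustration under assumptions, not the production classifier: the `Decision` type, the `llm_classify` callable, and the confidence of 1.0 assigned to deterministic hits are all hypothetical, and the two MCC rules are just the examples from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

CONFIDENCE_THRESHOLD = 0.85  # default stated in the text

# Example rules only. 5732 could equally map to "Office Equipment",
# depending on the chart.
MCC_RULES: Dict[str, str] = {
    "5812": "Meals & Entertainment",
    "5732": "Computer Hardware",
}


@dataclass(frozen=True)
class Decision:
    account: str
    confidence: float
    rule: str  # which step produced the answer, kept for the audit trail
    needs_review: bool


def classify(
    merchant: str,
    mcc: str,
    merchant_memory: Dict[str, str],
    llm_classify: Callable[[str, str], Tuple[str, float]],
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Decision:
    # Step 1: deterministic MCC lookup.
    if mcc in MCC_RULES:
        return Decision(MCC_RULES[mcc], 1.0, "mcc_rule", False)

    # Step 2: per-organisation merchant memory, keyed on a normalised string.
    key = merchant.strip().lower()
    if key in merchant_memory:
        return Decision(merchant_memory[key], 1.0, "merchant_memory", False)

    # Step 3: language-model fallback (hypothetical callable); low-confidence
    # answers are flagged for the bookkeeper instead of being accepted.
    account, confidence = llm_classify(merchant, mcc)
    return Decision(account, confidence, "llm", confidence < threshold)
```

Passing the model call in as a parameter keeps the first two steps testable without any network access, and makes it easy to swap in a stub while calibrating the threshold.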

Constrained output here is critical. Letting the model emit free-text categories means the classifier sometimes returns "Misc Office," which does not exist in the chart. JSON-schema-constrained generation (or, equivalently, function-call mode) forces the output to be one of the defined accounts.
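One way to express the closed label set is a JSON schema whose `account` field is an enum built from the chart, paired with a defensive check on whatever comes back. The account list below is an illustrative subset, the function names are hypothetical, and how the schema is handed to the model depends on the provider, so that call is left out.

```python
import json
from typing import Any, Dict, List, Tuple

# Illustrative subset of an Expenses subtree, not a real company's chart.
CHART_EXPENSE_ACCOUNTS: List[str] = [
    "Meals & Entertainment",
    "Software & SaaS",
    "Office Supplies",
    "Computer Hardware",
]


def build_output_schema(accounts: List[str]) -> Dict[str, Any]:
    """Build a JSON schema that only admits accounts from the chart."""
    return {
        "type": "object",
        "properties": {
            "account": {"type": "string", "enum": list(accounts)},
            "confidence": {"type": "number", "minimum": 0, "maximum": 1},
        },
        "required": ["account", "confidence"],
        "additionalProperties": False,
    }


def parse_model_output(raw: str, accounts: List[str]) -> Tuple[str, float]:
    """Re-validate the model's JSON even if generation was constrained."""
    data = json.loads(raw)
    account = data["account"]
    confidence = float(data["confidence"])
    if account not in accounts:
        raise ValueError(f"account not in chart: {account!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    return account, confidence
```

Checking the output again on the way in costs little and catches the case where the schema and the chart have drifted apart.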

Duplicate and fraud checks

Two failure modes account for most expense-system pain: duplicate submissions and policy violations the bookkeeper has to chase down later. The agent catches both.

Duplicate fingerprint. A composite key over merchant, absolute amount, posting date, and last four of the card. Two submissions matching across all four within a rolling 30-day window are flagged. The honest case is a card transaction the user also submitted as a reimbursable. The dishonest case is a duplicate reimbursement claim. Both need a human glance before posting.
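A sketch of the fingerprint and the window check follows. It makes two assumptions: the composite key is hashed with SHA-256 over normalised fields, and the 30-day window is measured between submission dates, since the posting date is already part of the key. `DuplicateDetector` is a hypothetical in-memory stand-in for whatever store the platform uses.

```python
import hashlib
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict


def fingerprint(merchant: str, amount: Decimal, posted_on: date, card_last4: str) -> str:
    """Composite key over merchant, absolute amount, posting date, and last four."""
    parts = [
        merchant.strip().lower(),
        f"{abs(amount):.2f}",  # absolute amount, so sign does not split the key
        posted_on.isoformat(),
        card_last4,
    ]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


class DuplicateDetector:
    """Remembers the last submission date per fingerprint (in memory only)."""

    def __init__(self, window_days: int = 30) -> None:
        self.window = timedelta(days=window_days)
        self._seen: Dict[str, date] = {}

    def is_duplicate(self, fp: str, submitted_on: date) -> bool:
        previous = self._seen.get(fp)
        self._seen[fp] = submitted_on
        # A hit means "hold for a human glance", not "reject".
        return previous is not None and abs(submitted_on - previous) <= self.window
```

The detector only answers whether a match exists inside the window. Whether the match is the honest case or the dishonest one is still the reviewer's call.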

Policy fingerprint. The company's expense policy is a list of rules: per-meal cap, no alcohol, no first-class, no personal items. The agent encodes the rules as line-level checks against OCR output. A meal receipt that lists a $90 entree and a bottle of wine, and totals more than the $250 per-meal cap, gets flagged with the specific lines that broke policy. The user can dispute, the bookkeeper can override, and the audit trail records who did what.
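Two of the rules, no alcohol and the per-meal cap, might be encoded like this. The keyword list is a crude placeholder and the type names are hypothetical. A real deployment would need a better alcohol detector than whole-word matching, and the cap would come from the company's policy configuration.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import List

MEAL_CAP = Decimal("250.00")  # cap used in the example in the text
# Hypothetical keyword list, matched on whole words only.
ALCOHOL_KEYWORDS = frozenset({"wine", "beer", "cocktail", "whisky"})


@dataclass(frozen=True)
class ReceiptLine:
    description: str
    amount: Decimal


@dataclass(frozen=True)
class Violation:
    rule: str
    detail: str  # the specific line or total that broke policy


def check_meal_receipt(lines: List[ReceiptLine], cap: Decimal = MEAL_CAP) -> List[Violation]:
    violations: List[Violation] = []
    # Line-level rule: flag each line that looks like alcohol.
    for line in lines:
        words = set(line.description.lower().split())
        if words & ALCOHOL_KEYWORDS:
            violations.append(Violation("no_alcohol", line.description))
    # Receipt-level rule: the total must not exceed the per-meal cap.
    total = sum((line.amount for line in lines), Decimal("0"))
    if total > cap:
        violations.append(Violation("per_meal_cap", f"total {total} exceeds cap {cap}"))
    return violations
```

Returning the violating line alongside the rule name is what lets the reviewer and the submitter see exactly what was flagged.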

Audit trail matters. SOC 2 Type II auditors, our own among them, ask for the per-transaction trail of who categorised, who approved, and who posted. The agent records all three plus the classifier confidence and the rule fired.
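The per-transaction record could look like the following. The field names are assumptions chosen to mirror the list above (who categorised, who approved, who posted, classifier confidence, rule fired), and serialising to one JSON line per entry is just one convenient storage shape.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AuditEntry:
    txn_id: str
    account: str
    categorised_by: str            # "agent" or a user identifier
    approved_by: Optional[str]     # None until a reviewer approves
    posted_by: Optional[str]       # None while the entry sits in staging
    classifier_confidence: float
    rule_fired: str                # e.g. "mcc_rule", "merchant_memory", "llm"
    recorded_at: str               # ISO 8601 timestamp in UTC


def now_utc_iso() -> str:
    """Timestamp helper so every entry uses the same clock and format."""
    return datetime.now(timezone.utc).isoformat()


def audit_line(entry: AuditEntry) -> str:
    """Serialise one entry as a single JSON line for an append-only log."""
    return json.dumps(asdict(entry), sort_keys=True)
```

Keeping `approved_by` and `posted_by` nullable makes the staging state explicit: an entry with no poster has not reached the ledger.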

Guardrails

Five guardrails keep the agent out of trouble.

For the broader principle, see AI agent safety and guardrails.

Common mistakes

Auto-posting from day one. The temptation is to write straight to the general ledger because the classifier "feels accurate." On day one it is not. Run in staging-only mode for 30 days and compare the agent's output against the bookkeeper's manual categorisations on the same data.

Treating MCC as ground truth. Merchant Category Codes are assigned by the card network and are wrong often enough to notice. Typical examples are a coworking space billed under a real-estate MCC and a consulting fee billed under a generic "professional services" MCC. The MCC is a hint, not ground truth.

Ignoring multi-currency. A USD-denominated company that pays a EUR invoice with a corporate card sees both currencies on the same statement. The agent has to record both and pick the FX rate from a defined source. Inventing a rate at classification time creates a reconciliation mismatch at month-end.
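A sketch of recording both currencies with a named rate source. The names are hypothetical, rounding to two decimal places with half-up is an assumption that should follow the company's accounting policy, and any rate shown in usage is illustrative only. When no rate is supplied, the function returns `None` so the caller can flag the transaction.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal
from typing import Optional


@dataclass(frozen=True)
class ConvertedAmount:
    original_amount: Decimal
    original_currency: str
    book_amount: Decimal
    book_currency: str
    fx_rate: Decimal
    fx_source: str  # where the rate came from, e.g. the card issuer's posted rate


def convert(
    amount: Decimal,
    currency: str,
    book_currency: str,
    rate: Optional[Decimal],
    source: Optional[str],
) -> Optional[ConvertedAmount]:
    """Convert using a supplied rate only; never derive or guess one."""
    if currency == book_currency:
        return ConvertedAmount(
            amount, currency, amount, book_currency, Decimal("1"), "same_currency"
        )
    if rate is None or source is None:
        return None  # ambiguous rate: the caller should flag the transaction
    book_amount = (amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return ConvertedAmount(amount, currency, book_amount, book_currency, rate, source)
```

Using `Decimal` throughout avoids binary floating-point drift, which is exactly the kind of cent-level mismatch that surfaces at month-end.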

Free-text categories sneaking in. If the dropdown the agent or the user picks from is editable, free-text categories will appear. Lock the dropdown to the chart. Make new categories a request that goes to the bookkeeper, not a self-service action.

Skipping the audit trail. The agent's value to a finance team is not just the time saved; it is the per-transaction record of who did what. Record classifier confidence, rule fired, reviewer, and timestamp. Without these, the agent is a black box at audit time.

Frequently asked questions

What does an expense-categorisation agent actually do?

It ingests three sources (corporate card transactions, OCR'd receipts, and submitted expense reports), then maps each line to a category in the company's chart of accounts. Low-confidence matches are flagged, not guessed. Duplicates and out-of-policy items are routed to the finance reviewer. It does not auto-post journal entries to the general ledger in the first 30 days.

Why use a fixed chart of accounts instead of free-text categories?

Free-text categories drift across months and across employees, which makes reconciliation slow and tax filings error-prone. A fixed chart of accounts, derived from the accounting standard the company files under (US GAAP, IFRS, or Indian Ind-AS), gives the classifier a closed label set and the bookkeeper a single source of truth at month-end.

How accurate does the classifier need to be before the agent posts entries?

Agreement with the bookkeeper has to be above 95% across 200 recategorised transactions before the agent moves from suggest-only to auto-post with reviewer approval. Auto-post without approval is not appropriate at any accuracy below 99%, since a single mis-classified transaction can cascade into wrong sales tax recovery.
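The promotion gate can be written down as a small check. The thresholds are the ones stated above, the mode names and function names are hypothetical, and the comparison uses integer arithmetic so that exactly 95% does not pass through floating-point rounding.

```python
from typing import Sequence, Tuple

MIN_SAMPLE = 200           # transactions compared against the bookkeeper
APPROVAL_GATE_PCT = 95     # agreement must be strictly above this
UNAPPROVED_FLOOR_PCT = 99  # below this, unapproved auto-post is ruled out


def agreement_counts(agent: Sequence[str], bookkeeper: Sequence[str]) -> Tuple[int, int]:
    """Return (number of matching labels, total compared)."""
    if len(agent) != len(bookkeeper):
        raise ValueError("label lists must cover the same transactions")
    agreed = sum(a == b for a, b in zip(agent, bookkeeper))
    return agreed, len(agent)


def posting_mode(agent: Sequence[str], bookkeeper: Sequence[str]) -> str:
    agreed, total = agreement_counts(agent, bookkeeper)
    # Stay in suggest-only until the sample is large enough and agreement
    # is strictly above the gate.
    if total < MIN_SAMPLE or agreed * 100 <= APPROVAL_GATE_PCT * total:
        return "suggest_only"
    return "auto_post_with_approval"


def unapproved_auto_post_ruled_out(agent: Sequence[str], bookkeeper: Sequence[str]) -> bool:
    agreed, total = agreement_counts(agent, bookkeeper)
    return total == 0 or agreed * 100 < UNAPPROVED_FLOOR_PCT * total
```

Passing the 99% floor does not by itself make unapproved auto-post appropriate. The function only reports when it is ruled out.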

Does the agent handle multi-currency and foreign-exchange?

Yes, but it does not invent FX rates. It pulls rates from the source-of-truth the company already uses (the card issuer's posted rate, Xero's daily rate, or a configured FX feed) and records both the original currency and the converted amount on the journal line. Where the rate is ambiguous, the transaction is flagged.

How does the agent prevent duplicate expense submissions?

It computes a fingerprint over merchant, amount, date, and last four of the card. If two submissions match across all four fields within a 30-day window, the second is automatically held and the submitter is asked to confirm. Genuine duplicates (corporate card swipe plus an employee-submitted reimbursement for the same meal) are caught here.

Three takeaways before you close this tab
