Most AI agent purchases go wrong at the sales-call stage, not after deployment. The team likes the demo, the vendor likes the deal, and a year later someone is paying for an unused seat tier with a 60-day notice clause and no clean way to export memory. The checklist below is built to surface those traps before signature, not after. It is the procurement-grade question list that accompanies the AI agent vendor evaluation framework.

Fifty questions across seven categories. Use it as an RFP appendix, a sales-call worksheet, or a security-review form. Two rules: write the vendor's answers down (verbal commitments do not survive a sales-team turnover), and require contract language for anything that costs you money if it goes wrong.

How to frame an AI agent purchase

An AI agent platform is not a SaaS app with extra buttons. It is a system that touches data, makes calls on your behalf, holds memory, and can spend money in tools it controls. That changes the procurement surface in three specific ways. Data flow is bidirectional and continuous, so the data processing addendum matters more than the master agreement boilerplate. Liability is split across model provider, agent platform, and tool vendors, so indemnity language has to name them. And cost scales with traffic, so the billing meter you accept on day one becomes a finance KPI by day 90.

Bring three people to the call: a security lead who reads SOC 2 reports, a finance lead who can model usage scenarios, and a product lead who knows what the agent will actually be asked to do. Without the product lead, the security and finance answers float in the abstract; without the security lead, the demo becomes the whole evaluation.

Security and compliance (12 questions)

  1. What certifications do you hold, and can we see the latest SOC 2 Type II report and ISO 27001 statement of applicability?
  2. Are you aligned to the NIST AI Risk Management Framework (AI RMF 1.0), and which of its functions (Govern, Map, Measure, Manage) do you map your controls to?
  3. Do you maintain a public security page with disclosed incidents and a coordinated vulnerability disclosure policy?
  4. How do you defend against the OWASP LLM Top 10 risks, especially prompt injection, sensitive information disclosure, and excessive agency?
  5. What is your policy on third-party penetration testing, and will you share a summary report under NDA?
  6. Where are encryption keys stored, who can access them, and do you support customer-managed keys?
  7. How is multi-factor authentication enforced for our admin users, and is single sign-on (SAML or OIDC) included in our tier?
  8. Do you support role-based access control with custom roles, or only fixed admin and member roles?
  9. What audit logging do you provide, and can we stream events to our SIEM via webhook or syslog?
  10. How are sub-processors disclosed, and what notice do you give before adding a new one?
  11. How do you isolate one customer's data from another in your multi-tenant deployment?
  12. Do you offer a single-tenant or VPC deployment option for regulated data?

The SOC 2 Type II report is the single most useful artifact in the security review. Read the exceptions list before reading the summary; it shows which controls failed testing and how the vendor responded, and the pattern of exceptions tells you more about culture than the executive summary does (AICPA, 2025).

Data handling and privacy (8 questions)

  1. Will you train models on our data by default? If yes, can we opt out at the account level, and is that opt-out in the contract?
  2. What is your data retention policy by data type, and what is the maximum retention we can configure?
  3. Where is data stored geographically, and can we pin storage to a specific region (EU, US, India, others)?
  4. Do you sign a GDPR data processing agreement, and which standard contractual clauses module applies to cross-border transfers?
  5. What is your data deletion timeline on termination, and is deletion verified by a written attestation?
  6. Do you have HIPAA business associate agreements available, and what services are included in HIPAA scope?
  7. How do you handle data subject access, rectification, and deletion requests under GDPR, CCPA, and emerging regulations?
  8. What is your incident notification timeline for a data breach involving our records?

The default training-on-customer-data setting is the question that catches teams off guard. Major vendors now offer enterprise tiers that exclude business data from training by default; consumer tiers often do not (OpenAI Enterprise privacy, 2025; Anthropic Commercial Terms, 2025). Confirm in writing which tier you are on and what the default is.

Reliability and SLA (7 questions)

  1. What is your monthly uptime guarantee, measured how, and over what window?
  2. What service credits apply when the SLA is missed, and is the credit automatic or claim-based?
  3. What is your incident-response time-to-acknowledge and time-to-restore commitment?
  4. Do you provide a public status page, and how granular are the components (API, dashboard, specific regions)?
  5. What is your tested recovery time objective and recovery point objective for a regional outage?
  6. How do you handle model provider outages: failover to a secondary model, queue, or fail closed?
  7. What capacity headroom do you maintain, and can you commit to dedicated capacity for our workload?
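
An uptime percentage is easier to negotiate once it is converted into minutes. The sketch below assumes a 30-day measurement window (the vendor's contract may define a different one, which is exactly what question 1 asks) and uses the 99.5 to 99.9 percent range quoted in the FAQ; the function name is illustrative.

```python
def downtime_budget_minutes(uptime_percent: float, window_days: int = 30) -> float:
    """Minutes of downtime the vendor can accrue in the window without missing the SLA."""
    window_minutes = window_days * 24 * 60
    return window_minutes * (1 - uptime_percent / 100)


# Assumed window: 30 days. Swap in the window written in the contract.
for target in (99.5, 99.9):
    minutes = downtime_budget_minutes(target)
    print(f"{target}% over 30 days allows about {minutes:.1f} minutes of downtime")
```

The gap between the two targets is large in absolute terms, which is why the measurement window and the definition of "down" belong in the contract rather than in the sales deck.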

The SLA gap most teams miss is the difference between platform uptime and agent-run success. The dashboard can be up while the underlying model API is rate-limited or down, and your runs are failing anyway. Ask explicitly: is the SLA measured on platform availability, on agent-run success, or both? The honest answer is usually "platform"; the right contractual fix is a separate run-success commitment with a credit. See agent uptime and reliability for the metric definitions.
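
To make the distinction concrete, here is a minimal sketch that computes both numbers from the same month of data. The field names and the figures are hypothetical; a real vendor will define "up" and "successful run" in its own terms, and those definitions are what the contract should pin down.

```python
from dataclasses import dataclass


@dataclass
class MonthlyStats:
    platform_up_minutes: int  # minutes the vendor's platform reported healthy
    total_minutes: int        # minutes in the measurement window
    runs_attempted: int       # agent runs started by your workload
    runs_succeeded: int       # runs that completed without error


def platform_availability(stats: MonthlyStats) -> float:
    """Percent of the window the platform itself was available."""
    return 100 * stats.platform_up_minutes / stats.total_minutes


def run_success_rate(stats: MonthlyStats) -> float:
    """Percent of attempted agent runs that succeeded."""
    if stats.runs_attempted == 0:
        raise ValueError("no runs attempted in the window")
    return 100 * stats.runs_succeeded / stats.runs_attempted


# Hypothetical month: the platform was down for 20 minutes, but a model
# provider rate limit caused 1,100 of 20,000 runs to fail.
stats = MonthlyStats(
    platform_up_minutes=43_180,
    total_minutes=43_200,
    runs_attempted=20_000,
    runs_succeeded=18_900,
)
print(f"platform availability: {platform_availability(stats):.2f}%")
print(f"agent-run success:     {run_success_rate(stats):.2f}%")
```

In this made-up month the platform SLA is met comfortably while more than one run in twenty fails, which is the situation a separate run-success commitment is meant to cover.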

Pricing and billing (8 questions)

  1. Which meters do you bill on: seats, tokens, runs, executions, tool calls, storage, egress, or a combination?
  2. Can you show me a worked example for a workload like ours, with the assumptions written down?
  3. Is the discount tier locked in, or does it depend on usage minimums that, if missed, push us back to retail rates?
  4. What is the overage rate above contracted volume, and is there a soft cap or hard cap?
  5. How are model upgrades priced: are we automatically opted in to a more expensive newer model, or is the swap controlled by us?
  6. Can we configure budget alerts at 50, 75, and 90 percent of contracted spend, and a hard stop at 100 percent?
  7. Are there one-time setup, professional services, or onboarding fees, and what is in scope for them?
  8. What is the term length, the auto-renewal clause, and the notice period to opt out of renewal?
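
The alert behavior in question 6 can be written down as a small rule, which is useful when checking whether a vendor's budget controls actually match it. This is a sketch of the logic only, not any vendor's API; the names and thresholds are assumptions taken from the question itself.

```python
ALERT_THRESHOLDS = (50, 75, 90)  # percent of contracted spend that should trigger an alert
HARD_STOP = 100                  # percent of contracted spend at which runs should halt


def budget_status(spend: float, contracted: float) -> tuple[list[int], bool]:
    """Return (alert thresholds already crossed, whether the hard stop applies)."""
    if contracted <= 0:
        raise ValueError("contracted spend must be positive")
    used_percent = 100 * spend / contracted
    crossed = [t for t in ALERT_THRESHOLDS if used_percent >= t]
    return crossed, used_percent >= HARD_STOP


# Hypothetical contract of 10,000 per month with 7,600 already spent.
alerts, stop = budget_status(spend=7_600, contracted=10_000)
print(f"alerts fired: {alerts}, hard stop: {stop}")
```

If the vendor can only alert and cannot stop, the overage rate in question 4 becomes the number that matters.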

For pricing, the trap is comparing list prices across vendors that meter differently. Convert everything to your cost-per-completed-task on a representative workload. The companion piece on AI agent cost models walks through the conversion. The companion cost control tactics piece covers what to do after you sign.
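
As a rough illustration of that conversion, the sketch below prices one workload under three meters and divides each monthly bill by completed tasks. Every price and workload figure here is made up for the example; substitute the vendor's quoted rates and your own traffic.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    seats: int            # licensed users
    runs_attempted: int   # agent runs per month
    completed_tasks: int  # runs that actually finished the task
    tokens_per_run: int   # average tokens consumed per run


def per_seat_cost(w: Workload, price_per_seat: float) -> float:
    return w.seats * price_per_seat


def per_run_cost(w: Workload, price_per_run: float) -> float:
    # Billed on every attempted run, including the ones that fail.
    return w.runs_attempted * price_per_run


def per_token_cost(w: Workload, price_per_million_tokens: float) -> float:
    total_tokens = w.runs_attempted * w.tokens_per_run
    return total_tokens / 1_000_000 * price_per_million_tokens


def cost_per_completed_task(monthly_cost: float, w: Workload) -> float:
    return monthly_cost / w.completed_tasks


# Hypothetical workload and hypothetical list prices.
workload = Workload(seats=20, runs_attempted=10_000, completed_tasks=8_000, tokens_per_run=15_000)
quotes = {
    "per-seat vendor": per_seat_cost(workload, price_per_seat=50.0),
    "per-run vendor": per_run_cost(workload, price_per_run=0.12),
    "per-token vendor": per_token_cost(workload, price_per_million_tokens=6.0),
}
for vendor, monthly in quotes.items():
    print(f"{vendor}: {monthly:.2f} per month, {cost_per_completed_task(monthly, workload):.4f} per completed task")
```

Dividing by completed tasks rather than attempted runs is the point of the exercise: a meter that bills failed runs gets more expensive as the success rate drops, and list prices alone do not show that.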

Integrations and extensibility (6 questions)

  1. What is your native integration count, and is there a maintained registry with last-tested dates?
  2. How do custom integrations work: HTTP webhooks, MCP servers, custom code, or a no-code builder?
  3. Is there an SDK in our preferred language (Python, TypeScript, others), and what is its versioning and deprecation policy?
  4. Can we self-host an integration that talks to internal-only services, and how does that look from the platform side?
  5. What is the rate-limiting model for outbound tool calls, and can we configure per-tenant or per-tool ceilings?
  6. What authentication methods are supported for outbound tool calls: OAuth, API key, service accounts, signed requests?

Two integration questions matter beyond "do you have my tool". First, who maintains the integration: the vendor or a community contributor? Community integrations break when the underlying API changes and nobody is paid to fix them. Second, can you run a private integration without publishing it to a public catalog? For internal systems, the answer must be yes.

Support, success, and roadmap (5 questions)

  1. What support tier comes with our contract, what are the response SLAs, and what channels (email, Slack, phone) are included?
  2. Is there a named customer success manager, and at what contract size does that activate?
  3. What does the roadmap look like for the next two quarters, and what is the process for influencing it?
  4. Do you offer a sandbox or dedicated staging environment, and is it included in our tier?
  5. What training and enablement materials do you provide for new admin and end-user cohorts?

Roadmap influence is the underrated leverage point. A large customer's pain point will move a vendor's quarterly priorities if surfaced through the right channel; "the right channel" is usually the named CSM plus a quarterly business review. Ask for both at signature, not after the first quarter of churn warning emails.

Exit and portability (4 questions)

  1. What data export tools and formats do you provide, and can we self-serve an export at any time?
  2. On termination, what is the data deletion timeline, and will you provide a written deletion attestation?
  3. What happens to in-flight runs, scheduled jobs, and agent memory on termination, and how is access wound down?
  4. What is the licensing status of agents we build on your platform: do we own the agent definitions, or is there a platform-bound license?

Exit is the cheapest insurance and the most-forgotten contract section. Negotiate it before signature; once you are unhappy with the vendor, leverage is gone. The standard ask is a 30 to 90 day deletion timeline, a written attestation, self-serve export of agent definitions and run history, and ownership of agent IP that you authored on the platform. Vendors that resist this are telling you something useful.

FAQ

What is the single most important question to ask an AI agent vendor?
How will you treat my data: is it used for training, who can see it, where is it stored, and how is it deleted on exit? Every other risk follows from the answer to that one.

What SLA should I expect from an AI agent platform?
Production-grade vendors publish a measurable monthly uptime target with service credits, usually 99.5 to 99.9 percent, plus a defined incident-response time. Anything less precise is marketing.

How do I price-compare AI agent platforms?
Normalize to cost-per-completed-task on your real workload. Per-seat, per-run, and per-token meters all look cheap on a slide; only your traffic shape tells you which one ends up cheapest at month-end.

What security certifications matter for an AI agent vendor?
SOC 2 Type II is table stakes for B2B. ISO 27001 is helpful for international procurement. HIPAA, PCI DSS, and a GDPR DPA apply depending on data type. NIST AI RMF alignment is a signal of process maturity.

What exit clause should I insist on?
A written data-deletion timeline (typically 30 to 90 days), an export-on-request right, and a clear definition of what happens to in-flight runs and stored memory at termination.

Should I require an on-premise or VPC deployment option?
Require it for regulated data (healthcare, defense, finance with internal-only rules). For most B2B teams, a multi-tenant SaaS with strong tenant isolation is cheaper, faster, and equally safe.

Sources