The deployment-model question shows up earlier than buyers expect. The first time someone asks "where does the agent actually run?" is usually thirty seconds into a security review, and the answer determines half the contract. Cloud, self-host, hybrid: three models, three sets of trade-offs, three buyer profiles. This piece walks through each, the operational cost behind each, and which one fits which buyer.
Public-cloud spend on AI infrastructure crossed a meaningful threshold in 2025: Bessemer's State of the Cloud report puts cloud AI workloads at the fastest-growing line item in enterprise budgets, with hybrid deployment patterns accounting for the majority of regulated-industry adoption (Bessemer Venture Partners, State of the Cloud 2025). The buyer is not asking "cloud or not". The buyer is asking "which split, and what are the operational costs of each".
The three models, mapped
An AI agent is not one component; it is a stack. The model call (the LLM), the orchestration layer (the agent loop), the tool layer (API integrations), the memory layer (vector store, key-value store), and the observability layer (logs, audit trails). The deployment model decides which of these components runs in whose environment.
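The stack-and-placement framing above can be sketched as a small data structure. This is a minimal illustration, not any vendor's actual schema: the layer names mirror the five listed in this piece, and the `cloud_hosted` mapping is just the fully-vendor-hosted case as one example.

```python
from dataclasses import dataclass
from enum import Enum

class Environment(Enum):
    VENDOR = "vendor"
    BUYER = "buyer"

@dataclass(frozen=True)
class Component:
    name: str
    runs_in: Environment

# The five layers of the agent stack, as named in the text.
LAYERS = ["model", "orchestration", "tools", "memory", "observability"]

def cloud_hosted() -> list[Component]:
    # Fully cloud-hosted: every layer runs in the vendor's environment.
    return [Component(layer, Environment.VENDOR) for layer in LAYERS]
```

The deployment-model decision is then just a different assignment of `Environment` per layer; self-host flips every entry to `BUYER`, and hybrid splits them along the data boundary.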
Cloud-hosted: fastest start, least control
In a fully cloud-hosted deployment, every component lives in the vendor's environment. The buyer signs up, provides API credentials for whatever tools the agent needs to access, and the agent runs. Onboarding takes minutes, upgrades happen invisibly, and the buyer sees a UI and a billing line.
The trade-off is data exposure. The buyer's documents, prompts, tool-call payloads, and outputs all transit and often persist in vendor-controlled infrastructure. For most B2B SaaS buyers below the regulated-industry threshold, that is acceptable; the same trade-off applies to every other SaaS tool they use. For healthcare, government, and financial-services buyers, it is usually not.
Cloud-hosted is also the cheapest model for the vendor to operate, and those economics flow back to the buyer as lower per-task pricing. The unit economics of running agents at scale are easier to make work when the vendor controls the entire stack, as covered in economics of bootstrapped AI agents.
Self-hosted: maximum control, real operational weight
Self-hosting flips the model. The vendor ships software (often a container or a cluster of services) and the buyer runs it on infrastructure they control. Every component, including the model, runs in the buyer's perimeter. No production data crosses the vendor boundary except telemetry the buyer explicitly opts into.
The price is operational. Running model inference at production reliability is non-trivial: GPU provisioning, batching, autoscaling under bursty load, model-version pinning, vector-store maintenance, queue infrastructure. Most buyers underestimate this cost by an order of magnitude. The break-even point against a managed alternative usually requires both significant scale and a reason that scale alone does not justify, such as model-weight ownership for IP reasons.
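The break-even claim above is simple arithmetic: self-hosting trades a large fixed monthly base (GPUs, platform engineers) for a lower marginal cost per task. The sketch below makes that explicit; every dollar figure in the usage note is a hypothetical assumption for illustration, not a benchmark.

```python
def monthly_cost_managed(tasks: int, price_per_task: float) -> float:
    """Managed (cloud) cost scales linearly with task volume."""
    return tasks * price_per_task

def monthly_cost_self_hosted(tasks: int, gpu_fixed: float,
                             ops_headcount: float,
                             per_task_infra: float) -> float:
    """Self-hosted cost: fixed base (GPU fleet, platform team)
    plus a small marginal infrastructure cost per task."""
    return gpu_fixed + ops_headcount + tasks * per_task_infra

def break_even_tasks(price_per_task: float, gpu_fixed: float,
                     ops_headcount: float, per_task_infra: float) -> float:
    """Monthly task volume at which self-hosting matches managed pricing."""
    fixed = gpu_fixed + ops_headcount
    margin = price_per_task - per_task_infra
    if margin <= 0:
        return float("inf")  # self-host never breaks even
    return fixed / margin
```

With assumed numbers of $0.50 per managed task, $8,000/month in GPU spend, $25,000/month in operations headcount, and $0.05 marginal infra cost per task, `break_even_tasks(0.50, 8000, 25000, 0.05)` lands around 73,000 tasks per month before self-hosting pays for itself, and that is before counting upgrade lag.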
Self-hosting has a second cost that shows up later: upgrade lag. The vendor's hosted version improves continuously; the self-hosted version moves on the buyer's release schedule, which is usually quarterly. By month nine, the self-hosted instance is materially behind the cloud instance. The buyer has to budget for keeping up.
When self-hosting is the right call
Three conditions together make self-host worth the operational cost. First, regulatory or contractual constraints prohibit data leaving the buyer's environment in any form, including encrypted vendor-side logs. Second, the buyer has a real platform team capable of operating model inference and vector stores at production reliability. Third, the buyer values model-weight ownership or version control more than upgrade velocity. If any of those three is missing, hybrid is usually the better answer.
Hybrid: the modal compliance pattern
Hybrid splits the stack along the data boundary. The control plane (orchestration, policy, observability tooling, vendor-managed components that do not touch customer data) runs in the vendor cloud. The data plane (tool calls, document access, vector store, audit logs) runs inside the buyer's environment, often in the buyer's own VPC or data center.
The architecture works because most of the operational complexity (the parts that actually benefit from vendor scale) can be separated from the parts that touch sensitive data. The vendor runs the agent loop, ships upgrades, and handles incident response on its own components. The buyer's data never crosses the perimeter except as outbound API calls the buyer is already making to the agent's tool integrations.
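The split described above can be written down as a placement table. This is a hypothetical illustration (the component names mirror this piece, not any real product's manifest), but it shows the invariant that makes the compliance review work: everything that touches customer data sits on the buyer side.

```python
# Hypothetical hybrid placement: control plane in the vendor cloud,
# data plane inside the buyer's environment.
HYBRID_PLACEMENT = {
    # control plane (no customer data)
    "orchestration":   "vendor",
    "policy":          "vendor",
    "observability":   "vendor",
    # data plane (touches customer data)
    "tool_calls":      "buyer",
    "document_access": "buyer",
    "vector_store":    "buyer",
    "audit_logs":      "buyer",
}

def crosses_perimeter(component: str) -> bool:
    """True if the component runs outside the buyer's environment,
    meaning data it handles would transit the vendor boundary."""
    return HYBRID_PLACEMENT[component] == "vendor"
```

The compliance argument reduces to one check: for every component that handles production data, `crosses_perimeter` must be false.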
Hybrid is what most regulated-industry buyers actually pick when they finish the procurement process. It is more complex than cloud, less complex than self-host, and the compliance officer signs off because production data does not leave the buyer's environment. The ongoing cost is the integration boundary: every release cycle the vendor and the buyer have to coordinate, and incidents require a clear contract about whose component is responsible.
How to choose: a decision framework
The decision is rarely about technology. It is about which constraint is binding. Three questions, in order:
- Does production data have to stay inside the buyer's perimeter? If yes, eliminate cloud-hosted.
- Does the buyer have a platform team capable of operating model inference? If no, eliminate self-host.
- Is the buyer willing to coordinate releases with the vendor? If no, the choice is back to either cloud or self-host depending on the answer to question one.
For most buyers, the answers are: yes (data must stay), no (no platform team), yes (will coordinate). That points to hybrid. The minority of buyers with a serious platform team and strict sovereignty requirements end up at self-host. The minority of buyers in unregulated B2B verticals stay on cloud.
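The three questions above form a short decision tree, and it is compact enough to write out. A minimal sketch, applying the questions in the order given; the "no clean fit" branch is my addition for the case the framework leaves open (data must stay, no platform team, no release coordination).

```python
def choose_deployment(data_must_stay: bool,
                      has_platform_team: bool,
                      will_coordinate_releases: bool) -> str:
    """Apply the three-question framework in order."""
    if not data_must_stay:
        # Q1: no residency constraint eliminates nothing, but cloud
        # wins on time-to-value for unregulated buyers.
        return "cloud"
    if has_platform_team and not will_coordinate_releases:
        # Data must stay and the buyer won't coordinate releases:
        # self-host, which Q2 says needs a real platform team.
        return "self-host"
    if will_coordinate_releases:
        # Data must stay and the buyer will coordinate: hybrid.
        return "hybrid"
    # Data must stay, no platform team, no coordination:
    # no model fits cleanly; escalate to a manual conversation.
    return "no clean fit"
```

The modal buyer described in this piece (`choose_deployment(True, False, True)`) lands on hybrid, which matches what regulated-industry procurement actually produces.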
The deployment model also interacts with the rest of the agent stack. Single-agent vs multi-agent (covered in single-agent vs multi-agent) is more sensitive to deployment model than buyers expect: multi-agent coordination across a hybrid boundary adds latency that single-agent does not pay. The cost model the vendor uses (covered in cost models explained) often differs across deployment models, with self-host typically priced as a flat license and cloud priced per-task.
Where Gravity sits
Gravity is cloud-hosted at launch. The buyer profile (individual operators and small teams who want an autonomous agent running in 60 seconds) does not have a platform team and does not have data-residency requirements that would force hybrid. The describe-an-outcome interface (describe outcome, not workflow) only works when the agent has fast access to the model and the buyer has zero infrastructure burden. Cloud is the right model for that buyer.
Hybrid is on the roadmap for the regulated-industry tier. The same agent loop, the same reliability methodology described in how we test AI agents, but with the data plane running inside the buyer's environment. Self-host is not on the roadmap; the operational cost would dominate the buyer experience and undermine the 60-second deploy promise. Founders learn quickly that staying out of categories you cannot operate well is part of the framework, as covered in three startups, three shutdowns.
The question of "where does the agent run" is the first hard question of every enterprise procurement cycle. The answer determines pricing, latency, audit, and which features ship to which tier. Pick the model that matches the buyer; do not try to be all three.
Frequently asked questions
What are the main AI agent deployment models?
Three primary models: cloud-hosted (vendor runs the agent), self-hosted (you run it on your own infrastructure), and hybrid (control plane in the vendor cloud, data plane in your environment). Each trades convenience for control. Cloud is fastest to start. Self-host is heaviest to operate. Hybrid is the compromise most regulated buyers actually pick.
Is self-hosting an AI agent worth the operational cost?
Self-hosting is worth it when data residency, regulatory requirements, or model-weight ownership matters more than time-to-first-value. For most early-stage buyers, the operational cost of running model inference, vector stores, queue infrastructure, and agent orchestration internally exceeds the value of the control. Self-host when you have a real reason.
What is hybrid AI agent deployment?
Hybrid splits the agent stack: the control plane (orchestration, policy, observability) runs in the vendor cloud, while the data plane (the agent's tool calls, document access, customer data) runs inside the buyer's environment. Sensitive data never leaves the buyer's perimeter. The vendor handles upgrades, the buyer keeps audit control.
How does deployment model affect AI agent latency?
Cloud-hosted agents have predictable latency for the model call but add network hops to internal tools. Self-hosted agents have low internal-tool latency but inherit your infrastructure's variability. Hybrid agents pay the control-plane round trip per task. For interactive use, the difference is rarely the bottleneck. For batch automation, it can be.
Which AI agent deployment model is best for compliance?
Hybrid is the compliance default for healthcare, financial services, and government buyers who need vendor-supported software but cannot let production data leave their environment. Pure self-host wins when audit logs must be entirely customer-owned. Pure cloud wins when the buyer's compliance regime treats the vendor as a sub-processor and the contract handles the rest.
Three takeaways before you close this tab
- Cloud, self-host, hybrid map to time-to-value, sovereignty, and compliance respectively. Pick based on which constraint is binding.
- Hybrid is the modal regulated-industry choice. Vendor runs the loop, buyer keeps the data plane.
- Self-host has a real operational cost. Most buyers underestimate it by an order of magnitude.
Sources
- Bessemer Venture Partners, "State of the Cloud 2025", retrieved 2026-05-07, bvp.com/atlas/state-of-the-cloud-2025
- NIST, "AI Risk Management Framework", retrieved 2026-05-07, nist.gov/itl/ai-risk-management-framework
- OWASP, "Top 10 for LLM Applications", retrieved 2026-05-07, owasp.org
- Aryan Agarwal, "Gravity deployment-model decision spec", internal v1, May 2026