AI Agent Handoff Patterns: 8 Contracts That Don't Drop Work

A handoff is the contract between two agents (or one agent and a human) that specifies what gets passed, when, and what happens if the receiver is unavailable. Eight patterns cover most production cases. LangGraph, OpenAI Agents SDK, CrewAI, and Anthropic's subagent model all implement subsets of these patterns; the patterns themselves are framework-independent (LangGraph multi-agent, 2025; OpenAI Agents SDK, 2025).

The hard part is rarely the pass itself. It is preserving intent across the pass and handling the case where the receiver cannot take the work. The patterns below give you the names, the payload contracts, the failure modes, and a verification test for each one.

The handoff contract: what every pattern shares

Four fields. State: the data the next agent needs to act, structured and typed. Context: the conversational history, ideally compressed to a relevance summary rather than full transcript. Intent: what to accomplish, stated as an outcome not a workflow ("refund the customer", not "open Stripe, find the charge, click refund"). Return path: how to resume the caller if the receiver completes, errors, or escalates further. Without all four, a multi-agent system silently drops work or duplicates it.
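The four fields can be sketched as a typed structure. This is a minimal illustration, not any framework's API: the class names, the channel strings in the return path, and the order data are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ReturnPath:
    """Where the receiver reports each outcome (hypothetical channel names)."""
    on_complete: str
    on_error: str
    on_escalate: str


@dataclass(frozen=True)
class Handoff:
    """The four contract fields; names are illustrative, not a framework API."""
    state: dict[str, Any]     # structured data the receiver needs to act
    context: str              # compressed relevance summary, not a transcript
    intent: str               # an outcome, e.g. "refund the customer"
    return_path: ReturnPath   # how to resume the caller

    def __post_init__(self) -> None:
        # Reject handoffs with an empty field so gaps fail loudly at the boundary.
        if not self.state:
            raise ValueError("state must not be empty")
        if not self.context.strip():
            raise ValueError("context must not be empty")
        if not self.intent.strip():
            raise ValueError("intent must not be empty")


handoff = Handoff(
    state={"order_id": "A-1001", "amount_cents": 4200},
    context="Customer reported a duplicate charge; identity already verified.",
    intent="refund the customer",
    return_path=ReturnPath("billing.done", "billing.error", "human.review"),
)
```

Making the return path a required constructor argument means a handoff without one cannot be built at all, which is the point of the contract.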

Pattern 1: Sequential handoff (the pipeline)

Intent. Steps run in a fixed order; each agent's output is the next agent's input. Payload. State plus context plus intent for the next step. Failure mode. If a downstream agent is unavailable, queue the work and notify the previous step. Verification. Run the pipeline with one downstream offline; the work should queue, not vanish.

Best for known multi-step workflows: extract, then validate, then summarize, then send. Worst for branching logic; that is the Router's job.
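A minimal sketch of the queue-on-failure behavior, assuming an in-memory queue and plain functions standing in for agents. The Pipeline class, the AgentUnavailable exception, and the online flag are hypothetical names for illustration; a production system would use a durable queue.

```python
from collections import deque
from typing import Callable, Optional

# A step takes a payload dict and returns the payload for the next step.
Step = Callable[[dict], dict]


class AgentUnavailable(Exception):
    """Raised by a step whose agent cannot accept work right now."""


class Pipeline:
    def __init__(self, steps: list[tuple[str, Step]]) -> None:
        self.steps = steps
        self.pending: deque = deque()   # (step index, payload) awaiting retry
        self.notices: list[str] = []    # messages sent back upstream

    def run(self, payload: dict, start: int = 0) -> Optional[dict]:
        for index in range(start, len(self.steps)):
            name, step = self.steps[index]
            try:
                payload = step(payload)
            except AgentUnavailable:
                # Park the work at the failed step and tell the previous step.
                self.pending.append((index, payload))
                upstream = self.steps[index - 1][0] if index > 0 else "caller"
                self.notices.append(f"{name} unavailable; notified {upstream}")
                return None
        return payload

    def drain(self) -> list[dict]:
        """Retry queued work once, resuming each item at the step that failed."""
        results = []
        for _ in range(len(self.pending)):
            index, payload = self.pending.popleft()
            result = self.run(payload, start=index)
            if result is not None:
                results.append(result)
        return results


online = {"summarize": False}  # toggled to simulate the agent coming back


def extract(p: dict) -> dict:
    return {**p, "fields": p["text"].split()}


def validate(p: dict) -> dict:
    return {**p, "valid": len(p["fields"]) > 0}


def summarize(p: dict) -> dict:
    if not online["summarize"]:
        raise AgentUnavailable()
    return {**p, "summary": " ".join(p["fields"][:2])}


pipeline = Pipeline([("extract", extract), ("validate", validate),
                     ("summarize", summarize)])
first = pipeline.run({"text": "invoice 42 overdue"})  # summarize is offline
```

With the summarizer offline, the run returns None and the work sits in pipeline.pending. Setting online["summarize"] to True and calling pipeline.drain() resumes it from the failed step rather than from the start.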

Pattern 2: Router handoff (the dispatcher)

Intent. A small classifier agent picks the right specialist based on the request. Payload. State plus intent; specialists pull context themselves. Failure mode. If no specialist matches, fall to a default handler or escalate. Verification. Feed inputs designed to be ambiguous and confirm the fallback fires, not silent guessing.
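A sketch of the fallback rule, with a toy keyword matcher standing in for the classifier agent. The keyword sets, specialist names, and the escalate handler are assumptions for illustration; a real router would typically use a model call with a confidence score.

```python
from typing import Callable

Handler = Callable[[str], str]


def classify(request: str, keywords: dict[str, set[str]]) -> list[str]:
    """Toy keyword classifier standing in for a small classifier agent."""
    words = set(request.lower().split())
    return [name for name, keys in keywords.items() if words & keys]


def route(request: str, keywords: dict[str, set[str]],
          specialists: dict[str, Handler], fallback: Handler) -> str:
    matches = classify(request, keywords)
    # Exactly one match is a confident route. Zero or several is ambiguous,
    # so the fallback fires instead of a silent guess.
    if len(matches) == 1:
        return specialists[matches[0]](request)
    return fallback(request)


KEYWORDS = {
    "billing": {"refund", "invoice", "charge"},
    "shipping": {"delivery", "tracking", "package"},
}
SPECIALISTS = {
    "billing": lambda r: "billing handled: " + r,
    "shipping": lambda r: "shipping handled: " + r,
}


def escalate(request: str) -> str:
    return "escalated: " + request
```

The verification test from the pattern maps directly onto this: a request such as "refund for lost package" matches both specialists and should come back from the fallback, not from either one.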

Pattern 3: Hierarchical Supervisor (manager and reports)

Intent. A supervisor agent decomposes a task, dispatches subtasks to specialists, aggregates results. LangGraph's hierarchical multi-agent docs codify this pattern (LangGraph, 2025). Payload. Full subtask brief per specialist; specialists return structured results to the supervisor. Failure mode. Specialist down → supervisor re-routes or escalates. Verification. Kill a specialist mid-run; supervisor should detect and re-plan.
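The re-route-or-escalate loop might look like the sketch below. It is framework-independent and does not use LangGraph's API; the supervise function, the "skill:description" subtask format, and the SpecialistDown exception are hypothetical conventions chosen for the example.

```python
from typing import Callable

Worker = Callable[[str], str]


class SpecialistDown(Exception):
    """Raised when a specialist cannot be reached."""


def supervise(task: str, decompose: Callable[[str], list[str]],
              pool: dict[str, list[Worker]]) -> dict:
    """Dispatch each subtask, re-route on failure, escalate when out of options."""
    results: dict[str, str] = {}
    escalated: list[str] = []
    for subtask in decompose(task):
        skill = subtask.split(":", 1)[0]  # assumed "skill:description" format
        for worker in pool.get(skill, []):
            try:
                results[subtask] = worker(subtask)
                break
            except SpecialistDown:
                continue  # try the next specialist with the same skill
        else:
            escalated.append(subtask)  # no specialist could take it
    return {"task": task, "results": results, "escalated": escalated}


def down(subtask: str) -> str:
    raise SpecialistDown(subtask)


def researcher(subtask: str) -> str:
    return "done " + subtask


report = supervise(
    "launch brief",
    lambda task: ["research:market size", "legal:review claims"],
    {"research": [down, researcher], "legal": [down]},
)
```

Here the research subtask is re-routed to the second specialist, while the legal subtask has no healthy specialist and lands in the escalated list for the supervisor to re-plan or hand to a human.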

Pattern 4: Swarm (peer-to-peer with shared state)

Intent. Peer agents share a workspace and pick up work autonomously. No central orchestrator. Payload. Workspace state plus a claim key per work item to prevent collisions. Failure mode. A crashed agent cannot release its own claim, so every claim should expire after a TTL. Verification. Crash one peer; the claim should auto-release and another peer should pick it up.
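A claim with a TTL can be sketched in a few lines. This version is in-memory and single-process, which is an assumption for readability; a shared workspace needs an atomic compare-and-set in whatever store backs it. ClaimBoard and its method names are hypothetical.

```python
import time
from typing import Optional


class ClaimBoard:
    """In-memory stand-in for a shared workspace; a real one needs atomic writes."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.claims: dict[str, tuple[str, float]] = {}  # item -> (agent, expiry)

    def claim(self, item: str, agent: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        holder = self.claims.get(item)
        if holder is not None and holder[1] > now and holder[0] != agent:
            return False  # another peer holds a live claim
        # Free, expired, or our own claim: take it or renew it.
        self.claims[item] = (agent, now + self.ttl)
        return True

    def release(self, item: str, agent: str) -> None:
        # Only the current holder may release; others are ignored.
        if self.claims.get(item, ("", 0.0))[0] == agent:
            del self.claims[item]


board = ClaimBoard(ttl_seconds=30)
a_first = board.claim("ticket-1", "agent-a", now=0.0)    # agent-a takes the item
b_early = board.claim("ticket-1", "agent-b", now=10.0)   # still held by agent-a
b_late = board.claim("ticket-1", "agent-b", now=31.0)    # agent-a's claim expired
```

The explicit now argument exists so the crash test can be run without sleeping: agent-a never releases, and agent-b still gets the item once the TTL has passed.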

Pattern 5: Blackboard (shared workspace)

Intent. Multiple agents post observations and partial results to a common board; a coordinator (or another agent) reads and acts. Classical AI pattern from Hayes-Roth, reborn for LLM agents. Payload. Structured updates with timestamps and authorship. Failure mode. Stale or conflicting board entries should be timed out. Verification. Insert a stale entry; coordinator should ignore it past TTL.

Pattern 6: Sidecar Critic (parallel evaluator)

Intent. A critic agent runs in parallel with the primary, evaluates the primary's proposed action, can veto or annotate. Common for safety-critical paths. Payload. Proposed action plus reasoning trace. Failure mode. Critic down should default to denying high-risk actions, allowing low-risk ones. Verification. Disable critic; high-risk actions should be blocked, not allowed by default.
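The fail-closed default can be sketched as a single gate function. How an action gets labeled high-risk is left to the caller here, which is an assumption; gate and the Critic signature are hypothetical names.

```python
from typing import Callable, Optional

Critic = Callable[[str, str], bool]  # (action, reasoning) -> approved?


def gate(action: str, reasoning: str, high_risk: bool,
         critic: Optional[Critic]) -> str:
    """Return "allow" or "block" for a proposed action."""
    if critic is None:
        # Critic unavailable: fail closed for high-risk, open for low-risk.
        return "block" if high_risk else "allow"
    try:
        approved = critic(action, reasoning)
    except Exception:
        # A crashing critic is treated the same as a missing one.
        return "block" if high_risk else "allow"
    return "allow" if approved else "block"
```

Passing None as the critic is the verification test in miniature: gate("delete database", "cleanup", True, None) should come back as "block", while a low-risk action with no critic still goes through.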

Pattern 7: Human Escalation (the gate)

Intent. Agent reaches a gate, pauses, asks a human, resumes on response. Payload. Reasoning trace so far, the proposed action, a single clear question, and a deadline by which a default fires if no human responds. Failure mode. No human response by deadline → either the safe-default action fires or the work stays queued, visibly, until a human picks it up; choose per policy. Verification. Run an escalation against an empty inbox; the deadline behavior should match policy.

Three rules for escalation payloads. The question must be answerable with a single decision. The reasoning trace must be skimmable in under 30 seconds. The deadline must be explicit, not implicit.
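A sketch of an escalation payload with an explicit deadline and a per-policy timeout behavior. The field names, the two policy strings, and the "reject" default are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Escalation:
    reasoning: str             # skimmable trace so far
    proposed_action: str
    question: str              # answerable with a single decision
    deadline: float            # explicit, in seconds on the caller's clock
    on_timeout: str            # policy: "safe_default" or "queue"
    safe_default: str = "reject"


def resolve(esc: Escalation, answer: Optional[str], now: float) -> str:
    """Decide what happens to paused work given the human's answer, if any."""
    if answer is not None:
        return answer            # the human decided; resume with their answer
    if now < esc.deadline:
        return "waiting"         # still inside the response window
    if esc.on_timeout == "safe_default":
        return esc.safe_default  # the deadline passed; the default fires
    return "queued"              # policy says to hold the work for a human


refund_gate = Escalation(
    reasoning="Charge appears twice on the same card within one minute.",
    proposed_action="refund 42 USD",
    question="Approve a refund of 42 USD for order A-1001?",
    deadline=100.0,
    on_timeout="safe_default",
)
```

Calling resolve(refund_gate, None, now=100.0) is the empty-inbox test: nobody answered, the deadline has arrived, and the result is the safe default rather than silence.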

Pattern 8: Deferred Resume (asynchronous handoff)

Intent. Agent A completes, fires an event, and agent B (or A itself, later) picks up and continues hours or days later. Payload. Durable state, event ID, resume hint. Failure mode. Event lost or duplicated; guard with idempotency keys plus a dead-letter queue. Verification. Duplicate the event; the second delivery should be detected and no-op.
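A sketch of the consumer side, covering duplicate deliveries and events that keep failing. It does not address lost events, which need redelivery from the broker. The in-memory set of processed keys is an assumption for brevity; it would have to be durable in practice. ResumeConsumer and the event fields are hypothetical names.

```python
from typing import Callable


class ResumeConsumer:
    """Consumes resume events delivered at least once; names are illustrative."""

    def __init__(self, handler: Callable[[dict], None],
                 max_attempts: int = 3) -> None:
        self.handler = handler
        self.max_attempts = max_attempts
        self.processed: set[str] = set()      # idempotency keys already handled
        self.attempts: dict[str, int] = {}    # failed attempts per event
        self.dead_letters: list[dict] = []    # events parked for inspection

    def consume(self, event: dict) -> str:
        key = event["event_id"]
        if key in self.processed:
            return "duplicate"  # a second delivery is a no-op
        try:
            self.handler(event)
        except Exception:
            self.attempts[key] = self.attempts.get(key, 0) + 1
            if self.attempts[key] >= self.max_attempts:
                self.dead_letters.append(event)
                return "dead-lettered"
            return "retry"
        self.processed.add(key)
        return "resumed"


resumed: list[str] = []
consumer = ResumeConsumer(lambda event: resumed.append(event["resume_hint"]))
event = {"event_id": "evt-7", "resume_hint": "send follow-up email"}
outcomes = [consumer.consume(event), consumer.consume(event)]
```

Delivering the same event twice runs the handler once. The key is recorded only after the handler succeeds, so a crash mid-handler leads to a retry instead of a silently skipped resume.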

Designing the payload itself

The payload is the contract. Five rules hold across patterns. One: type the payload. A JSON Schema or a typed struct beats a free-form dictionary. A typed payload makes drift loud; a free-form one keeps it silent. Two: pass intent as outcome, not workflow. "Refund the customer for order X" survives changes to the underlying tools; "click the refund button in Stripe for order X" does not. Three: include a return path on every handoff. Even pipelines that look one-way have failure cases that need to bubble back; treating return as optional turns every failure into a debugging archaeology dig.

Four: compress context, do not transcribe it. The receiver does not need the full conversation; it needs the decisions, the constraints, and the open questions. A 200-token summary that captures those usually beats a 5,000-token transcript on both quality and cost. Five: include a deadline. Every handoff implicitly has one; making it explicit lets the receiver plan its own timeouts and lets a downstream supervisor detect stuck work without polling.
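The five rules can be turned into a small lint pass that runs before any handoff is sent. This is a sketch under stated assumptions: the word budget is a rough stand-in for a token count, the workflow check is a crude keyword heuristic, and the field names and lint_payload function are hypothetical.

```python
MAX_CONTEXT_WORDS = 200  # rough stand-in for a token budget (assumption)
WORKFLOW_HINTS = ("click", "open ", "navigate", "press")  # crude heuristic

# Rule one (typed), rule three (return path), rule five (deadline).
REQUIRED_TYPES = {
    "state": dict,
    "context": str,
    "intent": str,
    "return_path": str,
    "deadline": float,
}


def lint_payload(payload: dict) -> list[str]:
    """Return rule violations for a handoff payload; an empty list means clean."""
    problems = []
    for name, expected in REQUIRED_TYPES.items():
        if name not in payload:
            problems.append(f"missing {name}")
        elif not isinstance(payload[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # Rule two: intent should name an outcome, not UI steps.
    intent = payload.get("intent")
    if isinstance(intent, str) and any(h in intent.lower() for h in WORKFLOW_HINTS):
        problems.append("intent reads like a workflow, not an outcome")
    # Rule four: context should be a summary, not a transcript.
    context = payload.get("context")
    if isinstance(context, str) and len(context.split()) > MAX_CONTEXT_WORDS:
        problems.append("context exceeds the summary budget")
    return problems


clean = lint_payload({
    "state": {"order_id": "X"},
    "context": "Duplicate charge confirmed; customer verified.",
    "intent": "Refund the customer for order X",
    "return_path": "billing.results",
    "deadline": 1700000000.0,
})
flagged = lint_payload({
    "state": {"order_id": "X"},
    "context": "Duplicate charge confirmed.",
    "intent": "click the refund button in Stripe for order X",
})
```

The second payload is flagged three times: no return path, no deadline, and an intent phrased as a workflow. A check like this is cheap enough to run on every handoff in a test suite.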

Failure modes per pattern

Each pattern has a signature failure mode worth knowing in advance. Sequential's is the silent stall: a downstream agent down means the work queues; if no one is watching the queue, work piles up invisibly. Router's is the misroute: a specialist gets work it cannot do and silently degrades quality. Supervisor's is the cascade: the supervisor itself fails and takes the whole hierarchy with it. Swarm's is the conflict: two peers claim the same work and produce divergent outputs.

Blackboard's is the stale entry: an old observation poisons new reasoning. Sidecar Critic's is the false veto: the critic rejects everything in a degraded state and stops the system. Human Escalation's is the silent timeout: no human responds, no default fires, work sits forever. Deferred Resume's is the lost event: the event broker drops the message and resume never happens. Each pattern's documentation should name its failure mode in plain English so on-call knows what to look for at 3am.

Choosing a pattern: decision rules

Known fixed steps? Sequential.
Pick one specialist from many? Router.
Decompose plus aggregate? Hierarchical Supervisor.
Parallel independent work? Swarm.
Many agents observing same state? Blackboard.
Need a veto for safety-critical actions? Sidecar Critic.
Policy gate required? Human Escalation.
Resume hours later? Deferred Resume.
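The decision rules read naturally as an ordered lookup table. The sketch below returns every matching pattern rather than one, on the assumption that a real system may need more than one; the need labels are hypothetical names invented for the example.

```python
# Ordered to match the decision rules; the keys are hypothetical labels.
RULES = [
    ("known_fixed_steps", "Sequential"),
    ("pick_one_specialist", "Router"),
    ("decompose_and_aggregate", "Hierarchical Supervisor"),
    ("parallel_independent_work", "Swarm"),
    ("many_agents_observing_same_state", "Blackboard"),
    ("safety_veto", "Sidecar Critic"),
    ("policy_gate", "Human Escalation"),
    ("resume_later", "Deferred Resume"),
]


def suggest_patterns(needs: set[str]) -> list[str]:
    """Map stated needs to pattern names, preserving the order of the rules."""
    unknown = needs - {need for need, _ in RULES}
    if unknown:
        raise ValueError(f"unknown needs: {sorted(unknown)}")
    return [pattern for need, pattern in RULES if need in needs]


suggested = suggest_patterns({"policy_gate", "known_fixed_steps"})
```

A refund workflow with fixed steps and an approval threshold, for example, comes back as Sequential plus Human Escalation.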

FAQ

What is an AI agent handoff? The contract by which one agent passes work to another agent or a human. It specifies state, context, intent, return path, and the failure mode if the receiver is unavailable.
How do I choose between Sequential and Router? Sequential when the steps are fixed and order matters. Router when one of several specialists should handle the request and the choice depends on the request itself.
When should an agent escalate to a human? On low confidence, high blast radius, ambiguity beyond the training distribution, or a policy gate. The payload includes the reasoning trace, the proposed action, and one clear question.
What is a Swarm handoff? Peer-to-peer agents on a shared workspace with no central orchestrator. Good for independent parallel work; bad if cross-agent conflicts are common.
What goes in the handoff payload? State, context, intent, and return path. All four are required; missing any one risks silently dropped or duplicated work.

How each framework maps to these patterns

LangGraph models the supervisor and router patterns natively, with explicit edges for handoffs and a state object that flows through (LangGraph, 2025). OpenAI's Agents SDK exposes handoffs as a first-class concept where an agent can directly delegate to another (OpenAI Agents SDK, 2025). CrewAI leans into hierarchical and sequential out of the box (CrewAI, 2025). Microsoft AutoGen ships group-chat semantics that approximate blackboard and swarm (Microsoft AutoGen, 2025). Anthropic's "Building Effective Agents" guidance describes orchestrator-workers (a flavor of supervisor) as one of its core workflow patterns and recommends starting with the simplest design that works (Anthropic, 2024).

None of the frameworks implements all eight patterns equally well. Choose the pattern your problem needs first, then the framework that implements that pattern best. Frameworks shift fast; patterns do not. Build against the pattern abstraction in your own code, then swap the framework when a better one ships.
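One way to build against the pattern rather than the framework is a thin interface that application code depends on, with one adapter per framework behind it. The sketch below uses only in-process stand-ins; HandoffBackend, its send method, and both backends are hypothetical, and no real framework API is shown.

```python
from typing import Callable, Protocol

Agent = Callable[[dict], dict]


class HandoffBackend(Protocol):
    """What application code needs from any framework (hypothetical interface)."""

    def send(self, receiver: str, payload: dict) -> dict: ...


class InProcessBackend:
    """Calls local functions directly; handy for tests."""

    def __init__(self, agents: dict[str, Agent]) -> None:
        self.agents = agents

    def send(self, receiver: str, payload: dict) -> dict:
        return self.agents[receiver](payload)


class RecordingBackend:
    """Wraps another backend and logs each handoff."""

    def __init__(self, inner: HandoffBackend) -> None:
        self.inner = inner
        self.log: list[str] = []

    def send(self, receiver: str, payload: dict) -> dict:
        self.log.append(receiver)
        return self.inner.send(receiver, payload)


def refund_flow(backend: HandoffBackend, order_id: str) -> dict:
    """Application logic written against the pattern, not a framework."""
    checked = backend.send("validator", {"order_id": order_id})
    return backend.send("refunder", checked)


agents = {
    "validator": lambda p: {**p, "valid": True},
    "refunder": lambda p: {**p, "refunded": p["valid"]},
}
plain = InProcessBackend(agents)
recorded = RecordingBackend(plain)
```

refund_flow runs unchanged against either backend. An adapter for a real framework would implement the same send method, which keeps the swap confined to one class.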

Closing the loop

Pattern selection is design, not implementation detail. Get the contract right and any framework can host it. Get the contract wrong and the cleverest framework still loses work.

Sources

LangChain, "LangGraph multi-agent concepts", 2025, langchain-ai.github.io
OpenAI, "Agents SDK documentation", 2025, openai.github.io
Anthropic, "Building effective agents", 2024, anthropic.com
CrewAI, "Multi-agent collaboration", 2025, docs.crewai.com
Microsoft, "AutoGen multi-agent conversation", 2025, microsoft.github.io/autogen