A handoff is the contract between two agents (or one agent and a human) that specifies what gets passed, when, and what happens if the receiver is unavailable. Eight patterns cover most production cases. LangGraph, OpenAI Agents SDK, CrewAI, and Anthropic's subagent model all implement subsets of these patterns; the patterns themselves are framework-independent (LangGraph multi-agent, 2025; OpenAI Agents SDK, 2025).
The hard part is rarely the pass itself. It is preserving intent across the pass and handling failure when the pass goes wrong. The patterns below give you the names, the payload contracts, the failure modes, and a verification test for each one.
The handoff contract: what every pattern shares
Four fields. State: the data the next agent needs to act, structured and typed. Context: the conversational history, ideally compressed to a relevance summary rather than full transcript. Intent: what to accomplish, stated as an outcome not a workflow ("refund the customer", not "open Stripe, find the charge, click refund"). Return path: how to resume the caller if the receiver completes, errors, or escalates further. Without all four, a multi-agent system silently drops work or duplicates it.
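A minimal sketch of the four-field contract as typed Python dataclasses. The class names, the field names, and the `missing_fields` helper are illustrative assumptions, not part of any framework's API.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ReturnPath:
    # Where control goes for each outcome; the target names are placeholders.
    on_complete: str
    on_error: str
    on_escalate: str


@dataclass(frozen=True)
class Handoff:
    state: dict[str, Any]      # structured data the receiver acts on
    context: str               # compressed relevance summary, not a transcript
    intent: str                # an outcome, e.g. "refund the customer"
    return_path: ReturnPath    # how to resume the caller


def missing_fields(handoff: Handoff) -> list[str]:
    """Return the names of contract fields that are empty."""
    missing = []
    if not handoff.state:
        missing.append("state")
    if not handoff.context.strip():
        missing.append("context")
    if not handoff.intent.strip():
        missing.append("intent")
    path = handoff.return_path
    if not all((path.on_complete, path.on_error, path.on_escalate)):
        missing.append("return_path")
    return missing


handoff = Handoff(
    state={"order_id": "A-1001", "amount_cents": 2500},
    context="Customer was double charged; policy allows a full refund.",
    intent="refund the customer",
    return_path=ReturnPath("support_agent", "support_agent", "human_queue"),
)
```

Rejecting a handoff whose `missing_fields` result is non-empty, before the receiver ever sees it, is one way to turn a silent drop into a loud error.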
Pattern 1: Sequential handoff (the pipeline)
Intent. Steps run in a fixed order; each agent's output is the next agent's input. Payload. State plus context plus intent for the next step. Failure mode. If a downstream agent is unavailable, queue the work and notify the previous step. Verification. Run the pipeline with one downstream offline; the work should queue, not vanish.
Best for known multi-step workflows: extract, then validate, then summarize, then send. Worst for branching logic; that is the Router's job.
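The queue-and-notify behavior can be sketched in a few lines. This is an in-memory illustration under stated assumptions: `StepUnavailable`, `run_pipeline`, and the `notify` callback are hypothetical names, and a production system would use a durable queue rather than a `deque`.

```python
from collections import deque
from typing import Callable, Optional


class StepUnavailable(Exception):
    """Raised by a step that cannot accept work right now (hypothetical)."""


def run_pipeline(
    item: str,
    steps: list[tuple[str, Callable[[str], str]]],
    backlog: deque,
    notify: Callable[[str, str], None],
) -> Optional[str]:
    """Run steps in order; on an unavailable step, queue the work and stop."""
    for index, (name, step) in enumerate(steps):
        try:
            item = step(item)
        except StepUnavailable:
            # Keep the partially processed item and the index to resume from.
            backlog.append((index, item))
            previous = steps[index - 1][0] if index > 0 else "caller"
            notify(previous, f"{name} unavailable; work queued")
            return None
    return item


def offline(_: str) -> str:
    raise StepUnavailable()


backlog: deque = deque()
notices: list[tuple[str, str]] = []
result = run_pipeline(
    "  raw text  ",
    [("extract", str.strip), ("validate", offline), ("summarize", str.upper)],
    backlog,
    lambda who, message: notices.append((who, message)),
)
```

This doubles as the verification test: with `validate` offline, the item lands in `backlog` with its resume index instead of vanishing.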
Pattern 2: Router handoff (the dispatcher)
Intent. A small classifier agent picks the right specialist based on the request. Payload. State plus intent; specialists pull context themselves. Failure mode. If no specialist matches, fall to a default handler or escalate. Verification. Feed inputs designed to be ambiguous and confirm the fallback fires, not silent guessing.
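A sketch of the fallback rule, with a keyword-overlap score standing in for the classifier agent. The `route` function, the specialist names, and the keyword sets are assumptions for illustration; a real router would likely use a model call, but the tie and no-match handling is the part that matters here.

```python
def route(request: str, specialists: dict[str, set[str]], default: str = "fallback") -> str:
    """Pick the single best-matching specialist, or the default on no match or a tie."""
    words = set(request.lower().split())
    scores = {name: len(words & keywords) for name, keywords in specialists.items()}
    best = max(scores.values(), default=0)
    winners = [name for name, score in scores.items() if score == best]
    # No signal, or more than one equally good specialist: do not guess.
    if best == 0 or len(winners) > 1:
        return default
    return winners[0]


specialists = {
    "billing": {"refund", "invoice", "charge"},
    "shipping": {"delivery", "tracking", "package"},
}
```

An ambiguous request such as "refund for late delivery" scores one point for each specialist, so it goes to the fallback rather than to whichever name happened to come first.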
Pattern 3: Hierarchical Supervisor (manager and reports)
Intent. A supervisor agent decomposes a task, dispatches subtasks to specialists, aggregates results. LangGraph's hierarchical multi-agent docs codify this pattern (LangGraph, 2025). Payload. Full subtask brief per specialist; specialists return structured results to the supervisor. Failure mode. Specialist down → supervisor re-routes or escalates. Verification. Kill a specialist mid-run; supervisor should detect and re-plan.
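The re-route-or-escalate rule might look like the sketch below. It assumes specialists are plain callables that raise `ConnectionError` when down, and that each skill has at most one backup; `supervise` and both registries are hypothetical names, not LangGraph constructs.

```python
from typing import Callable

Worker = Callable[[str], str]


def supervise(
    subtasks: dict[str, tuple[str, str]],
    specialists: dict[str, Worker],
    backups: dict[str, Worker],
) -> tuple[dict[str, str], list[str]]:
    """Dispatch each (skill, brief) subtask; re-route to a backup, else escalate."""
    results: dict[str, str] = {}
    escalated: list[str] = []
    for task_id, (skill, brief) in subtasks.items():
        for worker in (specialists.get(skill), backups.get(skill)):
            if worker is None:
                continue
            try:
                results[task_id] = worker(brief)
                break
            except ConnectionError:
                continue  # specialist is down; try the next candidate
        else:
            # No candidate produced a result, so the supervisor escalates.
            escalated.append(task_id)
    return results, escalated


def down(_: str) -> str:
    raise ConnectionError("specialist unavailable")


results, escalated = supervise(
    {"t1": ("research", "find pricing"), "t2": ("legal", "review clause")},
    specialists={"research": down, "legal": down},
    backups={"research": lambda brief: f"backup handled: {brief}"},
)
```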
Pattern 4: Swarm (peer-to-peer with shared state)
Intent. Peer agents share a workspace and pick up work autonomously. No central orchestrator. Payload. Workspace state, claim-key per work item to prevent collisions. Failure mode. An agent crashing mid-task should release its claim within a TTL. Verification. Crash one peer; the claim should auto-release and another peer should pick it up.
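A single-process sketch of claim keys with a TTL. `ClaimTable` is a hypothetical name, and the clock is passed in as `now` so the behavior is easy to test; a real swarm would need the check-and-set to be atomic in a shared store, which this in-memory dictionary does not provide.

```python
class ClaimTable:
    """Tracks which agent holds each work item and when the claim expires."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._claims: dict[str, tuple[str, float]] = {}

    def try_claim(self, item_id: str, agent_id: str, now: float) -> bool:
        holder = self._claims.get(item_id)
        if holder is not None:
            owner, expires_at = holder
            # A live claim held by someone else blocks the new claim.
            if expires_at > now and owner != agent_id:
                return False
        # Free, expired, or a renewal by the current owner.
        self._claims[item_id] = (agent_id, now + self.ttl)
        return True


table = ClaimTable(ttl_seconds=30)
first = table.try_claim("task-1", "agent-a", now=0)
blocked = table.try_claim("task-1", "agent-b", now=10)
taken_over = table.try_claim("task-1", "agent-b", now=31)
```

If `agent-a` crashes after its claim at time 0, nothing has to clean up: the claim simply stops blocking other peers once the TTL passes.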
Pattern 5: Blackboard (shared workspace)
Intent. Multiple agents post observations and partial results to a common board; a coordinator (or another agent) reads and acts. Classical AI pattern from Hayes-Roth, reborn for LLM agents. Payload. Structured updates with timestamps and authorship. Failure mode. Stale or conflicting board entries should be timed out. Verification. Insert a stale entry; coordinator should ignore it past TTL.
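A sketch of timing out stale entries when the coordinator reads the board. The `Entry` fields and `fresh_view` are illustrative assumptions, and conflicts between fresh entries for the same key are resolved by last writer wins, which is one possible policy rather than the only one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    author: str       # which agent posted the update
    key: str          # what the observation is about
    value: str
    posted_at: float  # timestamp in seconds


def fresh_view(entries: list[Entry], now: float, ttl: float) -> dict[str, Entry]:
    """Return the newest non-stale entry per key; stale entries are ignored."""
    view: dict[str, Entry] = {}
    for entry in sorted(entries, key=lambda e: e.posted_at):
        if now - entry.posted_at <= ttl:
            view[entry.key] = entry  # later posts overwrite earlier ones
    return view


board = [
    Entry("sensor_a", "temperature", "20C", posted_at=0.0),
    Entry("sensor_a", "door", "open", posted_at=5.0),
    Entry("sensor_b", "temperature", "25C", posted_at=50.0),
]
view = fresh_view(board, now=60.0, ttl=30.0)
```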
Pattern 6: Sidecar Critic (parallel evaluator)
Intent. A critic agent runs in parallel with the primary, evaluates the primary's proposed action, can veto or annotate. Common for safety-critical paths. Payload. Proposed action plus reasoning trace. Failure mode. Critic down should default to denying high-risk actions, allowing low-risk ones. Verification. Disable critic; high-risk actions should be blocked, not allowed by default.
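The fail-closed rule for high-risk actions is small enough to sketch directly. `allow_action` is a hypothetical helper; it assumes risk arrives as a "low" or "high" label and that a down critic shows up either as `None` or as a `ConnectionError`.

```python
from typing import Callable, Optional


def allow_action(action: str, risk: str, critic: Optional[Callable[[str], bool]]) -> bool:
    """Ask the critic; if it is unavailable, allow only low-risk actions."""
    if critic is not None:
        try:
            return critic(action)
        except ConnectionError:
            pass  # treat an unreachable critic the same as a missing one
    # Critic down: fail closed for anything that is not explicitly low risk.
    return risk == "low"


def unreachable_critic(_: str) -> bool:
    raise ConnectionError("critic timed out")
```

Comparing against "low" rather than "high" means an unrecognized risk label is also denied while the critic is down.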
Pattern 7: Human Escalation (the gate)
Intent. Agent reaches a gate, pauses, asks a human, resumes on response. Payload. Reasoning trace so far, the proposed action, a single clear question, and a deadline by which a default fires if no human responds. Failure mode. No human response by deadline → either the safe-default action fires or the work is parked in a queue that someone monitors; choose per policy, but never leave it waiting silently. Verification. Run an escalation against an empty inbox; the deadline behavior should match policy.
Three rules for escalation payloads. The question must be answerable with a single decision. The reasoning trace must be skimmable in under 30 seconds. The deadline must be explicit, not implicit.
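A sketch of an escalation payload with an explicit deadline and default. The `Escalation` fields, the `resolve` helper, and the "waiting" sentinel are assumptions for illustration; the clock is passed in so the empty-inbox test is deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Escalation:
    question: str           # answerable with a single decision
    proposed_action: str
    reasoning_summary: str  # short enough to skim
    deadline: float         # explicit timestamp in seconds
    default_action: str     # fires if no human answers by the deadline


def resolve(escalation: Escalation, human_answer: Optional[str], now: float) -> str:
    """Return the human's decision, the default after the deadline, or 'waiting'."""
    if human_answer is not None:
        return human_answer
    if now >= escalation.deadline:
        return escalation.default_action
    return "waiting"


escalation = Escalation(
    question="Approve a 2,500 cent refund for order A-1001?",
    proposed_action="issue_refund",
    reasoning_summary="Duplicate charge confirmed; policy allows a full refund.",
    deadline=3600.0,
    default_action="hold_and_queue",
)
```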
Pattern 8: Deferred Resume (asynchronous handoff)
Intent. Agent A completes, fires an event, and agent B (or A itself, later) picks up and continues hours or days later. Payload. Durable state, event ID, resume hint. Failure mode. The event is lost or duplicated; guard with idempotency keys for duplicates and a dead-letter queue for events that cannot be processed. Verification. Duplicate the event; the second consumer should detect and no-op.
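A sketch of an idempotent resume consumer with a dead-letter list. `ResumeConsumer` and the `idempotency_key` field name are hypothetical, and the seen-set lives in memory here; a real deployment would persist it so deduplication survives restarts.

```python
from typing import Callable


class ResumeConsumer:
    """Processes each resume event at most once and parks failures."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self.handler = handler
        self.seen: set[str] = set()
        self.dead_letters: list[dict] = []

    def consume(self, event: dict) -> str:
        key = event["idempotency_key"]
        if key in self.seen:
            return "duplicate"  # already handled: no-op
        try:
            self.handler(event)
        except Exception:
            # Broad catch is deliberate in this sketch: any failure is parked.
            self.dead_letters.append(event)
            return "dead-lettered"
        self.seen.add(key)
        return "processed"


resumed: list[str] = []
consumer = ResumeConsumer(lambda event: resumed.append(event["resume_hint"]))
event = {"idempotency_key": "evt-42", "resume_hint": "continue at step 3"}
outcomes = [consumer.consume(event), consumer.consume(event)]
```

The key is recorded only after the handler succeeds, so a failed event can be replayed from the dead-letter list without being mistaken for a duplicate.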
Designing the payload itself
The payload is the contract. Five rules that hold across patterns. One: type the payload. JSON schema or a typed struct beats a free-form dictionary. A typed payload makes silent drift loud; a free-form one makes silent drift silent. Two: pass intent as outcome, not workflow. "Refund the customer for order X" survives schema changes; "click the refund button in Stripe for order X" does not. Three: include a return path on every handoff. Even pipelines that look one-way have failure cases that need to bubble back; treating return as optional turns every failure into a debugging archaeology dig.
Four: compress context, do not transcribe it. The receiver does not need the full conversation; it needs the decisions, the constraints, and the open questions. A compressed summary of around 200 tokens is far cheaper than a 5,000-token transcript and, as long as it keeps those three things, usually serves the receiver at least as well. Five: include a deadline. Every handoff implicitly has one; making it explicit lets the receiver plan its own timeouts and lets a downstream supervisor detect stuck work without polling.
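One way to make schema drift loud is to parse every incoming payload into a typed struct and reject anything that does not match. The sketch below assumes a refund handoff; `RefundHandoff`, `EXPECTED_FIELDS`, and `parse_handoff` are hypothetical names, and the check is deliberately strict (no missing keys, no extra keys, exact types).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefundHandoff:
    order_id: str
    intent: str            # outcome, not workflow
    context_summary: str   # compressed, not a transcript
    return_to: str         # return path
    deadline_epoch_s: float


EXPECTED_FIELDS = {
    "order_id": str,
    "intent": str,
    "context_summary": str,
    "return_to": str,
    "deadline_epoch_s": float,
}


def parse_handoff(raw: dict) -> RefundHandoff:
    """Build a typed payload, raising on any missing, extra, or mistyped field."""
    missing = EXPECTED_FIELDS.keys() - raw.keys()
    extra = raw.keys() - EXPECTED_FIELDS.keys()
    if missing or extra:
        raise ValueError(f"schema drift: missing={sorted(missing)} extra={sorted(extra)}")
    for name, expected_type in EXPECTED_FIELDS.items():
        if not isinstance(raw[name], expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
    return RefundHandoff(**raw)


payload = parse_handoff({
    "order_id": "A-1001",
    "intent": "refund the customer for order A-1001",
    "context_summary": "Duplicate charge confirmed; customer wants original card credited.",
    "return_to": "support_agent",
    "deadline_epoch_s": 1760000000.0,
})
```

A sender that renames `order_id` to `orderId` now fails at the boundary with a message naming both fields, instead of handing the receiver a payload with a quietly missing key.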
Failure modes per pattern
Each pattern has a signature failure mode worth knowing in advance. Sequential's is the silent stall: a downstream agent down means the work queues; if no one is watching the queue, work piles up invisibly. Router's is the misroute: a specialist gets work it cannot do and silently degrades quality. Supervisor's is the cascade: the supervisor itself fails and takes the whole hierarchy with it. Swarm's is the conflict: two peers claim the same work and produce divergent outputs.
Blackboard's is the stale entry: an old observation poisons new reasoning. Sidecar Critic's is the false veto: the critic rejects everything in a degraded state and stops the system. Human Escalation's is the silent timeout: no human responds, no default fires, work sits forever. Deferred Resume's is the lost event: the event broker drops the message and resume never happens. Each pattern's documentation should name its failure mode in plain English so on-call knows what to look for at 3am.
Choosing a pattern: decision rules
- Known fixed steps? Sequential.
- Pick one specialist from many? Router.
- Decompose plus aggregate? Hierarchical Supervisor.
- Parallel independent work? Swarm.
- Many agents observing same state? Blackboard.
- Need a veto for safety-critical actions? Sidecar Critic.
- Policy gate required? Human Escalation.
- Resume hours later? Deferred Resume.
FAQ
- What is an AI agent handoff?
- The contract by which one agent passes work to another. Specifies state, context, intent, return path, and the failure mode if the receiver is unavailable.
- How do I choose between Sequential and Router?
- Sequential when the steps are fixed and order matters. Router when one of several specialists should handle the request.
- When should an agent escalate to a human?
- Low confidence, high blast radius, ambiguity beyond training distribution, or a policy gate. The payload includes reasoning, proposed action, and one clear question.
- What is a Swarm handoff?
- Peer-to-peer agents on a shared workspace, no central orchestrator. Good for independent parallel work; bad if cross-agent conflicts are common.
- What goes in the handoff payload?
- State, context, intent, return path. All four. Missing any one silently drops or duplicates work.
How each framework maps to these patterns
LangGraph models the supervisor and router patterns natively, with explicit edges for handoffs and a state object that flows through (LangGraph, 2025). OpenAI's Agents SDK exposes handoffs as a first-class concept where an agent can directly delegate to another (OpenAI Agents SDK, 2025). CrewAI leans into hierarchical and sequential out of the box. Microsoft AutoGen ships group-chat semantics that approximate blackboard and swarm (Microsoft AutoGen, 2025). Anthropic's "Building Effective Agents" guidance pushes teams toward orchestrator-worker (a flavor of supervisor) as the safe default.
None of the frameworks implements all eight patterns equally well. Choose the pattern your problem needs first, then the framework that implements that pattern best. Frameworks shift fast; patterns do not. Build against the pattern abstraction in your own code, then swap the framework when the next one ships.
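Building against the pattern rather than the framework can be as simple as one narrow interface that your workflow code depends on. The sketch below uses `typing.Protocol`; `HandoffTransport`, `InProcessTransport`, and `refund_flow` are hypothetical names, and a framework-backed adapter would implement the same `send` method without the workflow changing.

```python
from typing import Callable, Protocol


class HandoffTransport(Protocol):
    """The only surface the workflow code is allowed to depend on."""

    def send(self, target: str, payload: dict) -> dict: ...


class InProcessTransport:
    """Trivial adapter that calls local functions; a stand-in for a framework adapter."""

    def __init__(self, agents: dict[str, Callable[[dict], dict]]) -> None:
        self.agents = agents

    def send(self, target: str, payload: dict) -> dict:
        return self.agents[target](payload)


def refund_flow(transport: HandoffTransport, order_id: str) -> dict:
    # Workflow logic sees only the protocol, never a framework import.
    return transport.send(
        "billing",
        {"intent": f"refund the customer for order {order_id}", "order_id": order_id},
    )


transport = InProcessTransport(
    {"billing": lambda payload: {"status": "refunded", "order_id": payload["order_id"]}}
)
outcome = refund_flow(transport, "A-1001")
```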
Closing the loop
Pattern selection is design, not implementation detail. Get the contract right and any framework can host it. Get the contract wrong and the cleverest framework still loses work.
Sources
- LangChain, "LangGraph multi-agent concepts", 2025, langchain-ai.github.io
- OpenAI, "Agents SDK documentation", 2025, openai.github.io
- Anthropic, "Building effective agents", 2024, anthropic.com
- CrewAI, "Multi-agent collaboration", 2025, docs.crewai.com
- Microsoft, "AutoGen multi-agent conversation", 2025, microsoft.github.io/autogen
