The intuition that "more agents will do better than one agent" is wrong more often than it is right. Most production multi-agent systems exist because the work has genuine boundaries (different access controls, different tools, different models), not because two LLMs are smarter than one. This guide is the operational playbook for the four coordination patterns that actually ship: supervisor, peer, market, and shared-state.

For the conceptual primer on multi-agent design, see multi-agent systems explained and single-agent vs multi-agent. This is the patterns-and-trade-offs counterpart.

When to go multi-agent

Real boundaries that justify going multi-agent:

Different access scopes. The "billing" agent needs PCI data; the "support" agent must not. Splitting agents lets you enforce the boundary at the access layer rather than relying on the model not to ask.

Different tool sets. One specialist needs a code interpreter; another needs CRM tools; another needs document retrieval. Loading all tools into one agent inflates every call and hurts tool-selection accuracy.

Different models per role. A reasoning agent uses a larger model; a classification agent uses a small fast one; a code agent uses a code-tuned model. The cost arbitrage works only with multiple agents.

Parallel work. Three independent subtasks can run in parallel. A single agent serializes; a multi-agent system runs them at once.
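To make the parallel-work case concrete, here is a minimal sketch using Python's asyncio. The function run_subtask is a hypothetical stand-in for a specialist agent call, and the sleep only simulates model latency; none of these names come from a particular framework.

```python
import asyncio
import time


async def run_subtask(name: str, seconds: float) -> str:
    # Hypothetical stand-in for a specialist agent call; sleep simulates latency.
    await asyncio.sleep(seconds)
    return f"{name}: done"


async def run_parallel(subtasks: dict[str, float]) -> list[str]:
    # gather() runs the subtasks concurrently and returns results in input order.
    return await asyncio.gather(
        *(run_subtask(name, secs) for name, secs in subtasks.items())
    )


def timed_parallel(subtasks: dict[str, float]) -> tuple[list[str], float]:
    # Wall-clock time should track the slowest subtask, not the sum of all three.
    start = time.perf_counter()
    results = asyncio.run(run_parallel(subtasks))
    return results, time.perf_counter() - start
```

The gain only holds when the subtasks are genuinely independent. If one needs another's output, the work serializes again no matter how many agents you run.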

Fake reasons that do not justify multi-agent: "agents collaborating sounds cool", "the prompt is getting long" (rewrite the prompt; do not split the agent), "multi-agent demos rank higher on Twitter".

Supervisor pattern

A controller agent (the supervisor) is the only agent the user interacts with. The supervisor reads the request, decides which specialist agents to call, dispatches subtasks, and integrates the results.
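The control flow can be sketched in a few lines. This is an illustration under assumptions, not any framework's API: the specialists are plain functions, and plan() uses keyword matching as a stand-in for the supervisor's model call.

```python
from typing import Callable


# Hypothetical specialists: plain functions standing in for agent calls.
def billing_agent(task: str) -> str:
    return f"[billing] handled: {task}"


def support_agent(task: str) -> str:
    return f"[support] handled: {task}"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "support": support_agent,
}


def plan(request: str) -> list[tuple[str, str]]:
    # Stand-in for the supervisor's model call: decide which specialists to use.
    lowered = request.lower()
    steps = []
    if "refund" in lowered or "invoice" in lowered:
        steps.append(("billing", request))
    if "login" in lowered or "error" in lowered:
        steps.append(("support", request))
    # Fall back to support when nothing matches.
    return steps or [("support", request)]


def supervise(request: str) -> str:
    # Dispatch each planned subtask, then integrate the results into one reply.
    results = [SPECIALISTS[name](task) for name, task in plan(request)]
    return "\n".join(results)
```

Everything the user sees comes out of supervise(), which is why the supervisor's trace doubles as the system's trace.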

Strengths. Easy to debug; the supervisor's trace is the trace of the system. Centralized policy: the supervisor enforces what tasks go where. Easy to add specialists incrementally.

Weaknesses. The supervisor is a latency bottleneck because every interaction goes through it. It is also a single point of failure for quality: if the supervisor's prompt is wrong, every downstream call inherits the error.

When to pick it. Default for most multi-agent systems. LangGraph documents this pattern explicitly with a Python tutorial (LangChain, 2024). OpenAI's A Practical Guide to Building Agents (2025) frames supervisor as the entry point for moving from single to multi.

Peer / handoff pattern

Agents at the same level pass control to each other directly. The current agent decides whether to handle the message or hand off to a different agent. No central controller.

Strengths. Fits work that moves through stages with clean ownership: a sales agent hands off to onboarding when the lead converts, and onboarding hands off to support after activation. Each agent stays specialized.

Weaknesses. Harder to debug: the "trace" is the sequence of handoffs and there is no central view. Risk of handoff cycles (A hands to B which hands back to A) without explicit cycle detection.
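A handoff loop with explicit cycle detection might look like the sketch below. The agent functions and the HandoffCycleError name are hypothetical, and the sketch assumes that revisiting an agent while handling a single message always signals a cycle, which may be too strict for some workflows.

```python
from typing import Callable, Optional

# Each agent returns (reply, next_agent); next_agent=None means it handled the message.
Agent = Callable[[dict], tuple[str, Optional[str]]]


class HandoffCycleError(RuntimeError):
    pass


def sales(ctx: dict) -> tuple[str, Optional[str]]:
    return ("lead converted", "onboarding") if ctx.get("converted") else ("pitching", None)


def onboarding(ctx: dict) -> tuple[str, Optional[str]]:
    return ("account activated", "support") if ctx.get("activated") else ("setting up", None)


def support(ctx: dict) -> tuple[str, Optional[str]]:
    return ("how can I help?", None)


AGENTS: dict[str, Agent] = {"sales": sales, "onboarding": onboarding, "support": support}


def run_handoffs(start: str, ctx: dict, agents: dict[str, Agent] = AGENTS) -> tuple[str, list[str]]:
    visited: list[str] = []
    current: Optional[str] = start
    reply = ""
    while current is not None:
        if current in visited:
            # A hands to B which hands back to A: stop instead of looping forever.
            raise HandoffCycleError(" -> ".join(visited + [current]))
        visited.append(current)
        reply, current = agents[current](ctx)
    # The visited list doubles as the handoff trace for debugging.
    return reply, visited
```

Returning the visited list alongside the reply gives you at least a minimal substitute for the central view this pattern lacks.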

When to pick it. Stage-based workflows with clear transitions. OpenAI's Swarm framework (2024) is built around this pattern with a minimal handoff API (openai/swarm GitHub).

Market / bidding pattern

Agents compete to handle a task. A router collects bids (or capability signals), picks the best agent, and dispatches. The "bid" can be a confidence score, a self-evaluation, or a price.
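As a rough sketch of the router, assume each agent exposes a bid function that returns a confidence between 0 and 1. The Bidder type, the keyword-based bid, and the min_bid threshold are all illustrative assumptions; a real system would more likely get the bid from a model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Bidder:
    name: str
    bid: Callable[[str], float]   # confidence in [0, 1] that this agent fits the task
    handle: Callable[[str], str]  # the actual work, run only by the winner


def keyword_bid(keywords: set[str]) -> Callable[[str], float]:
    # Hypothetical bid: fraction of the agent's keywords that appear in the task.
    def bid(task: str) -> float:
        words = set(task.lower().split())
        return len(words & keywords) / len(keywords)
    return bid


def route_by_bid(
    task: str, bidders: list[Bidder], min_bid: float = 0.3
) -> tuple[Optional[str], dict[str, float]]:
    # Every candidate produces a bid, which is where the cost of this pattern comes from.
    bids = {b.name: b.bid(task) for b in bidders}
    winner = max(bidders, key=lambda b: bids[b.name])
    if bids[winner.name] < min_bid:
        return None, bids  # nobody is confident enough; escalate or fall back
    return winner.handle(task), bids


BIDDERS = [
    Bidder("billing", keyword_bid({"refund", "invoice", "charge"}), lambda t: f"billing took: {t}"),
    Bidder("support", keyword_bid({"login", "error", "reset"}), lambda t: f"support took: {t}"),
]
```

Note that the router trusts each bid as reported. Nothing here calibrates the scores, so an agent that always bids 1.0 would win every task.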

Strengths. Self-balancing: agents that are better at certain tasks naturally win those tasks. Easier to add new agents because they declare what they can do rather than being wired into routing.

Weaknesses. Expensive: every candidate agent processes the task at least far enough to produce a bid. Bidding can be gamed by an over-confident agent. Coordination overhead grows quickly with agent count.

When to pick it. When you genuinely do not know in advance which specialist is right and the cost of asking all of them is acceptable. Rare in practice; usually a supervisor with good routing is cheaper and clearer.

Shared-state pattern

Agents read and write a common workspace (a document, a database, a blackboard). Coordination emerges through state changes rather than direct messages.

Strengths. Works well when agents are long-running and asynchronous: a research agent populates a document with sources while a writing agent drafts sections of it. Each agent works at its own pace.

Weaknesses. Race conditions: two agents writing the same field need explicit conflict resolution. Hard to debug: the "interaction" is implicit in state changes, not in a trace. State growth needs governance.
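One common form of explicit conflict resolution is optimistic concurrency: each field carries a version number, and a write succeeds only if the writer saw the latest version. The Blackboard class below is a hypothetical in-memory sketch of that idea, not a description of any particular framework's state store.

```python
import threading
from typing import Any


class VersionConflict(Exception):
    pass


class Blackboard:
    """In-memory shared workspace with per-key version checks."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict[str, tuple[int, Any]] = {}

    def read(self, key: str) -> tuple[int, Any]:
        # Returns (version, value); an unwritten key reads as version 0, value None.
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key: str, value: Any, expected_version: int) -> int:
        # The write is rejected if another agent updated the key since our read.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise VersionConflict(
                    f"{key}: expected v{expected_version}, found v{current_version}"
                )
            self._data[key] = (current_version + 1, value)
            return current_version + 1
```

An agent that hits VersionConflict re-reads the field, merges or abandons its change, and retries. That makes the conflict visible instead of letting the last writer silently win.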

When to pick it. Long-running collaborative work with natural shared artifacts (a document, a knowledge graph, a deal record). LangGraph's state graph can implement this; Anthropic's Building Effective Agents discusses orchestrator-worker plus shared scratchpad as a variant (Anthropic, 2024).

Choosing between patterns

Pattern | Best for | Avoid when
Supervisor | Default; tasks decompose cleanly into specialist calls | Latency-critical (supervisor adds a hop) or fully parallel work
Peer / handoff | Stage-based workflows; clear ownership transitions | Tasks that loop or need central oversight
Market / bidding | Truly unknown routing; abundant compute | Cost-sensitive; predictable task distribution
Shared-state | Long-running collaborative artifacts | Short tasks; high coordination volume

Cost discipline

Multi-agent systems typically cost 2 to 5x more in tokens for the same work, because each agent carries its own system prompt and context and the orchestration layer adds intermediate model calls. Three controls keep this manageable.

Cache aggressively. The supervisor and each specialist have stable prompts; cache them. Anthropic prompt caching cuts cached-portion input cost by up to 90 percent (Anthropic, 2024).
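A quick back-of-the-envelope calculation shows why the stable share of the prompt matters. The token counts and the price of 3.00 per million input tokens are made-up illustration values, and the sketch ignores any extra charge for writing to the cache.

```python
def input_cost(
    stable_tokens: int,
    variable_tokens: int,
    price_per_mtok: float,
    cached_discount: float = 0.0,
) -> float:
    # Only the stable (cacheable) portion of the prompt gets the discount.
    stable = stable_tokens * price_per_mtok * (1 - cached_discount)
    variable = variable_tokens * price_per_mtok
    return (stable + variable) / 1_000_000


# Hypothetical agent: 4,000-token stable prompt, 1,000 tokens of per-request input.
uncached = input_cost(4000, 1000, price_per_mtok=3.0)
cached = input_cost(4000, 1000, price_per_mtok=3.0, cached_discount=0.9)
saving = 1 - cached / uncached
```

With these assumed numbers the per-call input cost drops from about 0.015 to about 0.0042, a saving of roughly 72 percent. The overall saving is below 90 percent because the variable part of the prompt is still billed at full price.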

Route to model size. The supervisor often needs a stronger model; specialists handling narrow tasks can use smaller, cheaper models with eval gates.

Cap orchestration depth. Limit how deeply the supervisor can recurse (max sub-agent calls per top-level request). Without a cap, an ambiguous request can fan out into 20+ agent calls.
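One simple way to enforce such a cap is a per-request call budget that every dispatch must spend from. The CallBudget class and the recursive dispatch function are hypothetical; dispatch just simulates an ambiguous request that fans out three ways at each level.

```python
class CallBudgetExceeded(RuntimeError):
    pass


class CallBudget:
    """Counts agent calls for one top-level request, including the root call."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def spend(self) -> None:
        if self.used >= self.max_calls:
            raise CallBudgetExceeded(f"exceeded {self.max_calls} agent calls")
        self.used += 1


def dispatch(
    task: str, budget: CallBudget, fan_out: int = 3, depth: int = 0, max_depth: int = 2
) -> int:
    # Simulated fan-out: every call spends from the shared budget before doing work.
    budget.spend()
    if depth >= max_depth:
        return 1
    children = (
        dispatch(f"{task}.{i}", budget, fan_out, depth + 1, max_depth)
        for i in range(fan_out)
    )
    # Returns the total number of calls made in this subtree.
    return 1 + sum(children)
```

With a fan-out of 3 and two levels of recursion the request needs 1 + 3 + 9 = 13 calls, so a budget of 10 stops it partway. What to do at that point (return partial results, ask the user to clarify) is a product decision the sketch leaves open.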

For broader cost work, see AI agent cost optimization.

Debugging multi-agent systems

Multi-agent systems are harder to debug than single-agent systems because the failure mode is often "the system as a whole did the wrong thing" rather than "agent X made a wrong call." Three practices help.

Unified trace. Every agent in the system writes to one shared trace, scoped by a parent run_id and child step_id. The trace reads top to bottom as the system's reasoning. Without a unified trace, debugging means manually correlating separate traces from each agent, and that correlation work is what eats incident response time.
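A minimal sketch of such a trace, assuming an in-memory list of events serialized as JSON lines. The Trace class and the parent_step field are illustrative choices; only run_id and step_id come from the description above.

```python
import itertools
import json
import time
import uuid
from typing import Any, Optional


class Trace:
    """One shared trace per top-level request; every agent records into it."""

    def __init__(self, run_id: Optional[str] = None) -> None:
        self.run_id = run_id or uuid.uuid4().hex
        self._steps = itertools.count(1)
        self.events: list[dict[str, Any]] = []

    def record(
        self, agent: str, event: str, parent_step: Optional[int] = None, **data: Any
    ) -> int:
        # step_id increases monotonically, so the trace reads top to bottom.
        step_id = next(self._steps)
        self.events.append({
            "run_id": self.run_id,
            "step_id": step_id,
            "parent_step": parent_step,
            "agent": agent,
            "event": event,
            "ts": time.time(),
            "data": data,
        })
        return step_id

    def dump(self) -> str:
        # One JSON object per line, ready to ship to a log store.
        return "\n".join(json.dumps(e) for e in self.events)
```

A supervisor would call trace.record("supervisor", "received_request", text=...) and pass the returned step id to the specialist, which records its own events with parent_step set to it.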

Replay at the system level. A trace can be replayed not just per agent but for the whole system: the same user request, the same tool results, the same handoff sequence. The replay catches bugs that depend on the interaction pattern, not just a single agent's prompt.
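System-level replay depends on pinning tool results. One way to sketch it: wrap the tools in a recorder during the live run, then swap in a replayer that returns the recorded results and fails loudly if the call sequence diverges. All class and function names below are hypothetical, and run_system stands in for the whole multi-agent system.

```python
from typing import Any, Callable


class ReplayMismatch(RuntimeError):
    pass


class RecordingTools:
    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self._tools = tools
        self.log: list[tuple[str, tuple, Any]] = []

    def call(self, name: str, *args: Any) -> Any:
        result = self._tools[name](*args)
        self.log.append((name, args, result))  # keep the call and its result
        return result


class ReplayTools:
    def __init__(self, log: list[tuple[str, tuple, Any]]) -> None:
        self._log = list(log)
        self._cursor = 0

    def call(self, name: str, *args: Any) -> Any:
        if self._cursor >= len(self._log):
            raise ReplayMismatch(f"unrecorded call: {name}{args}")
        rec_name, rec_args, result = self._log[self._cursor]
        if (rec_name, rec_args) != (name, args):
            raise ReplayMismatch(f"expected {rec_name}{rec_args}, got {name}{args}")
        self._cursor += 1
        return result  # recorded result; no live tool is touched


def run_system(request: str, tools: Any) -> str:
    # Hypothetical two-step system: look up an order, then refund it.
    order = tools.call("lookup_order", request)
    return tools.call("issue_refund", order["id"], order["total"])
```

A ReplayMismatch is itself useful signal: it means a prompt or routing change altered the interaction pattern, not just the wording of an answer.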

Adversarial drills. Once a quarter, inject artificial failures into one agent and observe the system's behavior. Does the supervisor recover when a specialist returns an error? Does the peer pattern handle a handoff to an agent that is offline? The drill catches resilience gaps that production has not yet exposed.
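A drill can start as small as swapping one specialist for a version that always fails and checking what the supervisor does. The sketch below assumes a supervisor that tries specialists in order and escalates when all of them fail; the function names and the fallback policy are illustrative assumptions.

```python
from typing import Callable

Specialist = Callable[[str], str]


def failing(exc: Exception) -> Specialist:
    # Drill helper: an agent that always raises the injected error.
    def broken(task: str) -> str:
        raise exc
    return broken


def supervise_with_recovery(
    task: str, specialists: dict[str, Specialist], order: list[str]
) -> tuple[str, list[str]]:
    errors: list[str] = []
    for name in order:
        try:
            return specialists[name](task), errors
        except Exception as exc:
            # Record the failure and try the next specialist in line.
            errors.append(f"{name}: {exc}")
    return "escalated to a human", errors
```

Running the drill means replacing one entry, for example specialists["primary"] = failing(TimeoutError("injected timeout")), and asserting that the request still completes and that the error shows up in the returned list.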

Setting boundaries between agents

The boundary between two agents is the most important design decision in any multi-agent system. Three heuristics inform it.

Decision authority. Each agent should own a decision that no other agent can override. If two agents both need to decide whether to refund a customer, you have one agent with extra steps.

State ownership. Each agent should have a clear set of state it reads and writes. Shared write access to the same field is a race condition waiting to happen.

Interface contracts. Agents communicate through typed messages with declared schemas. Free-form text between agents loses information and amplifies hallucination across the chain.
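For the interface-contract heuristic, a typed message can be as simple as a frozen dataclass that validates itself on construction. RefundRequest and its fields are hypothetical; a production system might use a schema library instead, but the standard library is enough to show the idea.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RefundRequest:
    """Message a support agent sends to a billing agent (hypothetical schema)."""

    order_id: str
    amount_cents: int
    reason: str

    def __post_init__(self) -> None:
        # Dataclasses do not enforce type hints, so check the fields explicitly.
        if not isinstance(self.order_id, str) or not self.order_id:
            raise ValueError("order_id must be a non-empty string")
        if not isinstance(self.amount_cents, int) or self.amount_cents <= 0:
            raise ValueError("amount_cents must be a positive integer")


def encode(msg: RefundRequest) -> str:
    return json.dumps(asdict(msg))


def decode(raw: str) -> RefundRequest:
    # Missing or unexpected fields raise TypeError instead of passing through silently.
    return RefundRequest(**json.loads(raw))
```

The receiving agent calls decode() before acting, so a malformed or hallucinated field fails at the boundary rather than three agents later.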

Frequently asked questions

What are the main multi-agent coordination patterns?

Supervisor, peer / handoff, market / bidding, and shared-state.

When should I use multiple AI agents instead of one?

When the work has real boundaries: different access scopes, different tool sets, different models, or parallelizable subtasks.

What is the supervisor pattern in multi-agent systems?

A controller agent delegates subtasks to specialists and integrates results. The supervisor is the only agent the user interacts with directly.

What is the peer / handoff pattern in multi-agent systems?

Agents at the same level pass work to each other. Useful for stage-based workflows with clear ownership transitions.

Do multi-agent systems cost more than single agents?

Yes, usually 2 to 5x in tokens. Worth it when specialization improves quality or when the speedup from parallel work matters.

Three things to ship this week

Sources