The single hardest non-model problem in agent engineering is state. The model is stateless. Every other part of the system that gives it the illusion of continuity, of memory, of resumption, is your code. Get it wrong and the agent forgets the user, restarts the task from scratch, or worse, replays a destructive tool call.
This piece is about state design for production agents. It defines the three tiers most agents need, the storage choices that match each tier, and the patterns that make agents survive restarts. For the conceptual primer on what memory is, see AI agent memory explained. This is the operational layer below that.
The three tiers of agent state
Production agents have at minimum three separate state stores. Treating them as one is the most common architectural mistake.
| Tier | Lifetime | Contains | Storage |
|---|---|---|---|
| Scratchpad | One run | Tool calls, intermediate results, reasoning | In-memory or fast KV |
| Memory | Across runs | Facts about users, prior tasks, learned preferences | Database + vector store |
| Checkpoint | Until run completes | Serialized agent state at a known-good point | Durable transactional store |
Scratchpad
The scratchpad is the agent's working memory within a single run. It holds tool calls made, results returned, and any intermediate reasoning. It dies with the run.
Two rules. Keep it bounded. A long-running agent that appends every tool result to the scratchpad eventually exceeds the model's context window. Truncate or summarize when the scratchpad approaches 70 percent of the context budget. Make it observable. The scratchpad is your debugging window. Log every entry with a timestamp and a trace ID. When an agent does the wrong thing, the scratchpad is what you read to find out why. For the broader practice, see how to debug an agent that did the wrong thing.
Memory
Memory holds what the agent has learned about a user or task across runs. Three patterns dominate.
Verbatim memory
Store every prior message. Cheap to build; expensive to use because every call carries the full history. Works for short-lived sessions, breaks at scale.
Summary memory
Compress past sessions into a running summary. The agent reads the summary on every call instead of the verbatim history. Cheaper at the token level, loses detail. OpenAI's Assistants API exposes this pattern explicitly via thread summarization (OpenAI Assistants API, 2024).
Hybrid memory
Verbatim within the current session, summary plus vector retrieval across sessions. This is the production default. The agent reads the current session in full, the prior-session summary always, and retrieves specific facts via vector search when the user references prior context. LangGraph's checkpointer plus memory store implements this pattern (LangGraph memory, 2024).
Checkpoints
A checkpoint is a serialized snapshot of agent state at a known-good point in a run. If the host crashes or the operator pauses the run, the agent resumes from the most recent checkpoint instead of starting over.
Where to checkpoint. Before any destructive tool call. If the call succeeds and the host crashes before recording the result, the next run must not replay the call. After expensive computation. Long-running tool results (a 10-second database query, a 30-second LLM call on a different model) should land in a checkpoint so retry does not re-incur the cost. At user-approval boundaries. When an agent waits for a human to approve a step, the wait can be hours. The checkpoint lets the agent host be recycled freely.
What goes in a checkpoint. The minimum is the scratchpad contents, the current step in the plan, and the identities of any pending tool calls. The OpenAI Assistants API runs and LangGraph checkpointers both serialize equivalents (OpenAI, LangGraph).
Storage choices
Matching tier to storage is the architectural decision that pays dividends or generates incidents.
Scratchpad: in-memory (process heap), Redis, Cloudflare KV. Speed matters; durability does not. If the run dies, the scratchpad dies with it.
Memory: a database the agent can query plus a vector store. Postgres + pgvector covers both for most agents under 1 million users. Dedicated vector stores (Pinecone, Weaviate, Qdrant) start to matter above that scale. Memory entries should be tenant-scoped at the schema level, not just by query parameter, to prevent cross-tenant leaks.
Checkpoints: a durable transactional store. Postgres works for most agents. Cloudflare Durable Objects fit when you want the agent state and the orchestrator co-located. Avoid Redis-only for checkpoints unless you have AOF persistence and a tested recovery procedure; the snapshot model in default Redis loses recent writes on crash.
Restart safety
An agent that does not survive a restart is a demo, not a product. Two patterns make restart safety work.
Idempotency keys on every external write. Generate a deterministic key per intended action (for example, a hash of "agent-run-id + step-number + payload"). Pass it to the downstream API's idempotency-key header. If the call succeeded but the response was lost, the retry returns the prior result instead of executing again. Stripe, GitHub, and most SaaS APIs support idempotency keys explicitly.
Two-phase commit for multi-step writes. When the agent needs to coordinate writes to two systems (say Salesforce + Slack notification), use a saga pattern: each write produces a compensating action that can roll it back. The checkpoint stores both the forward and compensating actions. On restart, the agent inspects the checkpoint and either completes the unfinished write or runs the compensating action.
For more on graceful rollback, see how to roll back an agent action and AI agent error handling and rollback.
Common mistakes
One store for everything. Putting scratchpad, memory, and checkpoints in the same Redis instance with no persistence config. The first crash erases everything.
Memory growth unbounded. Every user interaction adds a row, nothing ever gets pruned. The token cost on every model call grows linearly with the user's lifetime. Cap per-user memory at 4 to 8 KB of structured facts; older entries get compressed or dropped.
Checkpoints without restoration tests. The team writes the checkpointer but never tests resume from a real crash. The first incident exposes a serialization bug that nobody noticed in dev.
Scratchpad bleed between users. A scratchpad keyed only by session ID without a tenant scope allows one user's run to read another's data in a misrouted request. Always include tenant ID in scratchpad keys.
Field notes from production
Three patterns worth highlighting from running agents in production.
Checkpoint compaction. Without periodic compaction, the checkpoint store grows linearly with run count. Build a daily job that removes checkpoints older than the longest in-flight task. Without it, the checkpoint table is the largest table in the database within a year.
Cross-region restore. If your agent runs in multiple regions and a region goes down, can the other region restore the checkpoint? If checkpoints are stored only in-region, the answer is no. For multi-region production agents, replicate checkpoint state across regions or rely on a globally consistent durable store (Postgres with logical replication, Cloudflare Durable Objects with cross-region storage).
Memory schema migrations. The shape of stored memory changes when the agent gains new capabilities. Treat memory like a database schema: version it, write migrations, and back-fill on read for old entries. Without versioning, an agent that loaded old-shaped memory crashes silently or misinterprets fields.
Frequently asked questions
What kinds of state does an AI agent need to manage?
Scratchpad (current run), memory (across runs), and checkpoints (resume after crash). Each tier needs different storage.
What is a checkpoint in the context of AI agents?
A serialized snapshot of agent state at a known-good point. The agent resumes from the checkpoint instead of restarting after a crash.
Where should AI agent state be stored?
Scratchpad in fast KV. Memory in a database plus vector store. Checkpoints in a durable transactional store.
How do AI agents handle conversation memory across sessions?
Hybrid memory: verbatim within session, summary plus vector retrieval across sessions. This is the production default.
How big should an agent's memory be?
Cap per-user memory at 4 to 8 KB of structured facts. Compress or drop entries older than 90 days unless the user pins them.
Three things to ship this week
- Separate your scratchpad, memory, and checkpoint storage. One store for all three is the single most common bug.
- Add idempotency keys to every external write the agent makes.
- Run a chaos drill: kill the agent process mid-run, confirm it resumes from the checkpoint, confirm no destructive call replays.
Sources
- LangChain, "LangGraph Persistence", 2024, langchain-ai.github.io
- LangChain, "LangGraph Memory concepts", 2024, langchain-ai.github.io
- OpenAI, "Assistants API how-it-works", 2024, platform.openai.com
- Anthropic, "Building Effective Agents", 2024, anthropic.com
- Stripe, "Idempotent requests", docs.stripe.com