AI Agent State Management: Memory, Checkpoints, and Durability

Q: What kinds of state does an AI agent need to manage?

Three tiers. The scratchpad holds the current run's tool calls, intermediate results, and reasoning trace. Memory holds facts the agent learned about a user or task across runs. Durable checkpoints hold the state needed to resume an interrupted long-running task after a restart. Most agent failures around persistence come from confusing these three tiers.

Q: What is a checkpoint in the context of AI agents?

A checkpoint is a serialized snapshot of an agent's state at a known-good point in a long-running task. If the host restarts, the agent resumes from the most recent checkpoint instead of starting over. LangGraph documents checkpointing explicitly as a mechanism for human-in-the-loop pauses and crash recovery (LangChain LangGraph docs, 2024).

Q: Where should AI agent state be stored?

Scratchpad in memory or a fast key-value store (Redis, Cloudflare KV) since it dies with the run. Memory in a database the agent can query (Postgres, vector store, or both). Checkpoints in a durable store with strict consistency (Postgres, Cloudflare Durable Objects). Mixing these tiers, like storing checkpoints in Redis without persistence, is the most common production mistake.

Q: How do AI agents handle conversation memory across sessions?

Three patterns. Verbatim history (store every message) is cheap to build but expensive to use at the token level. Summary memory (compress past sessions into a running summary) saves tokens but loses detail. Hybrid memory (verbatim within a session, summary plus vector retrieval across sessions) is the production default and is the approach used by frameworks like LangGraph and the OpenAI Assistants API (OpenAI, 2024).

Q: How big should an agent's memory be?

As small as the task allows. The cost of carrying memory is paid on every model call as input tokens. A working pattern: cap per-user memory at 4 to 8 KB of structured facts plus a vector store for retrieval. Anything older than 90 days gets compressed or dropped unless the user explicitly pins it.

The single hardest non-model problem in agent engineering is state. The model is stateless. Every other part of the system that gives it the illusion of continuity, of memory, of resumption, is your code. Get it wrong and the agent forgets the user, restarts the task from scratch, or worse, replays a destructive tool call.

This piece is about state design for production agents. It defines the three tiers most agents need, the storage choices that match each tier, and the patterns that make agents survive restarts. For the conceptual primer on what memory is, see AI agent memory explained. This is the operational layer below that.

What is state management in AI agents?

State management in AI agents is the practice of tracking everything an agent needs to stay coherent: conversation context, task progress, and intermediate results, both across the steps of a single run and across separate sessions. Because the underlying language model is stateless, the agent's code must persist that information explicitly. The main approaches fall into three groups. In-context or short-term memory keeps the current run's messages and tool results inside the model's context window, often in a scratchpad. External stores, such as vector databases, key-value caches, or summary records, hold long-term memory that survives across sessions and is retrieved when relevant. Checkpointing serializes the agent's full state at known-good points so a long-running task can resume after a crash or pause instead of restarting. Done well, state management makes agents reliable (no forgotten context), resumable (no lost work after a restart), and cheaper (bounded memory means fewer input tokens on every model call).

The three tiers of agent state

Production agents have at minimum three separate state stores. Treating them as one is the most common architectural mistake.

Tier	Lifetime	Contains	Storage
Scratchpad	One run	Tool calls, intermediate results, reasoning	In-memory or fast KV
Memory	Across runs	Facts about users, prior tasks, learned preferences	Database + vector store
Checkpoint	Until run completes	Serialized agent state at a known-good point	Durable transactional store

Scratchpad

The scratchpad is the agent's working memory within a single run. It holds tool calls made, results returned, and any intermediate reasoning. It dies with the run.

Two rules. Keep it bounded. A long-running agent that appends every tool result to the scratchpad eventually exceeds the model's context window. Truncate or summarize when the scratchpad approaches 70 percent of the context budget. Make it observable. The scratchpad is your debugging window. Log every entry with a timestamp and a trace ID. When an agent does the wrong thing, the scratchpad is what you read to find out why. For the broader practice, see how to debug an agent that did the wrong thing.

Memory

Memory holds what the agent has learned about a user or task across runs. Three patterns dominate.

Verbatim memory

Store every prior message. Cheap to build; expensive to use because every call carries the full history. Works for short-lived sessions, breaks at scale.

Summary memory

Compress past sessions into a running summary. The agent reads the summary on every call instead of the verbatim history. Cheaper at the token level, loses detail. OpenAI's Assistants API exposes this pattern explicitly via thread summarization (OpenAI Assistants API, 2024).

Hybrid memory

Verbatim within the current session, summary plus vector retrieval across sessions. This is the production default. The agent reads the current session in full, the prior-session summary always, and retrieves specific facts via vector search when the user references prior context. LangGraph's checkpointer plus memory store implements this pattern (LangGraph memory, 2024).

Checkpoints

A checkpoint is a serialized snapshot of agent state at a known-good point in a run. If the host crashes or the operator pauses the run, the agent resumes from the most recent checkpoint instead of starting over.

Where to checkpoint. Before any destructive tool call. If the call succeeds and the host crashes before recording the result, the next run must not replay the call. After expensive computation. Long-running tool results (a 10-second database query, a 30-second LLM call on a different model) should land in a checkpoint so retry does not re-incur the cost. At user-approval boundaries. When an agent waits for a human to approve a step, the wait can be hours. The checkpoint lets the agent host be recycled freely.

What goes in a checkpoint. The minimum is the scratchpad contents, the current step in the plan, and the identities of any pending tool calls. The OpenAI Assistants API runs and LangGraph checkpointers both serialize equivalents (OpenAI, LangGraph).

Storage choices

Matching tier to storage is the architectural decision that pays dividends or generates incidents.

Scratchpad: in-memory (process heap), Redis, Cloudflare KV. Speed matters; durability does not. If the run dies, the scratchpad dies with it.

Memory: a database the agent can query plus a vector store. Postgres + pgvector covers both for most agents under 1 million users. Dedicated vector stores (Pinecone, Weaviate, Qdrant) start to matter above that scale. Memory entries should be tenant-scoped at the schema level, not just by query parameter, to prevent cross-tenant leaks.

Checkpoints: a durable transactional store. Postgres works for most agents. Cloudflare Durable Objects fit when you want the agent state and the orchestrator co-located. Avoid Redis-only for checkpoints unless you have AOF persistence and a tested recovery procedure; the snapshot model in default Redis loses recent writes on crash.

Restart safety

An agent that does not survive a restart is a demo, not a product. Two patterns make restart safety work.

Idempotency keys on every external write. Generate a deterministic key per intended action (for example, a hash of "agent-run-id + step-number + payload"). Pass it to the downstream API's idempotency-key header. If the call succeeded but the response was lost, the retry returns the prior result instead of executing again. Stripe, GitHub, and most SaaS APIs support idempotency keys explicitly.

Two-phase commit for multi-step writes. When the agent needs to coordinate writes to two systems (say Salesforce + Slack notification), use a saga pattern: each write produces a compensating action that can roll it back. The checkpoint stores both the forward and compensating actions. On restart, the agent inspects the checkpoint and either completes the unfinished write or runs the compensating action.

For more on graceful rollback, see how to roll back an agent action and AI agent error handling and rollback.

Common mistakes

One store for everything. Putting scratchpad, memory, and checkpoints in the same Redis instance with no persistence config. The first crash erases everything.

Memory growth unbounded. Every user interaction adds a row, nothing ever gets pruned. The token cost on every model call grows linearly with the user's lifetime. Cap per-user memory at 4 to 8 KB of structured facts; older entries get compressed or dropped.

Checkpoints without restoration tests. The team writes the checkpointer but never tests resume from a real crash. The first incident exposes a serialization bug that nobody noticed in dev.

Scratchpad bleed between users. A scratchpad keyed only by session ID without a tenant scope allows one user's run to read another's data in a misrouted request. Always include tenant ID in scratchpad keys.

Field notes from production

Three patterns worth highlighting from running agents in production.

Checkpoint compaction. Without periodic compaction, the checkpoint store grows linearly with run count. Build a daily job that removes checkpoints older than the longest in-flight task. Without it, the checkpoint table is the largest table in the database within a year.

Cross-region restore. If your agent runs in multiple regions and a region goes down, can the other region restore the checkpoint? If checkpoints are stored only in-region, the answer is no. For multi-region production agents, replicate checkpoint state across regions or rely on a globally consistent durable store (Postgres with logical replication, Cloudflare Durable Objects with cross-region storage).

Memory schema migrations. The shape of stored memory changes when the agent gains new capabilities. Treat memory like a database schema: version it, write migrations, and back-fill on read for old entries. Without versioning, an agent that loaded old-shaped memory crashes silently or misinterprets fields.

Frequently asked questions

What kinds of state does an AI agent need to manage?

Scratchpad (current run), memory (across runs), and checkpoints (resume after crash). Each tier needs different storage.

What is a checkpoint in the context of AI agents?

A serialized snapshot of agent state at a known-good point. The agent resumes from the checkpoint instead of restarting after a crash.

Where should AI agent state be stored?

Scratchpad in fast KV. Memory in a database plus vector store. Checkpoints in a durable transactional store.

How do AI agents handle conversation memory across sessions?

Hybrid memory: verbatim within session, summary plus vector retrieval across sessions. This is the production default.

How big should an agent's memory be?

Cap per-user memory at 4 to 8 KB of structured facts. Compress or drop entries older than 90 days unless the user pins them.

What is the difference between agent memory and agent state?

State is the umbrella term for everything an agent tracks: the scratchpad for the current run, memory across runs, and checkpoints for crash recovery. Memory is one tier of state, the part that persists facts about users and prior tasks between runs. In practice the distinction matters for storage: memory lives in a queryable database plus vector store, while the other tiers use fast or transactional stores.

How do AI agents remember previous conversations?

By persisting conversation state outside the model, which forgets everything between calls. The production default is hybrid memory: the current session is passed verbatim, past sessions are compressed into a running summary that is always loaded, and specific older facts are fetched on demand via vector search. On the next message, the agent's code assembles these pieces into the prompt so the model appears to remember.

What happens to an AI agent's state when it fails mid-task?

The scratchpad is lost with the process. Without checkpoints, the run restarts from scratch and risks replaying tool calls that already executed, including destructive ones. With checkpoints, the agent reloads the last known-good snapshot, inspects which steps completed, and resumes from there. Idempotency keys on external writes make the resume safe: a retried call returns the prior result instead of executing twice.

Three things to ship this week

Separate your scratchpad, memory, and checkpoint storage. One store for all three is the single most common bug.
Add idempotency keys to every external write the agent makes.
Run a chaos drill: kill the agent process mid-run, confirm it resumes from the checkpoint, confirm no destructive call replays.

Sources

LangChain, "LangGraph Persistence", 2024, langchain-ai.github.io
LangChain, "LangGraph Memory concepts", 2024, langchain-ai.github.io
OpenAI, "Assistants API how-it-works", 2024, platform.openai.com
Anthropic, "Building Effective Agents", 2024, anthropic.com
Stripe, "Idempotent requests", docs.stripe.com