Anthropic and OpenAI shipped agent memory this month. They picked opposite architectures.

On April 23 Anthropic put filesystem-backed memory for Claude Managed Agents into public beta. OpenAI's Agents SDK update shipped vector-database memory with project, user, and policy tiers. Two frontier labs, one goal, opposite shapes — and the registry layer has no language yet for the trust properties that distinguish them.

Two of the three frontier labs shipped agent memory architectures within five weeks of each other this spring, and they look almost nothing alike. On April 23, 2026, Anthropic put persistent memory for Claude Managed Agents into public beta. Memories are mounted as files on a filesystem that the agent reads and writes with the same bash and code-execution tools it uses for any other task. In the same window, OpenAI's April update to its Agents SDK made configurable memory a first-class architectural component: a vector database indexed by user, session, and agent identifiers, with retrieval through semantic similarity, keyword matching, and entity matching. Both architectures target the same problem and solve it in different shapes.

The shape of that disagreement is the part that's worth pinning down, because the choices each lab made imply different things about what an agent's memory is for and how a registry layer should describe it.

The two architectures, side by side

[Figure: Anthropic filesystem memory vs OpenAI vector-DB memory (agent memory, two architectures, April–May 2026). Left panel, Anthropic Claude Managed Agents (public beta): the Claude agent reads and writes memory as files on a mounted filesystem (for example /memory/notes-2026-04.md, /memory/conventions.md, /memory/incidents/2026-q2.md) using the same bash, file IO, and code-execution tools as any other workflow; "Dreaming" is a scheduled background process that curates memory across sessions. Property: portable, inspectable, textual. Trade-off: no semantic retrieval by default. Right panel, OpenAI Agents SDK (April 2026 update): the agent run extracts facts and stores them in a vector database indexed by user, session, and agent, with policy, project, and user memory tiers; memory is injected into the prompt at session start, ranked by retrieval relevance. Property: query-efficient, structured. Trade-off: opaque to inspection.]

The architectures share a goal (give agents continuity across sessions) and a problem: accumulated memory does not fit inside the model's context window, so in both designs it has to live somewhere external. The divergence is in what "somewhere external" looks like and what the agent can do with it.

What Anthropic chose

The Claude Managed Agents memory layer is, in Anthropic's framing, a filesystem mounted into the agent's runtime. A memory is a file. The agent uses the same tools it uses for any other task (read_file, write_file, bash, code execution) to interact with it. The content is whatever the agent and developer put there: notes, conventions, structured logs of past sessions, anything plain text or structured-text-on-disk can express.

The architectural bet is that agents are already good at reading and writing files, and that giving them memory in a format they already know how to handle is faster than building a separate memory API. The trade-off is that retrieval is whatever the filesystem gives you: by-path access, glob patterns, grep. There is no semantic retrieval primitive built into the memory layer itself. Anthropic has, however, added a scheduled background process called "Dreaming" that reviews past sessions and reorganizes the memory store asynchronously, keeping what matters and pruning what doesn't.
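
To make "retrieval is whatever the filesystem gives you" concrete, here is a minimal sketch of the three access patterns named above (by-path, glob, grep) over a memory directory. It uses only the Python standard library. The directory layout and the helper names (read_by_path, find_by_glob, grep, build_demo_memory) are illustrative assumptions, not Anthropic's tools or API.

```python
import re
import tempfile
from pathlib import Path


def read_by_path(root: Path, relative: str) -> str:
    """By-path access: the caller already knows which file it wants."""
    return (root / relative).read_text(encoding="utf-8")


def find_by_glob(root: Path, pattern: str) -> list[str]:
    """Glob access: select files by name pattern, returned as relative paths."""
    return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file())


def grep(root: Path, regex: str) -> list[tuple[str, int, str]]:
    """Grep access: (file, line number, line) for every case-insensitive match."""
    needle = re.compile(regex, re.IGNORECASE)
    hits = []
    for path in sorted(root.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for number, line in enumerate(lines, start=1):
            if needle.search(line):
                hits.append((path.relative_to(root).as_posix(), number, line))
    return hits


def build_demo_memory(root: Path) -> None:
    """Create a tiny, hypothetical memory directory for the demo."""
    (root / "incidents").mkdir(parents=True, exist_ok=True)
    (root / "conventions.md").write_text(
        "Use ISO 8601 dates.\nPrefer tabs in Makefiles.\n", encoding="utf-8"
    )
    (root / "incidents" / "2026-q2.md").write_text(
        "Invoice parser failed on scanned PDFs.\n", encoding="utf-8"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        memory = Path(tmp)
        build_demo_memory(memory)
        print(read_by_path(memory, "conventions.md"))
        print(find_by_glob(memory, "**/*.md"))
        print(grep(memory, r"pdf"))      # literal match: finds the incident note
        print(grep(memory, r"billing"))  # related concept, no literal match: empty
```

The last two calls show the trade-off: a literal search for "pdf" finds the incident note, while a search for the related word "billing" returns nothing, because no step in this path maps meaning to text.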

The early production data Anthropic published cites Netflix, Rakuten, Wisedocs, and Ando among the beta adopters, with a stated 97% reduction in first-pass errors and 30% speed increase on document verification workflows. Those are real numbers from real customers; the architecture is plainly delivering value in production.

What OpenAI chose

The OpenAI Agents SDK memory architecture is a vector database. Facts get extracted from conversations during the agent run, stored with metadata identifying the user, session, and agent, and retrieved at the start of new sessions by semantic similarity, keyword matching, and entity matching. The memory tiers are explicit: Project memory (workspace or repo scope), User memory (preferences, style, recurring constraints), and Policy memory (compliance and safety constraints).
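
The store-then-retrieve shape described here can be sketched in a few lines. This is not the Agents SDK API: the VectorMemory and MemoryRecord classes are hypothetical, and embed is a toy term-count vector standing in for a real embedding model, so it only approximates keyword overlap rather than semantic similarity. What the sketch does show is the structure: records carry tier, user, session, and agent metadata, retrieval filters on that metadata, and results are ranked by cosine similarity.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse term-count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class MemoryRecord:
    text: str
    tier: str  # "project", "user", or "policy"
    user_id: str
    session_id: str
    agent_id: str
    vector: Counter


class VectorMemory:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self) -> None:
        self.records: list[MemoryRecord] = []

    def store(self, text: str, tier: str, user_id: str, session_id: str, agent_id: str) -> None:
        record = MemoryRecord(text, tier, user_id, session_id, agent_id, embed(text))
        self.records.append(record)

    def retrieve(
        self,
        query: str,
        user_id: str,
        agent_id: str,
        tier: Optional[str] = None,
        top_k: int = 2,
    ) -> list[MemoryRecord]:
        """Filter by metadata first, then rank the survivors by similarity."""
        query_vector = embed(query)
        candidates = [
            r for r in self.records
            if r.user_id == user_id
            and r.agent_id == agent_id
            and (tier is None or r.tier == tier)
        ]
        candidates.sort(key=lambda r: cosine(query_vector, r.vector), reverse=True)
        return candidates[:top_k]


if __name__ == "__main__":
    memory = VectorMemory()
    memory.store("Prefers ISO 8601 dates in reports", "user", "u1", "s1", "agent-a")
    memory.store("Never store card numbers in logs", "policy", "u1", "s2", "agent-a")
    for record in memory.retrieve("which dates format for reports", "u1", "agent-a", top_k=1):
        print(record.tier, "|", record.text)
```

Note that the caller never sees the store's contents directly. It sees only what a given query ranks highest, which is the inspection cost the rest of this section discusses.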

The architectural bet is the opposite of Anthropic's: that agents are not particularly good at managing their own memory and that a structured retrieval system performs better than asking the agent to grep through files. Memory is treated as data, not text, and it gets queried at the right moment by infrastructure that is purpose-built for the retrieval shape. The trade-off is opacity: a fact stored in a vector database is a vector, and inspecting what the agent "remembers" requires querying the database with the right intent.

OpenAI's framing of the design is explicit: "memory is intentionally managed, not accidental chat history sprawl." The SDK pushes developers to declare which kind of memory each piece of information belongs in. Project memory, User memory, and Policy memory each have a different lifecycle. The discipline is the product.

Where the difference actually matters

Three places the architectures diverge in ways consumers should care about:

  • Inspectability. A filesystem memory is auditable in the same way any other directory is: list the files, read them. A vector database memory requires a tool that knows how to project from vectors back to recognizable content. Anthropic's pattern is friendlier to compliance and to user-side data subject access requests; OpenAI's is friendlier to large-scale automated retrieval.
  • Portability. A user who wants to move from one Claude agent to another can copy the memory directory. A user who wants to move from one OpenAI agent to another needs to export from one vector database and re-embed in another; the embeddings depend on the model, so the transfer is lossy.
  • Multi-agent coordination. Anthropic's May 2026 multi-agent orchestration beta has lead agents delegate to specialists that share a common filesystem and memory. The filesystem is the coordination primitive. OpenAI's pattern has each agent run with its own retrieval against shared infrastructure, but the coordination layer is the SDK's harness, not the storage layer itself.
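
The inspectability and portability points for the filesystem case can be shown in a few lines: auditing is listing and hashing files, and migration is a directory copy whose fidelity can be checked byte for byte. This is a sketch under assumptions. The manifest and migrate helpers and the directory names are hypothetical, and nothing here reflects how either vendor actually exports memory. There is deliberately no vector-database counterpart, since re-embedding under a different model has no equivalent byte-level check.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def manifest(root: Path) -> dict[str, str]:
    """Audit view: map each file's relative path to the SHA-256 of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def migrate(source: Path, destination: Path) -> bool:
    """Copy a memory directory to a new location and verify the copy is lossless.

    The destination must not exist yet (shutil.copytree's default behaviour).
    """
    shutil.copytree(source, destination)
    return manifest(source) == manifest(destination)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        source = base / "agent_a_memory"
        source.mkdir()
        (source / "conventions.md").write_text("Use ISO 8601 dates.\n", encoding="utf-8")
        print(sorted(manifest(source)))                      # what is stored
        print(migrate(source, base / "agent_b_memory"))      # lossless move?
```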

What this means for the registry layer

The Agenstry-relevant observation is that a consumer reading "this agent has persistent memory" is reading off a description that encodes meaningfully different trust properties depending on which architecture is underneath. A registry that surfaces only the present-tense claim ("memory: yes") is hiding the property the consumer most needs to know.

A registry-side description that mattered would name three things: the storage primitive (filesystem, vector database, or other), the inspection interface (what a user can see about what's stored), and the export shape (what a user can take with them when they leave). These three properties together describe the lock-in / portability profile of the agent. None of the current public registries that catalog MCP servers or A2A signed cards publishes them.
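
As a rough illustration of what such a descriptor could look like on an agent card, the sketch below models the three properties as a small typed record and serializes it to JSON. The field names, the enum values, and the "memory" key are all assumptions made for this example. No existing registry, MCP, or A2A schema defines them.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum


class StoragePrimitive(str, Enum):
    FILESYSTEM = "filesystem"
    VECTOR_DATABASE = "vector_database"
    OTHER = "other"


@dataclass(frozen=True)
class MemoryDescriptor:
    """Hypothetical registry fields describing an agent's persistent memory."""

    storage_primitive: StoragePrimitive
    inspection_interface: str  # what a user can see about what is stored
    export_shape: str          # what a user can take with them when they leave

    def to_card_fragment(self) -> str:
        """Serialize as a JSON fragment that could sit on an agent card."""
        payload = asdict(self)
        payload["storage_primitive"] = self.storage_primitive.value
        return json.dumps({"memory": payload}, indent=2, sort_keys=True)


if __name__ == "__main__":
    filesystem_agent = MemoryDescriptor(
        StoragePrimitive.FILESYSTEM,
        inspection_interface="list and read files",
        export_shape="directory copy",
    )
    vector_agent = MemoryDescriptor(
        StoragePrimitive.VECTOR_DATABASE,
        inspection_interface="query the store",
        export_shape="record export, re-embedding required",
    )
    print(filesystem_agent.to_card_fragment())
    print(vector_agent.to_card_fragment())
```

Compared with a bare "memory: yes", the two fragments differ in exactly the fields a consumer would need to weigh lock-in.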

This matters more as agents persist user state at scale. The supply-chain attack surface of an agent with persistent memory is wider than that of a stateless agent — a compromised memory store is a long-term incident rather than a per-session one. The defenses are different. A user who trusts an agent with memory is making a longer commitment than a user who hires an agent for one session, and the registry layer is the place to publish the descriptors that make that commitment legible.

What we're watching

Three things, observable in the next two quarters:

  1. Whether a portability standard emerges across labs. A common export format for "what does this agent remember about me" would be the user-side analogue of GDPR-style data subject access requests. Today, each lab's memory is in a lab-specific format. The first cross-vendor schema for memory export will be a quiet but consequential normalization.
  2. Whether the OpenTelemetry GenAI Semantic Conventions absorb a memory-access attribute family. A gen_ai.memory.read/gen_ai.memory.write span attribute would let observability vendors trace what each agent run remembers about its user and surface inappropriate retention as a signal.
  3. Whether the third frontier lab picks a side, or a third architecture. Google's Gemini Agent absorbed Project Mariner's screen-handling code. Memory architecture is the next obvious axis for differentiation. Filesystem, vector database, or something else — the choice is the same architectural commitment Anthropic and OpenAI have already made, and the third answer will tell the field whether convergence or further divergence is the trajectory.
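
To give the second item some shape, here is a standard-library sketch of memory-access tracing. It does not use the OpenTelemetry SDK, and the names gen_ai.memory.read and gen_ai.memory.write appear only as the proposed names from the list above, not as part of any published convention. The TracedMemory class, the memory.key and run.id fields, and the written_but_never_read heuristic are all hypothetical, chosen to show how a retention signal could be derived from such spans.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TracedMemory:
    """Dict-backed memory that logs every access as a span-like record."""

    store: dict[str, str] = field(default_factory=dict)
    spans: list[dict[str, str]] = field(default_factory=list)

    def _record(self, name: str, key: str, run_id: str) -> None:
        # Hypothetical span shape: a name plus two illustrative attributes.
        self.spans.append({"name": name, "memory.key": key, "run.id": run_id})

    def write(self, key: str, value: str, run_id: str) -> None:
        self.store[key] = value
        self._record("gen_ai.memory.write", key, run_id)

    def read(self, key: str, run_id: str) -> Optional[str]:
        self._record("gen_ai.memory.read", key, run_id)
        return self.store.get(key)


def written_but_never_read(spans: list[dict[str, str]]) -> list[str]:
    """One possible retention signal: keys the agent stored but never used."""
    written = {s["memory.key"] for s in spans if s["name"] == "gen_ai.memory.write"}
    read = {s["memory.key"] for s in spans if s["name"] == "gen_ai.memory.read"}
    return sorted(written - read)


if __name__ == "__main__":
    memory = TracedMemory()
    memory.write("user/date_format", "ISO 8601", run_id="run-1")
    memory.write("user/home_address", "redacted", run_id="run-1")
    memory.read("user/date_format", run_id="run-2")
    print(written_but_never_read(memory.spans))
```

In this run the date format is written and later read, while the home address is written and never read, so only the latter is flagged as a candidate for review.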

Two memory architectures, both in production within five weeks of each other, doing the same job in fundamentally different shapes. The lab choice is a registry-relevant signal that consumers should be able to read off a card; today, they have to read off the docs. That gap is the registry layer's next obvious piece of work.
