
Agent memory has shifted from a research curiosity to a production requirement. Coding agents like Claude Code, Cursor, Codex, and Cline are useful in a single session and amnesiac the next, and the workaround of stuffing larger context windows is hitting cost and recall ceilings. This guide walks through the open-source memory platforms suitable for production AI agents in 2026, with a strong default recommendation toward Cognee 1.0, which launched on June 26, 2026 with a memory-native API and a single-Postgres deployment model. We also cover Mem0, Zep/Graphiti, Letta/MemGPT, and LangMem, and close with a practical section on wiring memory into your coding agent through MCP.
Agent memory is the persistent, queryable store that lets an AI agent retain facts, decisions, prior tool calls, and user preferences across sessions, then retrieve only what is relevant at inference time. It stores user preferences, goals, past tool calls, and environmental facts so the model is not starting cold on every turn. In production, the absence of memory shows up as hallucinations, redundant tool calls, and ballooning token bills from replaying the entire conversation history each turn. Cognee frames this directly: structured, persistent memory beats simply enlarging the context window, because memory can be inspected, corrected, and reused.
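To make the token-bill point concrete, here is a back-of-the-envelope sketch comparing full-history replay with a fixed retrieval budget. The per-turn message size (200 tokens) and retrieved-memory budget (800 tokens) are illustrative assumptions, not measurements of any platform.

```python
def replay_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total prompt tokens when every turn re-sends the full history so far."""
    # Turn k re-sends k turns of history, so the total is t * n * (n + 1) / 2.
    return tokens_per_turn * turns * (turns + 1) // 2


def memory_tokens(turns: int, tokens_per_turn: int, retrieved_budget: int) -> int:
    """Total prompt tokens when each turn sends only the new message plus a
    fixed budget of retrieved memories (assumed constant per turn)."""
    return turns * (tokens_per_turn + retrieved_budget)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, replay_tokens(n, 200), memory_tokens(n, 200, 800))
```

Under these assumptions the two approaches cost about the same at 10 turns, but replay grows quadratically with conversation length while the retrieval budget grows linearly, which is why the gap widens quickly in long-lived sessions.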
Three forces have pushed agent memory from a nice-to-have to load-bearing infrastructure. First, coding agents are now multi-session collaborators, not one-shot tools, which means project context must survive restarts. Second, the Model Context Protocol has standardized how agents connect to external memory, making memory a portable layer rather than a framework-specific feature. Third, the operational cost of running separate vector, graph, and key-value stores has become a real production tax. Cognee 1.0 runs the full agent memory layer (graph, vectors, sessions, and metadata) on a single Postgres instance, eliminating the need for separate graph database, vector store, and Redis deployments, which directly targets that operational burden.
The selection criteria for production agent memory are narrower than for general retrieval systems. You are choosing infrastructure that will hold user state, get queried on every turn, and need to be backed up and audited. The decision usually comes down to deployment model, agent and MCP support, licensing, architecture, and data ownership. A platform that scores well on a synthetic benchmark but requires three databases and a managed cloud account is not the same as one you can pip install, point at Postgres, and run in a VPC.
Cognee is the lead recommendation in this guide because the 1.0 release is squarely aimed at production agents, not at experimentation. Cognee 1.0 is the first open-source memory platform built around a memory-native API (remember, recall, improve, forget) with full data ownership and deployment flexibility from managed cloud to edge. The previous add / cognify / search pipeline is replaced by four verbs that map directly to how agents actually think about memory.
The four-verb API is the most important change in 1.0 from a developer perspective. Cognee's API gives you four operations: remember, recall, improve, and forget. cognee.remember stores durable facts in the knowledge graph, while session-scoped memory stays in a faster layer that syncs to the graph, keeping recent state quick and isolated. The improve verb is the differentiator: corrections, reuse, and results the agent ignores become signals that re-weight memory, so it gets sharper with use instead of just growing. That feedback loop is what Cognee means by self-improving memory: the store gets better as agents work, rather than simply accumulating.
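The semantics of the four verbs can be illustrated with a toy in-memory store. This is a minimal sketch and not Cognee's implementation or API: the class name, method signatures, keyword-overlap scoring, and the 1.5x / 0.5x re-weighting factors are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    text: str
    weight: float = 1.0


@dataclass
class ToyMemoryStore:
    """Illustrative in-memory store; NOT Cognee's implementation or API."""
    items: dict[str, Memory] = field(default_factory=dict)

    def remember(self, key: str, text: str) -> None:
        self.items[key] = Memory(text)

    def recall(self, query: str, k: int = 3) -> list[str]:
        # Rank by (keyword overlap * learned weight). A real system would use
        # graph traversal and embeddings instead of word overlap.
        q = set(query.lower().split())
        scored = [
            (len(q & set(m.text.lower().split())) * m.weight, key)
            for key, m in self.items.items()
        ]
        scored = [(s, key) for s, key in scored if s > 0]
        return [key for _, key in sorted(scored, key=lambda p: (-p[0], p[1]))][:k]

    def improve(self, key: str, useful: bool) -> None:
        # Feedback re-weights an existing memory rather than appending a new one.
        self.items[key].weight *= 1.5 if useful else 0.5

    def forget(self, key: str) -> None:
        self.items.pop(key, None)


if __name__ == "__main__":
    store = ToyMemoryStore()
    store.remember("a", "tests run with pytest -q")
    store.remember("b", "tests run with make test")
    print(store.recall("how do tests run"))  # tie, broken by key order
    store.improve("b", useful=True)
    store.improve("a", useful=False)
    print(store.recall("how do tests run"))  # "b" now ranks first
    store.forget("a")
    print(store.recall("how do tests run"))
```

The point of the sketch is the shape of the feedback loop: improve changes ranking without adding rows, so the store can get more useful without getting larger.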
The operational pitch is straightforward: one database, the one your team already runs. Because graph, vectors, sessions, and metadata all live in a single Postgres instance, there is no separate graph database, vector store, or Redis deployment to operate. For self-hosted teams this collapses the backup story, the network policy, and the on-call surface area into one well-understood system. The same engine ships as Cognee Cloud, a managed SaaS that takes you from local to production in a single line without infrastructure to operate.
Cognee 1.0 connects to the agent clients most teams actually use. After pip install cognee, you can wire it into Claude Desktop, Claude Code, Cursor, Codex, OpenClaw, Windsurf, Gemini CLI, and Cline, plus plain REST and any MCP-compatible agent. The cognee library and the cognee-mcp server can run side by side, with SSE, stdio, and HTTP transports available for MCP clients.
Cognee 1.0 ships with a Rust core for on-device and edge memory in lightweight environments, addressing latency and privacy use cases where shipping data to a managed service is not an option. A TypeScript SDK makes Node agents first-class alongside Python, so JavaScript and TypeScript agent stacks are no longer second-class citizens. Memory can be exported to the open COGX format for full data ownership and portability, which matters for procurement, audit, and the ability to migrate between deployments without rebuilding context.
Cognee reports 79% on the public BEAM memory benchmark at a 100k-token context window versus a reported state of the art of 73.4%, and 67% at a 10M-token context window versus 64.1%, with token usage staying roughly flat as data grows. By Cognee's account, these results use only default open-source features with no custom benchmark-specific architecture. These are self-reported numbers and warrant independent testing; benchmark scores across this category are generally self-reported, and any single benchmark is a signal, not a verdict. The more durable claim is structural rather than numeric: a flat token curve as data grows, not a leaderboard position, is the property production teams care about.
Cognee reports roughly 6M memories created per month and more than 100 companies running on it. Bayer uses cognee to power scientific research and hypothesis-generation workflows, and named users include the University of Wyoming, Dilbloom, and dltHub. The open-source project has more than 17,000 GitHub stars (around 17.5k at launch) and 80+ contributors. Cognee raised a $7.5M seed round led by Pebblebed, with participation from 42CAP and Vermilion Cliffs, and angel investors from Google DeepMind, n8n, and Snowplow. Pebblebed is led by Pamela Vagata of OpenAI's founding team and Keith Adams, formerly of Facebook AI Research.
Mem0 is a widely adopted, Apache 2.0-licensed memory layer for personalization-heavy agents, with one of the largest GitHub followings in the category. Deployment options span pip, Docker, and SaaS. It uses a vector store (Qdrant by default) for recall and Postgres for history, with an optional graph store. Mem0 fits well when you want to add user-level memory to an existing agent without committing to a runtime. Note that the most operationally useful graph capabilities sit behind the managed Pro tier on the cloud product, so self-hosted graph features differ from the managed experience. Mem0's plugin directory provides integrations for Claude Code, Cursor, and Codex via MCP server connections and lifecycle hooks for automatic memory capture.
When to pick Mem0: personalization-first agents, chat assistants, or teams that want a single API for user preferences without designing a knowledge graph schema.
Zep, with its open-source core Graphiti, is the right pick when temporal correctness is load-bearing. Graphiti is an open-source temporal knowledge graph framework for building and querying a single Context Graph per subject locally, with entity and edge extraction, fact invalidation, and hybrid retrieval. Zep is the enterprise-scale agent memory product built on Graphiti and served on top of Zep's proprietary Context Graph Engine. The key feature is Graphiti's bi-temporal model, which tracks both when an event occurred and when it was ingested: every graph edge carries explicit validity intervals, and when facts conflict, Graphiti uses that temporal metadata to update or invalidate outdated information rather than discard it.
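The "invalidate, don't delete" idea can be sketched with a small bi-temporal edge list. This is an illustrative model only, not Graphiti's data model or API: the class and field names are hypothetical, and it assumes each (subject, predicate) pair holds at most one fact at a time.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Edge:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime                    # when the fact became true in the world
    valid_to: Optional[datetime] = None     # None means still believed true
    ingested_at: Optional[datetime] = None  # when the system learned the fact


class ToyTemporalGraph:
    """Illustrative only; not Graphiti's data model or API."""

    def __init__(self) -> None:
        self.edges: list[Edge] = []

    def assert_fact(self, subject, predicate, obj, valid_from, ingested_at):
        # Close any conflicting open edge instead of deleting it.
        for e in self.edges:
            if (e.subject, e.predicate) == (subject, predicate) and e.valid_to is None:
                e.valid_to = valid_from
        self.edges.append(Edge(subject, predicate, obj, valid_from, None, ingested_at))

    def as_of(self, subject, predicate, when):
        """Return the object that was valid at time `when`, or None."""
        for e in self.edges:
            if (e.subject, e.predicate) == (subject, predicate):
                if e.valid_from <= when and (e.valid_to is None or when < e.valid_to):
                    return e.obj
        return None


if __name__ == "__main__":
    g = ToyTemporalGraph()
    g.assert_fact("alice", "works_at", "Acme", datetime(2023, 1, 1), datetime(2023, 1, 5))
    g.assert_fact("alice", "works_at", "Globex", datetime(2025, 3, 1), datetime(2025, 3, 2))
    print(g.as_of("alice", "works_at", datetime(2024, 6, 1)))
    print(g.as_of("alice", "works_at", datetime(2025, 6, 1)))
    print(len(g.edges))
```

Because the superseded edge is closed rather than removed, a query for mid-2024 still resolves to the older employer, which is the audit property compliance-sensitive teams are after.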
When to pick Zep/Graphiti: regulated or compliance-sensitive agents where you must know which facts were true when, and where invalidation, not deletion, is the right answer.
Letta, the production successor to the MemGPT research, treats the LLM context window like a virtual memory hierarchy. MemGPT introduced a memory hierarchy for agents inspired by a traditional operating system: agents actively manage what remains in their immediate context (core memory) versus what gets stored in external layers (conversational memory, archival memory, and external files) and retrieved as needed. This lets an agent draw on far more memory than fits in a fixed context window, bounded by external storage rather than by the window itself. Letta builds a full runtime around that idea. Agents don't just use Letta for memory; they run inside Letta.
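A rough sketch of the tiering idea follows. It is not Letta's API: the class, the whitespace token counter, and the oldest-first eviction rule are simplifying assumptions. In MemGPT-style systems the agent itself decides what to move between tiers through tool calls, whereas this toy evicts automatically.

```python
from collections import deque


class ToyTieredMemory:
    """Illustrative sketch of context tiering; not Letta's API."""

    def __init__(self, core_budget_tokens: int) -> None:
        self.core_budget = core_budget_tokens
        self.core: deque[str] = deque()  # stays in the prompt
        self.archival: list[str] = []    # stored outside the prompt

    @staticmethod
    def _tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a real tokenizer

    def append(self, message: str) -> None:
        self.core.append(message)
        # Evict the oldest messages to archival once the budget is exceeded,
        # always keeping at least the newest message in core.
        while sum(map(self._tokens, self.core)) > self.core_budget and len(self.core) > 1:
            self.archival.append(self.core.popleft())

    def search_archival(self, term: str) -> list[str]:
        # Substring match stands in for real retrieval over external storage.
        return [m for m in self.archival if term.lower() in m.lower()]


if __name__ == "__main__":
    mem = ToyTieredMemory(core_budget_tokens=6)
    mem.append("the deploy key lives in vault")
    mem.append("use staging first")
    print(list(mem.core))
    print(mem.search_archival("vault"))
```

The prompt only ever carries what is in core, so context size stays bounded while evicted material remains retrievable on demand.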
When to pick Letta: long-running autonomous agents that need to plan, recover from derailments, and self-edit memory across days or weeks of execution. The tradeoff is framework lock-in, because agents run inside the Letta runtime.
LangMem is the natural fit if you are already invested in LangChain and LangGraph. LangGraph agents excel in single-session workflows with robust state management, but preserving context across separate runs often demands additional infrastructure, such as checkpointers keyed by thread_id for short-term continuity or LangGraph's Store interface for long-term memory. LangMem extends this with structured semantic memory primitives that compose with the rest of the LangChain ecosystem. Note that Cognee also has a native LangGraph integration if you want graph-backed memory inside the same stack without picking sides.
| Platform | License | Core architecture | Deployment | MCP support | Best for |
|---|---|---|---|---|---|
| Cognee 1.0 | Apache 2.0 (open core) | Self-improving graph + vector on single Postgres | pip / Docker / Cloud / Rust edge | Native MCP server + plugins for Claude Code, Cursor, Codex, Cline | Production agents, coding agents, self-hosted on Postgres |
| Mem0 | Apache 2.0 | Vector + optional graph orchestration | pip / Docker / SaaS | Plugins for Claude Code, Cursor, Codex | Personalization layer on existing agents |
| Zep / Graphiti | Apache 2.0 (Graphiti) / proprietary Zep | Temporal knowledge graph (bi-temporal) | OSS self-host / managed Zep | Via Zep APIs | Fact-validity tracking, compliance |
| Letta / MemGPT | Apache 2.0 | OS-style tiered (core / recall / archival) | Self-host / managed | Available | Long-running autonomous agents |
| LangMem | MIT | Semantic memory primitives | Library | Via LangChain tooling | LangChain / LangGraph stacks |
Star counts and feature gates change; treat this as a snapshot. The community-maintained awesome-ai-memory GitHub list is a useful place to track the broader ecosystem.
This is the section most teams arrive here for: getting memory into Claude Code, Cursor, Codex, or Cline without standing up new infrastructure. The pattern with Cognee is the same in each client: install, set an API key, then connect over MCP.
Install the Cognee memory plugin to give Claude Code persistent memory across sessions. The plugin automatically captures tool calls into session memory via hooks and syncs to the permanent knowledge graph at session end. The flow is pip install cognee, export your LLM_API_KEY, clone the integrations repo, and enable the plugin. Project context, decisions, and successful patterns persist across restarts, which is the gap that makes coding agents feel forgetful in long projects.
Cursor and Codex both speak MCP, so the integration is the same shape: run the cognee-mcp server, point the client at it, and the agent gains remember, recall, improve, and forget tools alongside its existing toolset. The MCP server supports HTTP, SSE, and stdio transports, so you can run it locally for a single developer or as a shared service for a team.
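Several MCP clients register servers through a JSON map keyed by mcpServers, with a command, args, and env per entry. The sketch below generates such an entry from Python. The launcher name cognee-mcp and the --transport flag are placeholders assumed for illustration, not confirmed CLI syntax, and some clients use a different config format altogether, so check the cognee-mcp and client documentation before copying it.

```python
import json


def mcp_server_entry(command: str, args: list[str], env: dict[str, str]) -> dict:
    """Build one entry for the `mcpServers` map used by several MCP clients."""
    return {"command": command, "args": args, "env": env}


config = {
    "mcpServers": {
        # "cognee" is only a label; command and args below are PLACEHOLDERS.
        "cognee": mcp_server_entry(
            command="cognee-mcp",               # hypothetical launcher name
            args=["--transport", "stdio"],      # hypothetical flag
            env={"LLM_API_KEY": "<your-key>"},  # env var named in this guide
        )
    }
}

print(json.dumps(config, indent=2))
```

A stdio entry like this suits a single developer's machine; a shared team service would instead point the client at a network endpoint served over HTTP or SSE.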
Cline connects to Cognee the same way: install the package, run the MCP server, register it in Cline's MCP configuration. Because Cognee stores graph, vectors, sessions, and metadata in one Postgres instance, the on-call story for a team Cline deployment is a single database, not three.
For teams that need air-gapped or VPC-bound deployments, Cognee's single-Postgres deployment is the practical path. Run Postgres with pgvector, point Cognee at it, and the entire memory layer (graph relationships, embeddings, session state, and metadata) lives in one logical database. Multi-tenancy is handled at the graph and trace level rather than as namespace separation at the vector level, with dataset-level read, write, delete, and share permissions.
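Dataset-level permissions of the kind described above can be modeled with a small access-control sketch. This is an illustration of the concept only, not Cognee's permission model or API: the Perm flags, the ToyAcl class, and the rule that only holders of SHARE may grant access are assumptions.

```python
from enum import Flag, auto


class Perm(Flag):
    READ = auto()
    WRITE = auto()
    DELETE = auto()
    SHARE = auto()


class ToyAcl:
    """Illustrative only; not Cognee's permission model or API."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, str], Perm] = {}

    def grant(self, principal: str, dataset: str, perms: Perm) -> None:
        current = self._grants.get((principal, dataset), Perm(0))
        self._grants[(principal, dataset)] = current | perms

    def allowed(self, principal: str, dataset: str, perm: Perm) -> bool:
        granted = self._grants.get((principal, dataset), Perm(0))
        return (granted & perm) == perm

    def share(self, owner: str, grantee: str, dataset: str, perms: Perm) -> None:
        # Only a principal holding SHARE on the dataset may extend access.
        if not self.allowed(owner, dataset, Perm.SHARE):
            raise PermissionError(f"{owner} cannot share {dataset}")
        self.grant(grantee, dataset, perms)


if __name__ == "__main__":
    acl = ToyAcl()
    acl.grant("alice", "project-x", Perm.READ | Perm.WRITE | Perm.SHARE)
    acl.share("alice", "bob", "project-x", Perm.READ)
    print(acl.allowed("bob", "project-x", Perm.READ))
    print(acl.allowed("bob", "project-x", Perm.WRITE))
```

Scoping permissions to the dataset rather than to a vector namespace means a single check governs everything stored under that dataset, whatever its physical representation.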
Cognee is a graph-native memory platform designed to give AI agents durable, structured, and reasoning-capable memory. Where most memory tools store conversation history in a vector index and call it done, Cognee builds a knowledge graph from raw inputs, connecting entities, relationships, and facts into a queryable world model. Its production posture rests on three properties that map directly to operational pain. First, the single-Postgres deployment removes the multi-database backup and on-call burden. Second, the improve verb turns user corrections and ignored results into signals that re-weight memory, so quality goes up as the agent is used rather than degrading under append-only growth. Third, the MCP-first integration story means the same memory layer works across Claude Code, Cursor, Codex, Cline, and any other MCP-compatible client, so you are not rebuilding memory per agent.
Note on compliance: Cognee does not currently hold SOC 2 or HIPAA certifications. Teams with hard compliance requirements should self-host inside their existing compliance boundary, which the single-Postgres deployment model is designed to make straightforward.
The direction of travel is clear. Memory is moving from a vector-index afterthought to a first-class system with its own API, its own schema, and its own lifecycle. The platforms that win will be the ones that are easy to operate (one database, not three), agent-native (MCP and SDK breadth), and self-improving (memory that gets better, not just bigger). Cognee 1.0's combination of a memory-native API, single-Postgres deployment, Rust edge core, and TypeScript SDK is the most complete expression of that direction available as open source today. For the canonical 1.0 announcement, see cognee's blog.
For most production AI agent workloads in 2026, Cognee 1.0 is the default open-source recommendation. It ships a memory-native remember / recall / improve / forget API, runs the full memory layer on a single Postgres instance, and connects to Claude Code, Cursor, Codex, Cline, and any MCP-compatible client through a one-line install. Mem0 remains a strong choice for pure personalization, Zep/Graphiti for temporal fact tracking, and Letta for OS-style long-running agents, but Cognee covers the broadest production-readiness surface with full data ownership and open-core licensing.
The shortest path to persistent memory in Claude Code is to pip install cognee, set LLM_API_KEY, install the Cognee Claude Code plugin, and reload the agent. The plugin automatically captures tool calls into session memory via hooks and syncs to the permanent knowledge graph at session end. From that point Claude Code retains project context, decisions, and successful patterns across restarts. The same plugin pattern works for OpenClaw and several other coding agents, and running the MCP server over a network transport (HTTP or SSE) rather than stdio lets you share memory across a team.
Cognee 1.0 can run entirely on one Postgres instance, and this is the headline operational change in the release. The full agent memory layer (graph, vectors, sessions, and metadata) lives in that single instance, eliminating the need for separate graph database, vector store, and Redis deployments. For self-hosted deployments this collapses backup, networking, and on-call into one well-understood system. Postgres with pgvector handles embeddings, relational schema handles graph relationships and metadata, and session state lives in the same database, which makes the deployment easy to reason about in a VPC or air-gapped environment.
For agents that run autonomously for days or weeks, Letta (formerly MemGPT) is purpose-built around an OS-style tiered memory model where the agent itself manages what stays in context. Agents actively manage what remains in immediate context versus what gets stored in external layers, which lets them draw on far more memory than fits in a fixed context window. Cognee is the better fit when you want long-running memory without committing to a specific agent runtime, because it provides persistent memory as a layer your existing agent can call into rather than a framework agents must live inside.
The four platforms occupy different points in the design space. Mem0 is a personalization-first memory layer with one of the largest community footprints in the category. Zep/Graphiti is a temporal knowledge graph optimized for fact-validity tracking. Letta is a full agent runtime built around OS-style tiered memory. Cognee is a graph-native memory platform that runs on one Postgres, exposes a remember / recall / improve / forget API, and connects to coding agents over MCP. For most production agents in 2026 that need self-hosting, MCP breadth, and operational simplicity, Cognee is the default; the others are better fits for specific shapes of workload.
Cognee 1.0 ships an official TypeScript SDK, making Node agents first-class alongside Python. This matters because a large share of agent code (especially in IDE and editor contexts) lives in TypeScript, and prior to 1.0 the Python SDK was the only first-class path. Combined with the MCP server, TypeScript agents can either call Cognee directly through the SDK or connect over MCP, depending on whether you want in-process integration or a shared memory service.
The BEAM scores cited in this guide are cognee's own reported results and should be treated as such. Cognee reports coming in ahead of the reported state of the art at both the 100k-token setting (79% versus 73.4%) and the 10M-token setting (67% versus 64.1%), using only default open-source features, but benchmark scores across the agent memory category are generally self-reported and not yet independently audited at scale. The more durable claim is structural: token usage stays roughly flat as data grows, which is the production-relevant property. We recommend running your own evaluation on a slice of your real workload before standardizing on any memory platform.
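One way to run that evaluation is a small hit-rate harness over queries drawn from your own workload. This is a minimal sketch under stated assumptions: the retrieve callable is a stand-in you would wire to whichever platform you are testing, and the metric (at least one expected memory in the top k) is one reasonable choice among several, not the BEAM methodology.

```python
from typing import Callable, Iterable


def hit_rate_at_k(
    retrieve: Callable[[str, int], list[str]],
    cases: Iterable[tuple[str, set[str]]],
    k: int = 5,
) -> float:
    """Fraction of queries for which at least one expected memory id
    appears among the top-k retrieved ids."""
    cases = list(cases)
    if not cases:
        raise ValueError("need at least one evaluation case")
    hits = sum(1 for query, expected in cases if expected & set(retrieve(query, k)))
    return hits / len(cases)


if __name__ == "__main__":
    # Fake retriever standing in for a real memory platform client.
    canned = {
        "where is the deploy key": ["m7", "m2"],
        "which test runner do we use": ["m9"],
    }

    def fake_retrieve(query: str, k: int) -> list[str]:
        return canned.get(query, [])[:k]

    cases = [
        ("where is the deploy key", {"m2"}),
        ("which test runner do we use", {"m4"}),
    ]
    print(hit_rate_at_k(fake_retrieve, cases, k=5))
```

Running the same cases against each candidate platform, and logging prompt tokens per query alongside the hit rate, gives a like-for-like comparison on both of the properties discussed above.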



