AgentCore Memories: What Agents Remember and Why It Matters

1/4/2026Pradeep
AIAgentsAWSAgentCoreMemoryABA

AgentCore Memories: What Agents Remember and Why It Matters

Part 4 of a series on building production-scale agent platforms

In Part 1, we mapped the landscape of production agent hosting. In Part 2, we went inside the AgentCore runtime and deployment model. In Part 3, we covered how Gateway mediates every interaction between an agent and the outside world, enforcing security, observability, and rate limits at every boundary.

Those three layers handle execution, lifecycle, and connectivity. But there is something more fundamental that determines whether an agent feels like a useful tool or an amnesiac you have to re-brief every time you open a conversation.

Memory.

Memory is what transforms an agent from a stateless question-answering machine into something that feels like a collaborator. A BCBA who opens Lumen Health's progress report agent on Tuesday and says "let's continue working on Marcus's quarterly report" expects the agent to know who Marcus is, what they discussed on Monday, which sections of the report are drafted, and which data questions remain open. That expectation is so basic that it barely registers as a technical requirement. But delivering on it at scale, across hundreds of organizations, with HIPAA-grade data isolation and auditability, is one of the hardest problems in agent infrastructure.

This article is about how AgentCore handles memory. It is also, inevitably, about what memory means for an agent's identity, its continuity, and its trustworthiness. Because memory is not a feature you bolt on. It shapes what the agent is.

Memory Is Not Just Vector Storage

Let me address the most common misconception first. When engineers hear "agent memory," most of them immediately think of vector databases. Embed the conversation, store it in Pinecone or pgvector, retrieve relevant chunks at inference time. Done.

That is retrieval. It is an important component of memory, but it is not memory itself. Retrieval answers the question "what information is relevant to this query?" Memory answers a different, broader set of questions: "what has happened before, what did we decide, what do I know about this user, what context carries forward into the future, and what should I forget?"

A vector store does not know the difference between something the user said casually and something they asked you to remember. It does not distinguish between a fact that was true last month and a fact that was corrected yesterday. It does not understand that a piece of information is relevant to one conversation but private to another. It is a similarity index. Memory requires judgment, structure, and policy on top of that index.

AgentCore's memory system is built around this distinction. It provides the storage and retrieval primitives, yes, but it also provides the scaffolding for scoping, structuring, retaining, and expiring memories in ways that align with how agents actually need to use context.

The Four Dimensions of Agent Memory

Agent memory is not a single thing. It operates across four dimensions, and each requires different infrastructure, different retention policies, and different access controls.

Session Memory (Short-Term, Structured)

This is the conversation buffer. What has the user said in this interaction? What has the agent responded? What tool calls have been made and what did they return? Session memory is the most obvious kind and the easiest to implement. It is essentially the conversation history that gets appended to the prompt context on each turn.

But even session memory has production complications. Context windows are finite. A long conversation with many tool calls can easily exceed the model's context limit. You need a strategy for what happens when that limit approaches. AgentCore provides three approaches:

Sliding window. Keep the most recent N turns and drop older ones. Simple, but you lose context that might be relevant. A BCBA who discussed mastery criteria in turn 3 and then asks about them again in turn 30 will get an agent that has forgotten the earlier discussion.

Summarization. Periodically compress older turns into a summary that captures the key decisions and context. This preserves the important information while reducing token count. The tradeoff is that summarization itself requires a model call, which adds latency and cost, and the summary might miss nuances that turn out to be important later.

Selective retrieval. Store all turns in a session index and retrieve only the turns that are relevant to the current query. This is the most sophisticated approach and the one that scales best for long conversations, but it introduces retrieval latency and the risk of missing context that the retrieval model does not identify as relevant.

For Lumen Health's progress report agent, the typical session spans 10 to 15 turns over a 20-minute interaction. Sliding window works fine for most cases. But for complex reports where the BCBA asks the agent to revise multiple sections across a longer conversation, Lumen uses summarization to compress earlier turns while preserving the revision decisions.

Episodic Memory (Short-to-Medium-Term, Semi-Structured)

Episodic memory captures what happened across recent sessions with the same user or the same context. It bridges the gap between "what happened in this conversation" and "what do I know in general."

When a BCBA opens a new session with Lumen's agent and says "let's pick up Marcus's report from yesterday," episodic memory is what makes that possible. It stores the key facts from the prior session: which report was being worked on, which sections were completed, which items were flagged for review, and what the BCBA's feedback was.

Episodic memory is not the full conversation transcript. It is the distilled, structured output of a prior interaction. Think of it as the notes you would take at the end of a meeting. You do not transcribe everything that was said. You write down what was decided, what needs to happen next, and what context the next meeting will need.

In AgentCore, episodic memory is stored as structured records with metadata: the user ID, the session timestamp, the agent version, and a set of key-value pairs or short text summaries extracted from the session. These records are indexed for retrieval and scoped to the user-agent pair, meaning a BCBA's episodic memories for one client's report agent do not leak into their interactions about a different client.

Semantic Memory (Long-Term, Unstructured)

This is the closest to what most people think of when they hear "agent memory." Semantic memory is the agent's accumulated knowledge about a user, a domain, or a context, stored as embeddings and retrieved by similarity.

For a customer support agent (to use a non-healthcare example for a moment), semantic memory might include: this customer prefers email over phone, they had a billing dispute resolved in March, they are on the enterprise plan, they have expressed frustration about response times in the past. None of this was stated in the current conversation. It was accumulated over months of interactions and is retrieved when it becomes relevant.

For Lumen Health, semantic memory at the per-client level might include: this learner responds well to natural environment teaching, their parent has expressed concerns about generalization to the home setting, the prior BCBA noted that the client has stronger skills in the morning, and the most recent assessment showed a spike in barriers scores. This is not session data (which the agent retrieves via tools). This is clinical context that the agent builds up over repeated interactions, forming a richer picture of the client over time.

AgentCore's semantic memory uses a managed vector store with automatic embedding and retrieval. You configure:

  • Embedding model. Which model to use for encoding memories. This affects retrieval quality and should be chosen to match the domain vocabulary. Clinical text has different semantic characteristics than customer support text.
  • Retrieval strategy. Pure similarity search, hybrid search (combining vector similarity with keyword matching), or filtered search (restricting retrieval to memories matching specific metadata criteria like organization or client ID).
  • Relevance threshold. Memories below a certain similarity score are not retrieved, preventing the agent from being influenced by tangentially related context that would confuse rather than help.

Procedural Memory (Long-Term, Structured)

This is the least discussed but possibly most important category. Procedural memory captures how the agent should behave, not what it knows. It includes learned preferences, calibrated behaviors, and organizational conventions.

For Lumen Health, procedural memories might include: this organization prefers reports structured with quantitative data before narrative interpretation, this BCBA likes concise bullet points rather than paragraphs, this payer requires specific language around medical necessity. These are not facts about clients. They are patterns about how to do the work, learned through feedback and reinforcement.

Procedural memory is typically stored as structured configuration or as few-shot examples that get injected into the prompt. In AgentCore, you can define memory templates that specify which procedural memories to retrieve and where to place them in the prompt context. A template might say: "always include the organization's preferred report format, the BCBA's style preferences if available, and the payer's documentation requirements for the client's funding source."

Memory Templates and Configuration

Memory templates are how you tell AgentCore what to remember, how to store it, and when to retrieve it. A template defines:

Extraction rules. What should be extracted from each session and stored as memory? This can be as simple as "store the full conversation history" or as nuanced as "extract any client preferences mentioned, any clinical decisions made, and any follow-up items identified." Extraction can be rule-based (regex patterns, keyword triggers) or model-based (using a smaller, cheaper model to summarize and extract structured data from the conversation).

Storage schema. What structure should memories have? For episodic memories, this might be a JSON schema with fields for session date, client ID, report section, and status. For semantic memories, it might be free-text with metadata tags. For procedural memories, it might be a preference key-value pair.

Retrieval triggers. When should memories be retrieved? At the start of every session? Only when the user references a prior interaction? When the agent is about to generate a specific type of output? Retrieval triggers determine when the memory system is consulted, and unnecessary retrievals add latency and cost.

Injection placement. Where in the prompt context should retrieved memories appear? Before the system prompt? After the system prompt but before the conversation history? Interleaved with the conversation? Placement affects how much weight the model gives to memories relative to other context, and the right placement varies by use case.

For Lumen Health, the memory template for the progress report agent looks something like this:

  • Session start: Retrieve episodic memories for this BCBA + client pair from the last 30 days. Retrieve procedural memories for this BCBA's style preferences and the client's payer documentation requirements. Inject after the system prompt.
  • During generation: Retrieve semantic memories for the client when generating interpretation sections (to provide longitudinal clinical context). Inject inline with the relevant report section.
  • Session end: Extract any new clinical observations, BCBA feedback on the agent's output, and follow-up items. Store as episodic memories. If the BCBA corrected the agent's interpretation, store the correction as a procedural memory to avoid repeating the mistake.

Multi-Tenant Memory Separation

This is where memory gets genuinely hard for platforms like Lumen Health that serve hundreds of organizations.

Memory must be isolated at the organization level. A BCBA at Organization A must never see memories derived from Organization B's interactions, even if both organizations serve clients with similar profiles. This is not just a privacy concern. It is a legal requirement under HIPAA, and in some states, under additional behavioral health data protection laws.

AgentCore enforces tenant isolation at the memory store level. Each memory record is tagged with an organization identifier, and all queries are scoped by that identifier. But the isolation needs to go deeper than query-time filtering. The embedding index itself must prevent cross-tenant leakage. If Organization A and Organization B's memories are stored in the same vector index, a sufficiently crafted query might retrieve semantically similar memories from the wrong tenant, even with metadata filtering, depending on the vector store's consistency guarantees.

AgentCore addresses this with logical partitioning of the memory store. Each organization gets its own partition (effectively its own index) within the managed vector store. Queries never cross partition boundaries. This is slightly more expensive (more partitions means more index overhead) but eliminates the cross-tenant retrieval risk entirely.

Within an organization, memory is further scoped by user and by context (client, session type, etc.). A BCBA can access memories related to clients on their caseload but not memories from a colleague's sessions with a different client. These access rules mirror the same RBAC policies that Gateway enforces for tool access, creating a consistent authorization model across the entire agent infrastructure.

Memory Poisoning: The Risk Nobody Talks About

Here is a threat model that most teams do not consider until it is too late.

An agent's memories influence its behavior. Memories are constructed from conversation history, which is influenced by user input. If a user can craft input that causes the agent to store a malicious or misleading memory, that memory will affect the agent's behavior in future sessions, potentially for other users.

Imagine a scenario with Lumen's agent. A user with access to the system crafts a conversation that causes the agent to store a procedural memory like "always recommend discontinuing services when a client shows regression." That memory, retrieved in future sessions, could subtly bias the agent's clinical recommendations in a harmful direction. The agent does not know the memory is adversarial. It treats it as learned context, just like any other memory.

This is memory poisoning, and it is the agent equivalent of a persistent cross-site scripting attack. The malicious payload is not code. It is context that corrupts the agent's reasoning in future interactions.

Defenses include:

  • Memory validation. Before storing a new memory, run it through a validation step that checks for anomalous content, policy-violating instructions, or patterns that match known injection techniques.
  • Memory provenance. Tag every memory with the user and session that created it. If a memory causes an issue, you can trace it back to its source and remove it.
  • Memory review. For high-stakes domains like healthcare, implement a periodic review process where a human reviews recently created memories, especially procedural ones, for accuracy and safety.
  • Memory expiration. Memories should not live forever by default. Stale memories can be wrong (the client's circumstances changed), outdated (the clinical protocol was updated), or simply no longer relevant. Expiration policies force periodic refreshment.
  • User-scoped memory boundaries. Memories created from one user's sessions should not affect another user's experience unless explicitly configured. This limits the blast radius of any poisoning attempt.

AgentCore supports all of these through its memory lifecycle configuration. You can define validation hooks (Lambda functions that run before a memory is persisted), provenance tracking (automatic metadata on every memory record), TTL policies (per-memory-type expiration), and scoping rules (which users can create memories that affect which other users).

Compliance: HIPAA, GDPR, and the Right to Be Forgotten

Memory creates compliance obligations that do not exist for stateless systems.

Under HIPAA, memories that contain or are derived from protected health information must be treated with the same security controls as any other PHI. They must be encrypted at rest and in transit, access must be logged, and they must be included in the organization's data inventory. When a client leaves an organization or requests data deletion, their memories must be purged. Not just the memories that contain their name, but any memory that was derived from their session data, even if the memory itself has been summarized to the point where the client is not directly identifiable.

Under GDPR (relevant for behavioral health organizations that serve clients in the EU, or for SaaS platforms with European customers), the right to erasure extends to agent memories. If a user requests deletion of their data, you must be able to identify and delete all memories associated with that user, across all agents that may have interacted with them.

AgentCore's memory system supports compliance through:

  • Encryption. All memory records are encrypted at rest using KMS keys, with per-organization key management.
  • Audit logging. Every memory read, write, and deletion is logged to CloudTrail.
  • Retention policies. Configurable TTLs per memory type, with automatic expiration and deletion.
  • Bulk deletion API. Delete all memories matching a set of criteria (organization, user, client, date range). This supports right-to-erasure requests and client offboarding workflows.
  • Data inventory integration. Memory records are tagged with data classification labels that integrate with your broader data governance tooling.

For Lumen Health, the compliance workflow for client departure looks like: when a client is discharged or transfers to another organization, an automated workflow triggers the bulk deletion API to remove all memories associated with that client across all agents. The deletion is logged, and a compliance report is generated showing which memories were deleted, when, and by which automated process.

Cost Implications of Memory at Scale

Memory is not free, and the costs are less obvious than they first appear.

Storage costs are the easy part. Vector storage costs a few dollars per million embeddings. For most organizations, even with generous memory retention, storage costs are negligible compared to model inference costs.

Embedding costs are more significant. Every memory that gets stored needs to be embedded. Every query that triggers retrieval needs to be embedded. If your agent stores 10 memories per session and retrieves 5 per session start, and you are running thousands of sessions per day, the embedding costs add up. Using a smaller, cheaper embedding model (like Titan Embeddings instead of Cohere's large model) reduces per-call costs but may reduce retrieval quality.

Retrieval latency costs are the hidden tax. Every memory retrieval adds latency to the agent's response time. For Lumen's agent, retrieving episodic and procedural memories at session start adds 100-200ms. Retrieving semantic memories during report generation adds another 50-100ms per retrieval. These are small numbers individually, but they compound across the agent's execution and are felt by the clinician waiting for the report.

Inference costs are the big one. Every retrieved memory that gets injected into the prompt increases the token count, which increases the cost of every model call for the rest of the session. If you retrieve 2,000 tokens of memory context and the agent makes 6 model calls during report generation, you are paying for 12,000 additional input tokens per session. Across thousands of sessions per day, that is material.

The optimization strategy is to be selective about what gets remembered and what gets retrieved. Not every conversation turn needs to become a memory. Not every session start needs to retrieve the full memory set. Memory templates should be tuned to the tradeoff between context richness and cost, and that tuning should be informed by data. Track which retrieved memories actually influence the agent's output (by comparing outputs with and without specific memories) and prune the ones that do not contribute.

Memory Defines the Agent

Let me end with something slightly more philosophical than the rest of this series.

Memory defines the personality and continuity of an agent. Two agents with identical code, identical prompts, and identical tools but different memories will behave differently. They will have different knowledge, different calibrations, different learned preferences. In a meaningful sense, they will be different agents.

This has implications that extend beyond infrastructure. When you deploy a new agent version (as we discussed in Part 2), what happens to the memories? Do they carry forward? They should, usually, because the accumulated knowledge is valuable. But the new version might interpret memories differently than the old version. A prompt restructuring might cause the agent to weight certain memories more heavily, subtly changing its behavior not because the memories changed but because the agent's relationship to them changed.

When you A/B test two agent versions (also from Part 2), they share the same memory store. If version A creates a memory during a session, and the user is later routed to version B, version B inherits that memory. This is usually desirable (continuity for the user) but can confound your A/B test results (version B's behavior is influenced by version A's memory decisions).

When you purge memories for compliance (as we discussed above), you are not just deleting data. You are changing the agent's behavior for every future interaction that would have been influenced by those memories. The agent becomes slightly different, slightly less knowledgeable, slightly less calibrated. For a single client's deletion, this is negligible. For a bulk purge (an entire organization leaving the platform), it can be significant.

These are not technical problems with technical solutions. They are design decisions about what kind of agent you want to build and how you want it to relate to its own history. The infrastructure must support whatever decisions you make, but it cannot make them for you.

For Lumen Health, the decision is that memory should serve clinical continuity. The agent should feel like a knowledgeable colleague who remembers past conversations, understands the client's history, and learns from the BCBA's feedback. But that colleague's knowledge is bounded by policy: scoped to authorized clients, expiring with retention policies, and erasable when clinical or legal requirements demand it. Memory is powerful precisely because it persists. And anything that persists must be governed.

What Comes Next

We have covered what agents remember, how that memory is structured and stored, how it gets retrieved and injected, and the compliance and cost implications of memory at scale. Memory gives agents continuity and context. It is what makes them useful beyond a single interaction.

But memory without identity control is dangerous. If the agent remembers everything about a client but cannot verify who is asking, the memory becomes a liability rather than an asset. If the agent accumulates procedural knowledge from one BCBA's feedback but applies it to another BCBA's sessions without appropriate scoping, the memory corrupts rather than calibrates. Identity, authentication, and policy enforcement are not just security features. They are the governance layer that makes memory safe to use.

Next: Identity and Policies in AgentCore.


This is Part 4 of a series on production-scale agent hosting. Part 1 covered the problem space. Part 2 covered runtime and deployments. Part 3 covered Gateway and tool integration. Part 5 will cover identity, authentication, and policy enforcement.