AgentCore Identity and Policies: Why Least Privilege Matters More for Agents
Part 5 of a series on building production-scale agent platforms
We have covered a lot of ground in this series. Part 1 laid out why production agent hosting is a different category of problem. Part 2 went inside the runtime and deployment model. Part 3 covered how Gateway mediates every tool interaction. Part 4 explored memory, the thing that gives agents continuity and context.
Each of those layers assumes that someone, somewhere, has answered a foundational question: who is this agent, what is it allowed to do, and on whose behalf is it acting?
That question sounds simple. It is not. Agent identity is one of those problems that looks like a configuration task on the surface and reveals itself as an architectural challenge once you start pulling at the edges. The reason is that agents occupy an uncomfortable middle ground in your security model. They are not human users (they do not authenticate interactively, they do not have judgment about what they should and should not access). They are not traditional service accounts (they take dynamic, non-deterministic actions based on reasoning, not predefined code paths). They are something new: autonomous actors that make real-time decisions about which tools to call and which data to access, influenced by user input that you cannot fully predict.
This article is about how you build an identity and policy model for that kind of actor, using AgentCore's IAM integration. And because the stakes are clearest in regulated environments, I will use two running examples: Lumen Health (our ABA behavioral health platform from earlier in the series) and a new one, Meridian Capital, a mid-size financial institution deploying agents for loan processing and compliance reporting.
Agent Identity Is Not Human Identity
When a human user accesses a system, the identity model is straightforward. The user authenticates (proves who they are), the system authorizes (checks what they are allowed to do), and every action is attributed to that identity. The user has intent. They know what they are doing and why. If they access something they should not, they are accountable.
Agents break this model in several ways.
Agents act on behalf of others. When Lumen's progress report agent queries a client's session data, it is not acting on its own behalf. It is acting on behalf of the BCBA who initiated the report. The authorization check should not be "is this agent allowed to see session data?" It should be "is this BCBA allowed to see this client's session data, and is this agent authorized to act on behalf of this BCBA for this purpose?"
Agents make decisions about what to access. A traditional service follows a fixed code path. It accesses exactly the data its code specifies. An agent decides at runtime, based on the model's reasoning, which tools to call and what parameters to pass. The set of data it accesses is not predetermined. You can define the boundaries (which tools are available, what schemas are valid), but within those boundaries, the agent exercises judgment. And that judgment can be influenced by the user's input, including adversarial input.
Agents compose actions. A single agent invocation might chain together five or six tool calls, each of which could access different services with different sensitivity levels. The agent might read client session data (PHI), query a payer's documentation requirements (non-PHI business data), and then write a report to the document store (creating new PHI). Each action in that chain has different authorization implications, and the chain itself was not predetermined. It emerged from the agent's reasoning.
These characteristics mean you cannot treat agent identity like user identity (too much trust, too little predictability) or like service identity (too rigid, cannot adapt to the dynamic nature of agent actions). You need a model that combines the delegation semantics of user identity with the policy enforcement mechanisms of service identity.
How AgentCore Handles Identity
AgentCore builds on IAM, which means agent identity uses the same primitives (roles, policies, trust relationships) that your organization already understands. But the way those primitives are composed is specific to the agent use case.
Agent Execution Roles
Every agent version runs under an IAM execution role. This role defines the ceiling of what the agent can do. It is the maximum permission boundary, regardless of who is invoking the agent or what the agent decides to do at runtime.
For Lumen Health's progress report agent, the execution role might allow:
- Invoke Bedrock models (specific model IDs only)
- Read and write to the agent's memory store (specific DynamoDB table, scoped by partition key)
- Publish to specific EventBridge event buses
- Call specific API Gateway endpoints via Gateway (the internal session data, assessment, and treatment plan APIs)
The role does not allow:
- Direct access to any database (all data access goes through Gateway)
- Access to billing or financial systems
- Access to any AWS service not explicitly listed
- Cross-account actions (unless explicitly configured via trust relationships)
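To make the allow/deny lists above concrete, here is a minimal sketch of what such an execution-role policy might look like, written as a Python dict in IAM policy JSON shape. Every ARN, account ID, table name, and model pattern here is a hypothetical placeholder, not a real Lumen Health resource or a verbatim AgentCore-generated policy:

```python
# Sketch of the execution-role policy described above. IAM is deny-by-default,
# so everything not listed (databases, billing, other AWS services) is denied.
EXECUTION_ROLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "InvokeApprovedModels",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            # Specific model family only, not all foundation models.
            "Resource": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-*",
        },
        {
            "Sid": "AgentMemoryStore",
            "Effect": "Allow",
            "Action": ["dynamodb:GetItem", "dynamodb:PutItem", "dynamodb:Query"],
            "Resource": "arn:aws:dynamodb:us-east-1:111122223333:table/agent-memory",
            "Condition": {
                # Scope access to this agent's partition keys only.
                "ForAllValues:StringLike": {
                    "dynamodb:LeadingKeys": ["progress-report-agent/*"]
                }
            },
        },
        {
            "Sid": "PublishAgentEvents",
            "Effect": "Allow",
            "Action": "events:PutEvents",
            "Resource": "arn:aws:events:us-east-1:111122223333:event-bus/agent-events",
        },
    ],
}
```

The notable property is what is absent: no wildcard actions, no database access outside the scoped memory table, and no statement touching billing or financial services.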
Think of the execution role as a ceiling rather than a grant of day-to-day access. It defines the outer boundary; actual authorization decisions happen at a finer grain, based on the invoking user's identity.
User Identity Propagation
When a BCBA invokes the progress report agent, their identity flows through the entire execution chain. AgentCore captures the user's identity token at the entry point and propagates it as context through every tool call.
This is where things get architecturally interesting. The agent does not assume the user's IAM role. That would be dangerous, because the user might have broader permissions than the agent should exercise. Instead, the agent operates under its own execution role, but every tool call through Gateway carries the user's identity as a claim. Gateway uses that claim to make authorization decisions.
When the agent calls the session data API for client Marcus, Gateway checks:
- Is the agent's execution role allowed to call this API? (Role-level check)
- Is the invoking user (the BCBA) associated with the organization that owns this client's data? (Tenant-level check)
- Is the invoking user authorized to view this client's clinical records? (RBAC check)
- Is this specific data field included in the response filter for the agent's tool definition? (Data-level check)
All four checks must pass. If any one fails, Gateway rejects the request. The agent receives a policy rejection error (as we discussed in Part 3), and it can tell the user that it does not have access to that data without revealing why.
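The four-check sequence can be sketched as a pure function. This is an illustrative model of the logic, not Gateway's actual implementation; the field names (`user_org`, `response_filter`, and so on) are assumptions made for the sketch:

```python
from dataclasses import dataclass

@dataclass
class ToolRequest:
    """One tool call arriving at Gateway, carrying the user's identity claim."""
    tool: str
    user_role: str
    user_org: str          # tenant the invoking user belongs to
    resource_org: str      # tenant that owns the requested data
    requested_fields: set  # fields the agent asked for

@dataclass
class ToolPolicy:
    allowed_by_execution_role: bool  # role-level: is this API reachable at all?
    allowed_user_roles: set          # RBAC: which user roles may invoke it
    response_filter: set             # data-level: fields the tool may return

def authorize(req: ToolRequest, policy: ToolPolicy) -> bool:
    """All four checks must pass; any single failure rejects the request."""
    checks = [
        policy.allowed_by_execution_role,                # 1. role-level
        req.user_org == req.resource_org,                # 2. tenant-level
        req.user_role in policy.allowed_user_roles,      # 3. RBAC
        req.requested_fields <= policy.response_filter,  # 4. data-level
    ]
    return all(checks)
```

For example, a BCBA in the client's own organization requesting only filtered fields passes; the same request from a user in a different organization fails the tenant check and is rejected without explanation.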
The Intersection Matrix
Think of the authorization model as a matrix with three dimensions:
- Agent capabilities (what the agent is technically able to do, defined by the execution role and tool definitions)
- User permissions (what the invoking user is allowed to do, defined by their RBAC role and organizational scope)
- Context constraints (what is appropriate given the current request, defined by policies like "this agent version is limited to read-only operations during canary deployment")
The effective permission for any given action is the intersection of all three. The agent can only do something if it has the capability, the user has the permission, and the context allows it. This intersection model is what prevents both over-privileged agents (an agent accessing data no user asked it to access) and over-privileged users (a user who tries to get the agent to access data they should not see).
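The intersection semantics are literally set intersection, which makes them easy to state in code. A minimal sketch, with hypothetical action names:

```python
def effective_permissions(agent_caps: set, user_perms: set, context_allowed: set) -> set:
    """An action is permitted only if all three dimensions allow it."""
    return agent_caps & user_perms & context_allowed

# Hypothetical example: the agent can read and write, but the canary-deployment
# context is read-only, so writes are stripped from the effective set.
agent_caps = {"read_session_data", "write_report"}
user_perms = {"read_session_data", "write_report", "read_billing"}
canary_context = {"read_session_data"}  # read-only during canary

allowed = effective_permissions(agent_caps, user_perms, canary_context)
```

Note that `read_billing` is dropped even though the user holds it (the agent lacks the capability), and `write_report` is dropped even though both agent and user hold it (the context forbids it).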
Tool-Level Permissions: Granularity Matters
Defining permissions at the tool level, not just the service level, is critical for agents. Let me illustrate with Meridian Capital.
Meridian is deploying a loan processing agent that assists underwriters. The agent needs to:
- Pull applicant credit reports (read-only, from a third-party bureau API)
- Query internal risk models (read-only, from an internal scoring service)
- Look up regulatory requirements by state (read-only, from a compliance database)
- Generate underwriting recommendations (write, to the loan management system)
- Flag applications for manual review (write, to the review queue)
A naive approach would give the agent a single execution role with access to all five services. But the risk profile of each action is different. Reading a credit report is sensitive but bounded. Writing an underwriting recommendation creates a record that may have legal standing. Flagging for manual review triggers a workflow that involves human effort and deadlines.
AgentCore lets you define permissions at the tool level, not just the service level:
Tool: pull_credit_report
- Allowed callers: Users with role "underwriter" or "senior_underwriter"
- Rate limit: 50 per user per day (to comply with credit bureau usage agreements)
- Data classification: PII, requires audit logging
- Response filter: Strip raw FICO model details, return only the score and key factors
Tool: generate_recommendation
- Allowed callers: Users with role "senior_underwriter" only (junior underwriters can view recommendations but the agent cannot generate them on their behalf)
- Requires confirmation: The agent must present its recommendation and receive explicit user approval before writing to the loan system
- Idempotency: Enforced per application ID to prevent duplicate recommendations
- Audit: Full input/output logging including the agent's reasoning chain
Tool: flag_for_review
- Allowed callers: Any user with role "underwriter" or above
- Side effect: Creates a time-sensitive task in the review queue
- De-duplication: If the application is already flagged, return the existing flag rather than creating a duplicate
This granularity means the security team can review each tool's permission profile independently. They do not need to understand the agent's reasoning logic. They just need to verify that each tool has appropriate access controls, rate limits, and audit requirements. This is an important property for regulated environments: the security review can be decomposed into reviewable units.
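The three tool profiles above could be captured in a structure like the following. This is an illustrative schema of my own, not AgentCore's actual tool-configuration format; the field and role names mirror the prose:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ToolProfile:
    """One tool's permission profile: a unit the security team can review alone."""
    name: str
    allowed_roles: frozenset           # RBAC roles permitted to invoke the tool
    rate_limit_per_user_per_day: Optional[int] = None
    requires_confirmation: bool = False  # human approval before the write lands
    idempotency_key: Optional[str] = None  # de-duplicate on this request field
    audit_level: str = "access"        # "access" or "full" (inputs, outputs, reasoning)

MERIDIAN_TOOLS = [
    ToolProfile("pull_credit_report",
                frozenset({"underwriter", "senior_underwriter"}),
                rate_limit_per_user_per_day=50, audit_level="full"),
    ToolProfile("generate_recommendation",
                frozenset({"senior_underwriter"}),
                requires_confirmation=True,
                idempotency_key="application_id", audit_level="full"),
    ToolProfile("flag_for_review",
                frozenset({"underwriter", "senior_underwriter"}),
                idempotency_key="application_id"),
]
```

Because each profile is self-contained, a reviewer can sign off on `pull_credit_report`'s rate limit and audit level without reading a line of agent reasoning logic.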
Environment Isolation: Dev, Stage, and Prod
Agents need the same environment isolation patterns as any other service, but with additional considerations.
The standard pattern: separate AWS accounts for development, staging, and production. Each account has its own AgentCore deployment, its own Gateway configuration, its own memory stores, and its own tool targets. An agent running in the development account connects to development versions of internal APIs and uses synthetic data. An agent running in production connects to production APIs and real data.
What makes this harder for agents:
Memory leakage across environments. If an engineer tests the agent in staging with real-looking data, and that data persists in staging's memory store, it could affect future tests in unexpected ways. More critically, if someone accidentally connects a staging agent to a production memory store (a misconfiguration that is easier than it sounds), the staging agent starts retrieving production data. AgentCore enforces environment isolation at the memory store level, with separate stores per environment and no cross-environment references in the configuration.
Tool endpoint confusion. An agent that is configured to call a production API in staging will produce realistic-looking outputs but is now accessing production data from an environment with weaker access controls. AgentCore's Gateway configuration is per-environment, and the tool endpoint mappings are resolved at the environment level, not embedded in the agent code. The agent code references logical tool names. The environment-specific Gateway configuration resolves those names to environment-specific endpoints.
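The logical-name indirection can be sketched in a few lines. The hostnames here are invented placeholders; the point is only the shape: agent code holds names, the per-environment configuration holds URLs, and there is no code path where a dev lookup yields a prod endpoint:

```python
# Per-environment Gateway configuration (hypothetical hostnames).
GATEWAY_ENDPOINTS = {
    "dev": {
        "session_data": "https://sessions.dev.example.internal",
        "assessments":  "https://assessments.dev.example.internal",
    },
    "prod": {
        "session_data": "https://sessions.prod.example.internal",
        "assessments":  "https://assessments.prod.example.internal",
    },
}

def resolve_tool_endpoint(env: str, logical_name: str) -> str:
    """Agent code passes a logical tool name; only the environment's own
    configuration can resolve it. An unconfigured tool fails loudly."""
    try:
        return GATEWAY_ENDPOINTS[env][logical_name]
    except KeyError:
        raise KeyError(f"tool {logical_name!r} is not configured for environment {env!r}")
```

A tool that exists in prod but not in dev simply cannot be resolved from dev, which turns a silent cross-environment leak into an immediate configuration error.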
Evaluation drift. The evaluation datasets and scoring criteria that gate promotion from staging to production need to reflect the production environment. If staging evaluations run against synthetic data but production data has different characteristics (which it always does), an agent that passes staging evaluation may still behave differently in production. This is not strictly an identity problem, but environment isolation strategy directly affects it.
For Meridian Capital, the environment setup is:
- Dev account: Synthetic applicant data, mock credit bureau responses, relaxed rate limits, full debug logging. Engineers can test with any agent version.
- Staging account: Anonymized production data (real loan applications with PII stripped), real credit bureau test endpoints, production-equivalent rate limits, full audit logging. Only CI/CD pipelines deploy here, no manual deployments.
- Production account: Real data, real integrations, strict rate limits, full audit logging with compliance retention. Only promoted artifacts from staging can be deployed, and only through the automated pipeline with approval gates.
Each account has its own IAM trust boundaries. A role in the dev account cannot assume a role in the production account. There is no path for a development agent to accidentally reach production data.
Cross-Account Patterns
Some architectures require an agent in one account to access resources in another. Meridian's compliance reporting agent runs in the agent platform account but needs to read data from the lending system's production account.
AgentCore supports this through IAM cross-account role assumption. The agent's execution role in the platform account has permission to assume a specific role in the lending account. That assumed role grants access only to the specific API endpoints and data stores the agent needs. The trust relationship has two halves: the platform account's role must have permission to call sts:AssumeRole on the lending account's role, and the lending account's role's trust policy must name the platform role as a trusted principal.
The important constraint is that cross-account roles should be as narrowly scoped as the tool-level permissions. Do not create a cross-account role that grants broad read access to the lending account. Create a role that grants access to exactly the API paths and data fields the agent needs, and nothing more.
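One way to keep that narrow scoping honest is to generate the assumed role's permission policy from an explicit list of paths. A sketch, with placeholder account, API, and path values (the `execute-api` ARN shape is the standard API Gateway one):

```python
def cross_account_invoke_policy(account_id: str, api_id: str, get_paths: list) -> dict:
    """Build a permission policy allowing GET on exactly the listed API paths
    in the target account, and nothing else. IDs here are placeholders."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "execute-api:Invoke",
            "Resource": [
                # arn:aws:execute-api:region:account:api-id/stage/METHOD/path
                f"arn:aws:execute-api:us-east-1:{account_id}:{api_id}/prod/GET{path}"
                for path in get_paths
            ],
        }],
    }

# Hypothetical usage: the compliance agent may read loan reports, read-only.
policy = cross_account_invoke_policy("444455556666", "abc123defg", ["/loans/reports"])
```

Because the method is baked into each resource ARN, adding write access requires an explicit, reviewable change rather than a wildcard that was quietly too broad from day one.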
Preventing Runaway Agents
A runaway agent is one that enters a reasoning loop that causes it to call tools excessively, consume disproportionate resources, or take actions that amplify beyond what the user intended. This is not a theoretical risk. It happens when agents encounter unexpected input, hit error conditions they are not designed to handle, or get caught in reasoning cycles where each step triggers another step.
The defense is layered:
Token budgets. AgentCore lets you set a maximum token budget per execution. When the budget is exhausted, the execution is terminated regardless of the agent's state. This prevents infinite reasoning loops from consuming unlimited model inference costs.
Tool call limits. A maximum number of tool calls per execution. If the agent hits the limit, it must return a response with whatever information it has gathered, or return an error indicating it could not complete the task within the allowed scope.
Time limits. A maximum wall-clock time per execution. Long-running agents that are waiting on slow tool responses will be terminated if they exceed the limit.
Cost limits. Per-user and per-organization cost limits that aggregate across all agent invocations. If a single user or organization exceeds their allocation, subsequent invocations are throttled or rejected until the budget resets.
Anomaly detection. AgentCore monitors tool call patterns per agent version. If a version suddenly starts making 10x more tool calls than its historical baseline, the platform can alert the operations team and optionally throttle the version automatically. This catches bugs in new agent versions that made it through evaluation but behave differently on production traffic.
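The first four layers compose naturally into a single guard object that the runtime consults after every model call and tool call. This is a minimal sketch of the enforcement logic, not AgentCore's API; limits and semantics follow the prose above:

```python
import time

class BudgetExceeded(RuntimeError):
    """Raised when any execution limit is crossed; the run is terminated."""

class ExecutionGuard:
    """Tracks cumulative usage for one agent execution against hard limits."""

    def __init__(self, max_tokens: int, max_tool_calls: int,
                 max_seconds: float, max_cost_usd: float):
        self.max_tokens, self.max_tool_calls = max_tokens, max_tool_calls
        self.max_seconds, self.max_cost_usd = max_seconds, max_cost_usd
        self.tokens = self.tool_calls = 0
        self.cost_usd = 0.0
        self.started = time.monotonic()

    def charge(self, tokens: int = 0, tool_calls: int = 0, cost_usd: float = 0.0):
        """Record usage after each step; raise the moment any limit is exceeded."""
        self.tokens += tokens
        self.tool_calls += tool_calls
        self.cost_usd += cost_usd
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token budget exhausted")
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool call limit reached")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("wall-clock limit exceeded")
        if self.cost_usd > self.max_cost_usd:
            raise BudgetExceeded("cost limit exceeded")
```

The key property is that termination is unconditional: a reasoning loop cannot negotiate with the guard, because the check runs outside the model's control.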
For Meridian Capital, the guardrails look like:
- Maximum 20 tool calls per loan processing invocation (the agent typically uses 8 to 12)
- Maximum 60 seconds wall-clock time
- Maximum $2 model inference cost per invocation
- Per-underwriter daily budget of $50 in total agent costs
- Automatic alert if any agent version exceeds 150% of its trailing 7-day average tool call rate
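The last guardrail, the 150% trailing-average alert, reduces to a small baseline comparison. A sketch of that check, assuming the platform records one tool-call rate sample per day per agent version:

```python
def is_anomalous(todays_rate: float, trailing_rates: list, threshold: float = 1.5) -> bool:
    """Flag an agent version whose tool-call rate exceeds `threshold` times
    its trailing average (e.g. 1.5 == 150% of the 7-day baseline)."""
    if not trailing_rates:
        return False  # a brand-new version has no baseline yet
    baseline = sum(trailing_rates) / len(trailing_rates)
    return todays_rate > threshold * baseline
```

With a 7-day baseline of 10 calls per invocation, a day averaging 16 trips the alert while a day at 12 does not, which is the intended behavior: catch step changes from a bad release, not ordinary traffic noise.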
Audit Trails and Governance
In regulated industries, the audit trail is not a nice-to-have. It is the foundation of your compliance posture. For agents, the audit trail needs to capture more than traditional access logs because the decision-making process itself is part of what auditors want to understand.
AgentCore produces three levels of audit data:
Control plane audit (CloudTrail). Who created this agent version? Who modified the deployment configuration? Who changed the Gateway policy? Who updated the memory retention settings? These are the governance events that track who changed the agent's capabilities over time.
Data plane audit (CloudTrail + X-Ray). Which user invoked the agent? Which tools did it call? What data did it access? What did it write? These are the access events that track what the agent did on behalf of which user.
Reasoning audit (X-Ray traces). What was the model's reasoning at each step? Why did the agent choose to call tool A instead of tool B? What information from memory influenced the decision? These are the judgment events that explain why the agent did what it did.
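Answering an auditor's question means joining all three layers for a single invocation. A sketch of that join, with invented record shapes (the real CloudTrail and X-Ray schemas differ; what matters is that data-plane events carry both an invocation ID and the agent version, which link them to the other two layers):

```python
def assemble_audit(invocation_id: str, control_by_version: dict,
                   data_events: list, reasoning_by_invocation: dict) -> dict:
    """Join control-plane, data-plane, and reasoning records for one invocation.
    Raises if any layer is missing: a partial record is not a complete explanation."""
    data = next(e for e in data_events if e["invocation_id"] == invocation_id)
    return {
        "data_plane": data,                                      # what was accessed
        "control_plane": control_by_version[data["agent_version"]],  # who shipped it
        "reasoning": reasoning_by_invocation[invocation_id],     # why it decided
    }

# Hypothetical records for one loan-processing invocation.
control = {"v14": {"deployed_by": "ci-pipeline", "policy_revision": 7}}
data = [{"invocation_id": "inv-1", "agent_version": "v14", "user": "underwriter-9",
         "tools_called": ["pull_credit_report", "generate_recommendation"]}]
reasoning = {"inv-1": {"steps": ["weighed risk score against state threshold"]}}

record = assemble_audit("inv-1", control, data, reasoning)
```

If the reasoning trace was never captured, the join fails loudly, surfacing exactly the gap that would otherwise appear mid-examination.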
For Meridian Capital, the compliance team needs all three layers. When a regulator asks "why did the agent recommend approval for this loan application?", Meridian needs to produce:
- The agent version that was active at the time (control plane)
- The data the agent accessed: credit report, risk score, regulatory requirements (data plane)
- The reasoning chain: how the agent weighed the risk factors, what threshold it applied, and why it concluded the application met the criteria (reasoning audit)
Without all three, you cannot provide a complete explanation. And in financial services, an unexplainable automated decision is a compliance violation waiting to be discovered.
Governance Frameworks
For organizations that need formal governance around their agent deployments, the identity and policy infrastructure provides the building blocks for a governance framework:
Agent registry. A catalog of all deployed agents, their current versions, their execution roles, their tool permissions, and their deployment configurations. This is the starting point for any governance review. You need to know what agents exist and what they can do.
Change management. All changes to agent configuration (new versions, policy changes, tool additions, memory configuration updates) go through a review and approval process. The review can be automated for low-risk changes (prompt wording adjustments) and manual for high-risk changes (adding a new tool that accesses financial data).
Periodic access reviews. Just like user access reviews, agent access reviews ensure that each agent's permissions are still appropriate. Does the agent still need access to that API? Is the rate limit still calibrated correctly? Has the organizational structure changed in a way that affects the RBAC model? These reviews should happen quarterly at minimum.
Incident response. When something goes wrong (an agent accesses data it should not have, produces an incorrect recommendation, or enters a runaway loop), the incident response process should include: immediate containment (rollback to a safe version or disable the agent), root cause analysis (trace the execution to understand what happened), remediation (fix the policy, update the guardrails, improve the evaluation suite), and post-incident review (update the governance framework to prevent recurrence).
For Lumen Health, the governance framework maps to their existing HIPAA compliance program. The agent registry feeds into their system inventory. Change management aligns with their change control board process. Periodic access reviews happen alongside their annual security assessments. Incident response follows their existing breach notification procedures.
For Meridian Capital, the governance framework maps to their OCC examination readiness program. The agent registry is part of their model inventory (yes, agents count as models for regulatory purposes). Change management includes model risk management review. Periodic access reviews align with their SOX compliance cycles.
The governance framework does not need to be invented from scratch. It needs to extend your existing governance processes to cover agents as a new category of automated actor. The identity and policy infrastructure provides the technical enforcement. The framework provides the organizational process.
The Architectural Takeaway
Agent identity is not a bolt-on security feature. It is the load-bearing structure that makes everything else in the agent stack safe to use. Memory without identity is a data leak. Tool access without identity is an uncontrolled attack surface. Deployment without identity is an ungovernable system.
The key design principles:
- Agents operate under their own identity, separate from both the user's identity and the platform's service identity. This keeps the authorization model clean and auditable.
- User identity propagates through the execution, enabling fine-grained authorization decisions at every tool call. The agent acts on behalf of the user, but within the agent's own permission boundaries.
- Authorization is the intersection of agent capability, user permission, and context constraints. No single dimension is sufficient; all three must be evaluated together.
- Tool-level permissions allow security reviews to be decomposed into reviewable units, each with its own risk profile and audit requirements.
- Environment isolation prevents accidental cross-environment data access, which is particularly dangerous for agents because their non-deterministic behavior makes such accidents harder to detect.
- Runaway prevention is a first-class concern, because agents can consume resources and take actions at a rate that exceeds what any human user would do.
- The audit trail captures three layers: who changed the agent's capabilities, what the agent accessed, and why the agent made the decisions it made. All three are necessary for compliance in regulated environments.
What Comes Next
We have now covered five layers of the production agent stack: hosting and the problem space, runtime and deployments, Gateway and tool integration, memory and context management, and identity and policy enforcement. Together, these handle the compute, the lifecycle, the connectivity, the state, and the governance.
But we have not answered the hardest question. How do we know our agents are behaving correctly?
Not "are they running without errors?" That is basic operational health. Not "are they responding within latency targets?" That is performance monitoring. The hard question is: are the agent's outputs correct? Are they clinically accurate? Are they financially sound? Are they compliant with regulations? Are they helpful to the user? Are they getting better over time, or are they quietly degrading?
Observability and evaluation for agents is a fundamentally different problem than for traditional services, because the definition of "correct" is fuzzy, domain-specific, and evolving. And yet, without a rigorous answer to that question, everything else we have built is running on faith.
This is Part 5 of a series on production-scale agent hosting. Part 6 will cover observability, evaluation pipelines, and the question of how you define and measure "correct" for non-deterministic systems.