AgentCore Gateway: How Agents Talk to the Outside World Safely
Part 3 of a series on building production-scale agent platforms
In Part 1, we covered why hosting agents is fundamentally different from hosting stateless services and compared the major cloud platforms. In Part 2, we went deep into AgentCore's runtime model, covering how agents get packaged, versioned, deployed, and rolled back. We traced the full lifecycle using Lumen Health's progress report agent as our running example.
Both of those articles treated the agent as a mostly self-contained system. We talked about how it runs and how it gets deployed, but we glossed over something critical: how it interacts with everything outside itself.
An agent that cannot reach external systems is just an expensive chatbot. The whole point of agents is that they take actions. They query databases, call APIs, trigger workflows, send notifications, and write data. For Lumen Health's progress report agent, that means pulling session data from the practice management system, querying assessment scores, looking up treatment plans, and generating documents that end up in clinical records. Every one of those interactions crosses a boundary. And every boundary crossing is a place where things can go wrong in ways that are hard to predict and expensive to fix.
This article is about the mediation layer that sits between the agent and everything it touches. In AgentCore, that layer is called Gateway.
Why Agents Need a Gateway Layer
Let me start with why direct tool access is dangerous, because this is where most prototype agents cut corners.
In a typical prototype, a tool is a Python function. The agent calls the function, the function hits an API, the API returns data. Simple. The API key is probably hardcoded in the function or pulled from an environment variable. There is no input validation beyond what the external API enforces. There is no rate limiting. There is no audit trail beyond whatever logging the developer remembered to add. There is no circuit breaker for when the downstream service is degraded.
Now scale that to production. The agent is handling requests from hundreds of clinicians across dozens of organizations. Each organization has its own data isolation requirements. The agent is calling five or six different services per execution. Some of those services are internal, some are third-party. Some contain PHI, some do not. Some have strict rate limits, some are effectively unlimited. Some are critical path (the report cannot be generated without session data), some are best-effort (pulling a client photo for the report header is nice but not essential).
Without a mediation layer, all of that complexity lives in the tool implementations. Your tool functions become bloated with authentication logic, retry handling, error classification, rate limiting, and audit logging. Worse, the agent itself can manipulate tool inputs in unexpected ways. If a prompt injection attack convinces the agent to pass a different client ID to the session data API, and the tool function just forwards whatever the agent gives it, you have a data breach.
Gateway exists to solve this. It is the policy enforcement point between the agent and the outside world. Every outbound request from the agent passes through Gateway, where it gets authenticated, authorized, validated, rate-limited, logged, and potentially transformed before reaching the target service. Every response passes back through Gateway, where it gets sanitized, error-classified, and recorded before the agent sees it.
The agent does not hold credentials. The agent does not know the actual endpoint URLs. The agent describes what it wants to do ("get session data for client X"), and Gateway decides whether that request is allowed, routes it to the right service, handles the authentication, and returns the result.
The Integration Model
AgentCore Gateway supports three categories of integration targets, each suited to different interaction patterns.
API Targets
The most common pattern. The agent needs to call a REST or GraphQL API, either internal or third-party. Gateway acts as a reverse proxy with policy enforcement.
For each API target, you configure:
- Endpoint mapping. The actual URL and path structure of the target service. The agent never sees this. It references a logical tool name, and Gateway resolves it to the physical endpoint.
- Authentication. Gateway injects credentials on behalf of the agent. These can be IAM-based (for AWS services), OAuth tokens (for third-party APIs), API keys (stored in Secrets Manager and rotated automatically), or mutual TLS certificates. The agent code never touches credentials.
- Request validation. A schema that defines what the agent is allowed to send. If the agent tries to pass parameters outside the schema (whether through a bug, a hallucination, or a prompt injection attack), Gateway rejects the request before it reaches the target.
- Response filtering. Rules that strip sensitive fields from the response before the agent sees them. If the session data API returns billing codes alongside clinical data, but the agent only needs the clinical data, Gateway strips the billing fields. This limits the blast radius if the agent's conversation history is ever exposed.
- Rate limiting. Per-organization, per-user, and per-agent-version rate limits. This protects downstream services from agent loops (where the agent repeatedly calls the same tool due to a reasoning error) and from load spikes during bulk report generation.
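To make the configuration surface concrete, here is a sketch of what a target definition and its request-validation check might look like. The field names and the dict-based schema are illustrative assumptions on my part, not AgentCore's actual configuration format.

```python
# Illustrative sketch of an API target definition and the validation check a
# gateway applies before forwarding. Field names are hypothetical, not
# AgentCore's real schema.

SESSION_DATA_TARGET = {
    "tool_name": "get_session_data",   # logical name the agent sees
    "endpoint": "https://pm.internal.example/v2/sessions",  # never exposed to the agent
    "auth": {"type": "iam_role", "role": "gateway-session-reader"},
    "request_schema": {                # allow-listed parameters and their types
        "client_id": str,
        "date_range": str,
        "program_ids": list,
    },
    "response_strip_fields": ["billing_codes", "payer_id"],
    "rate_limit_per_minute": 120,
}

def validate_request(target: dict, params: dict) -> list[str]:
    """Return a list of violations; an empty list means the request may proceed."""
    schema = target["request_schema"]
    # Any parameter outside the allow-list is rejected outright.
    violations = [f"unknown parameter: {k}" for k in params if k not in schema]
    # Parameters with the wrong type are rejected too.
    violations += [
        f"bad type for {k}: expected {schema[k].__name__}"
        for k, v in params.items()
        if k in schema and not isinstance(v, schema[k])
    ]
    return violations
```

The key property is that rejection happens before the request ever reaches the target, whether the bad parameter came from a bug, a hallucination, or an injection attempt.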
For Lumen Health, the API targets include:
- Session Data API. Internal service returning trial-by-trial data, behavior frequency counts, and session notes. Scoped by organization and client. Gateway enforces that the requesting BCBA's organization matches the client's organization.
- Assessment Service. Returns VB-MAPP, ABLLS-R, and AFLS scores. Gateway validates that the requested assessment belongs to the specified client and that the requesting user has the clinical role required to access assessment data.
- Treatment Plan Service. Returns active treatment plans with program lists and mastery criteria. Gateway strips draft plan versions that have not been finalized, ensuring the agent only works with approved clinical protocols.
Event-Driven Triggers
Not every agent interaction is request-response. Sometimes the agent needs to fire an event and move on: notify a supervisor, queue a document for review, trigger a downstream workflow. Gateway supports publishing to event buses (EventBridge, SQS, SNS) with the same policy controls as API targets.
This matters for Lumen because the progress report agent does not just generate reports. It also:
- Publishes a "report draft ready" event that triggers a notification to the BCBA's dashboard.
- Queues a quality review task if the agent flagged anything as clinically uncertain.
- Sends a "data anomaly detected" event if session data patterns suggest a data entry error (like a session recorded at 3 AM on a holiday).
Each of these is a fire-and-forget action from the agent's perspective. Gateway handles delivery guarantees, dead-letter routing for failed deliveries, and deduplication for events that might be emitted more than once if the agent retries a step.
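The dead-letter behavior can be sketched in a few lines. Here `send` stands in for the actual EventBridge/SQS client call, and the retry count is an assumption; the point is that the agent fires once and the delivery machinery absorbs the failure handling.

```python
def publish_with_dlq(send, event: dict, dead_letter: list, max_attempts: int = 3) -> bool:
    """Sketch of fire-and-forget delivery with dead-letter routing: try the
    bus a few times, then park undeliverable events for later inspection.
    `send` is a stand-in for the real EventBridge/SQS publish call."""
    for _ in range(max_attempts):
        try:
            send(event)
            return True
        except ConnectionError:
            continue            # transient: try again
    dead_letter.append(event)   # attempts exhausted: route to the DLQ
    return False
```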
The event-driven pattern also works in reverse. Agents can be triggered by events, not just by user requests. An EventBridge rule can invoke an AgentCore agent when a new assessment is completed, automatically generating a comparison summary against the prior assessment period. Gateway mediates this inbound path too, validating the event payload and ensuring the agent receives only the data it is authorized to process.
Inbound Communication and Webhooks
Some integrations require the external service to call back into the agent. A document signing service that notifies the agent when a parent has signed a progress report. A scheduling system that alerts the agent when a BCBA's availability changes. A third-party data provider that pushes updated payer guidelines.
Gateway provides managed webhook endpoints for these inbound flows. Each webhook has its own authentication requirements (signature verification, mutual TLS, or API key), payload validation, and routing rules that determine which agent version and which conversation context should receive the callback.
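Signature verification is the piece most often done wrong in hand-rolled webhook handlers, so it is worth showing. This sketch assumes the sender signs the raw request body with HMAC-SHA256 and sends the hex digest in a header; real providers vary in header name and encoding.

```python
import hashlib
import hmac

def verify_webhook_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Sketch of the check a managed webhook endpoint performs before routing
    a callback to an agent. Assumes HMAC-SHA256 over the raw body with the
    hex digest delivered in a header (provider conventions differ)."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking timing information to an attacker
    return hmac.compare_digest(expected, signature_header)
```

Note that the comparison runs over the raw bytes as received; re-serializing the JSON first would break the signature for semantically identical payloads.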
This is particularly important for long-running agent workflows. Lumen's report generation process is not always synchronous. Sometimes the BCBA reviews a draft, asks the agent to adjust a section, leaves for a session, comes back two hours later, and continues the conversation. During that time, an assessment might be completed, or a payer might update their requirements. Inbound webhooks let the agent's context be updated with relevant changes even when the conversation is paused.
The Architecture in Words
Let me describe the full picture for Lumen Health's setup, since a text-based architecture description is more useful than a diagram you cannot zoom into.
At the top is the BCBA, interacting through Lumen's web application. The web app sends requests to AgentCore's API endpoint, which routes through the deployment configuration to the appropriate agent version.
The agent runs inside AgentCore's managed runtime. It has access to a Bedrock model endpoint (for reasoning and generation) and to its conversation state store (managed by AgentCore). It does not have direct network access to any other service.
Between the agent and every external service sits Gateway. The agent makes tool calls by referencing logical tool names. Gateway resolves each tool call against its configuration, applies all policy checks, and routes the request to the target.
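The resolution-and-policy step can be sketched as a small dispatcher. The registry shape, the `call` hook, and the single org-scoping rule are illustrative assumptions; the point is that the agent supplies only a logical name and parameters, and everything physical lives on the Gateway side.

```python
class GatewayError(Exception):
    """Raised when mediation refuses or cannot route a tool call."""

def mediate(registry: dict, tool_name: str, params: dict, user_org: str):
    """Minimal sketch of resolve-then-enforce-then-route. The agent never
    sees the physical endpoint behind `call`."""
    target = registry.get(tool_name)
    if target is None:
        raise GatewayError(f"unknown tool: {tool_name}")
    # Example policy: the requesting user's org must match the org the
    # parameters reference (Lumen's "BCBA org matches client org" rule).
    if params.get("org_id") != user_org:
        raise GatewayError("policy rejection: cross-organization access")
    return target["call"](params)   # routed to the physical endpoint
```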
Behind Gateway, the targets fan out:
- Left branch: Lumen's internal services (session data, assessments, treatment plans), running in their own VPC. Gateway accesses these through VPC endpoints, using IAM-scoped roles that limit access to the specific API paths the agent needs. No broad "read all data" permissions.
- Center branch: AWS services (S3 for document storage, EventBridge for notifications, SQS for review queues). Gateway uses native AWS IAM authentication with least-privilege policies.
- Right branch: Third-party services (payer documentation APIs, clinical guideline databases). Gateway uses Secrets Manager for API key rotation and enforces outbound rate limits that comply with each provider's usage terms.
Every request and response flowing through Gateway produces a structured log entry that includes: the agent version, the user identity, the organization, the tool name, the request parameters (with sensitive fields redacted), the response status, the latency, and a correlation ID that links back to the full execution trace in X-Ray.
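A log entry with those fields, and the redaction step applied before it is written, might look like this. The field set mirrors the list above; the redaction list and function shape are my own illustration.

```python
import json

SENSITIVE_FIELDS = {"client_name", "dob", "notes"}   # illustrative redaction list

def gateway_log_entry(agent_version: str, user: str, org: str, tool: str,
                      params: dict, status: int, latency_ms: float,
                      correlation_id: str) -> str:
    """Sketch of the structured audit record described above, with sensitive
    request parameters redacted before the entry is emitted."""
    redacted = {k: ("[REDACTED]" if k in SENSITIVE_FIELDS else v)
                for k, v in params.items()}
    return json.dumps({
        "agent_version": agent_version,
        "user": user,
        "org": org,
        "tool": tool,
        "params": redacted,
        "status": status,
        "latency_ms": latency_ms,
        "correlation_id": correlation_id,   # links back to the X-Ray trace
    })
```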
For the HIPAA compliance team, this architecture means that every data access by the agent is mediated, authenticated, authorized, and audited. There is no path for the agent to reach clinical data without going through Gateway. And Gateway's configuration is declarative, reviewable, and version-controlled, which means the security team can audit it using the same review processes they use for IAM policies.
The Hard Parts: Latency, Errors, Idempotency, and Rate Limits
Gateway is not free. Every mediation layer adds latency and complexity. Let me walk through the operational tradeoffs.
Latency
Every tool call passes through Gateway, which adds network hops, policy evaluation time, and serialization overhead. For a single tool call, this is typically a few milliseconds. But agents make multiple tool calls per execution, sometimes sequentially (when the result of one tool call determines the next), sometimes in parallel (when gathering data from independent sources).
For Lumen's progress report agent, a typical execution involves six to eight tool calls. If Gateway adds 5ms per call, that is 30-40ms of additional latency across the execution. Negligible compared to the model inference time (which dominates at several seconds). But if your agent is in a tight loop making dozens of tool calls (say, iterating through each program in a treatment plan one at a time rather than batching), that overhead compounds.
The mitigation is twofold. First, design your tools to support batch operations. Instead of "get session data for program X" called 20 times, expose "get session data for programs [X, Y, Z, ...]" called once. Second, Gateway supports connection pooling and keep-alive to downstream targets, amortizing connection setup costs across calls.
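The batching point is easiest to see as two tool shapes side by side. The function names and the `api_call` hook are hypothetical; what matters is the round-trip count through Gateway.

```python
def fetch_sessions_one_by_one(api_call, program_ids: list[str]) -> dict:
    """N round trips through the mediation layer: one call per program."""
    return {pid: api_call([pid])[pid] for pid in program_ids}

def fetch_sessions_batched(api_call, program_ids: list[str]) -> dict:
    """One round trip: the tool accepts the whole list of programs."""
    return api_call(program_ids)
```

With 20 programs and a few milliseconds of per-call overhead, the first shape pays the mediation cost 20 times; the second pays it once, for the same result.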
Error Handling
When a tool call fails, the agent needs to know why so it can decide what to do. But "why" in a mediated system has multiple layers. Did Gateway reject the request because of a policy violation? Did the downstream service return a 500? Did the request time out? Was the rate limit exceeded?
Gateway classifies errors into categories that the agent can reason about:
- Policy rejection. The request was not allowed. The agent should not retry. This typically means the agent is trying to do something it should not do, either due to a reasoning error or a prompt injection attempt.
- Transient failure. The downstream service returned a retryable error or timed out. Gateway can auto-retry with exponential backoff (configurable per target), or surface the error to the agent for it to decide.
- Rate limit. The agent or organization has exceeded its allocation. Gateway returns a "retry after" signal. The agent can wait and retry, or proceed with partial data.
- Permanent failure. The downstream service returned a non-retryable error (404, 422). The agent needs to handle this gracefully, perhaps by informing the user or adjusting its approach.
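Orchestration logic needs to branch on these categories, so it helps to see them as a function. The taxonomy follows the four categories above; the function signature and status-code mapping are an illustrative sketch, not Gateway's actual error model.

```python
def classify_gateway_error(status: int, policy_violation: bool = False,
                           rate_limited: bool = False) -> str:
    """Map a failed tool call onto the four categories so the agent's
    orchestration logic can decide what to do next."""
    if policy_violation:
        return "policy_rejection"        # do not retry; something is wrong upstream
    if rate_limited or status == 429:
        return "rate_limit"              # honor retry-after, or degrade gracefully
    if status in (408, 500, 502, 503, 504):
        return "transient"               # safe to retry with exponential backoff
    return "permanent"                   # e.g. 404, 422: adjust the approach
```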
For Lumen's agent, the error handling strategy differs by tool. A failure to retrieve session data is critical and should stop report generation with a clear error message to the BCBA. A failure to retrieve a client photo is cosmetic and should be silently skipped. A rate limit on the payer documentation API should cause the agent to proceed with cached guidelines and flag the report for manual review of that section.
Idempotency
Agents retry. Models sometimes produce the same tool call twice in a reasoning loop. Network issues cause requests to be re-sent. If the tool call has side effects (writing data, sending notifications, triggering workflows), you need idempotency to prevent duplicate actions.
Gateway supports idempotency keys for write operations. Each tool call includes a unique identifier derived from the execution trace (the combination of the agent execution ID, the step number, and the tool name). If Gateway sees a duplicate request with the same idempotency key, it returns the cached response from the first call rather than executing again.
This is critical for Lumen's event-driven triggers. If the agent publishes a "report draft ready" notification, and then the model retries the step due to a transient error, the BCBA should get one notification, not two. Gateway's idempotency layer ensures that.
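The mechanism is simple enough to sketch end to end: derive the key from the execution trace, and replay the cached response for any duplicate instead of executing the side effect again. The key derivation follows the description above; the cache class is my own minimal illustration.

```python
import hashlib

def idempotency_key(execution_id: str, step: int, tool_name: str) -> str:
    """Derive the key from the execution trace, so a retried step produces
    the same key as the original attempt."""
    return hashlib.sha256(f"{execution_id}:{step}:{tool_name}".encode()).hexdigest()

class IdempotentExecutor:
    """Sketch: run a side effect at most once per key; duplicates get the
    cached response from the first call."""
    def __init__(self):
        self._cache: dict = {}
        self.executions = 0   # exposed for inspection

    def execute(self, key: str, side_effect):
        if key in self._cache:
            return self._cache[key]      # duplicate: replay the first result
        self.executions += 1
        result = side_effect()
        self._cache[key] = result
        return result
```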
Rate Limiting
Rate limiting for agents is more nuanced than for traditional services. With a REST API, you rate limit by API key or by client IP. With agents, you need multiple dimensions:
- Per-organization limits. Prevent one large organization's bulk report generation from starving smaller organizations.
- Per-user limits. Prevent a single BCBA from monopolizing agent capacity.
- Per-tool limits. Respect downstream service capacity constraints.
- Per-agent-version limits. During canary deployments, limit the new version's resource consumption to contain the blast radius if it has a reasoning bug that causes excessive tool calls.
Gateway evaluates rate limits before forwarding requests, using a token bucket algorithm with per-dimension buckets. When a limit is reached, the agent receives a structured error that includes which limit was hit and when it resets. This lets the orchestration logic make informed decisions: wait and retry, proceed with cached data, or gracefully degrade.
How Google and Azure Handle This Differently
The mediation problem is not unique to AWS. Every agent platform needs something like Gateway. But the architectural approaches differ in ways that reflect each platform's broader philosophy.
Google Vertex AI handles tool integration through a tighter coupling between the agent framework and Google's own services. Tool definitions in Vertex AI Agent Builder include built-in connectors for Google Cloud services (BigQuery, Cloud SQL, Cloud Storage, Pub/Sub) and a set of pre-built enterprise connectors (Salesforce, ServiceNow, Jira). For these supported targets, the mediation layer is largely invisible. You declare the connection, provide credentials via Secret Manager, and the platform handles routing, authentication, and basic error handling.
The advantage is speed of integration for supported targets. If your tools map to the pre-built connectors, you are up and running quickly. The disadvantage is flexibility. Custom API targets require more manual configuration, and the policy enforcement capabilities (input validation, response filtering, per-dimension rate limiting) are less granular than what Gateway offers. For Lumen's use case, where most tools are custom internal APIs, the pre-built connectors do not help much.
Google's grounding feature is worth noting here. For retrieval-augmented generation, Vertex AI provides a native integration between the agent and Vertex AI Search that bypasses the general tool integration path. This is optimized for the common pattern of "search documents and use results in the response." For agents whose primary tool is information retrieval (clinical guideline lookup, regulatory reference checks), this tight integration offers lower latency and better result quality than routing through a generic tool layer.
Microsoft Azure AI Agent Service leans heavily on the Microsoft Graph as its mediation layer for enterprise data. If your tools involve accessing Office 365 data (emails, calendars, SharePoint documents, Teams messages), the Microsoft Graph integration is unmatched. The agent inherits the user's existing Microsoft Entra ID permissions, which means the authorization model is the one your organization already manages.
For non-Microsoft integrations, Azure provides API Management as the mediation layer. This is a mature product with strong policy capabilities (input/output transformation, rate limiting, caching, authentication), but it was built for traditional API consumers, not for agents. The agent-specific concerns (idempotency for retried tool calls, response sanitization to prevent data leakage through the conversation, per-agent-version rate limiting) require additional custom logic on top of API Management.
Azure's Logic Apps and Power Automate integrations offer event-driven capabilities similar to AgentCore's EventBridge integration, but with a low-code orientation. This works well for organizations that have existing Logic Apps workflows and want agents to participate in them. It works less well for engineering teams that prefer infrastructure-as-code and programmatic control.
The short version: Google optimizes for agents that primarily search and synthesize information from Google Cloud data stores. Microsoft optimizes for agents that operate within the Microsoft productivity ecosystem. AWS optimizes for agents that need fine-grained control over how they interact with arbitrary services in complex, multi-service architectures. None is universally better. The right choice depends on where your tools live and how much control you need over the mediation layer.
The Design Principle Behind All of This
If I step back from the implementation details, the core principle is simple: agents should not be trusted with direct access to anything consequential.
That sounds harsh. But think about it this way. An agent's behavior is non-deterministic. It is influenced by user input, which you cannot fully control. It is influenced by model reasoning, which you cannot fully predict. And it is influenced by the conversation history, which accumulates over multiple turns and can shift the agent's behavior in subtle ways.
In that context, giving an agent direct access to a database connection string, or a service account with broad permissions, or an unmediated path to a payment API, is analogous to giving a new employee admin access on their first day. Maybe they will be fine. But you do not structure your security posture around "maybe."
Gateway implements the principle of least privilege for non-deterministic systems. The agent can only reach what you have explicitly configured it to reach. It can only send requests that match the schemas you have defined. It can only access data that the requesting user is authorized to see. And every interaction is logged in a way that your security and compliance teams can audit.
For Lumen Health, operating in a HIPAA-regulated environment with PHI flowing through every agent execution, this is not optional. It is the foundation of their compliance story. When an auditor asks "how do you ensure the AI cannot access patient data it should not see?" the answer is not "we wrote careful prompts." The answer is "every data access is mediated by Gateway, which enforces organization-scoped, role-based policies on every request, and every access is logged to CloudTrail." That is an answer an auditor can work with.
What Comes Next
We have now covered three layers of the AgentCore stack: the runtime that executes agents, the deployment machinery that manages versions and rollouts, and the gateway that mediates every interaction with the outside world. Together, these handle the compute, the lifecycle, and the connectivity.
But there is a fourth dimension we have not addressed. Agents are not stateless. They remember things. They build up knowledge over the course of a conversation, and in many cases, across conversations. A BCBA who asks the agent about a client's manding progress today should not have to re-explain the client's history tomorrow. The agent should remember. But memory in a multi-tenant, HIPAA-regulated environment raises its own set of hard questions: what gets remembered, who can access it, how long does it persist, and what happens when a client's data needs to be purged?
If Gateway controls what agents can touch, the next article explores what they remember: AgentCore Memory.
This is Part 3 of a series on production-scale agent hosting. Part 1 covered the problem space and platform comparison. Part 2 covered the AgentCore runtime and deployment model. Part 4 will cover AgentCore's memory and state management.