2025-12-01
Building Production-Ready AI Agents with AWS Bedrock AgentCore
Learn how AWS Bedrock AgentCore solves the infrastructure challenges of deploying agentic AI at scale - from prototype to production with runtime, memory, gateway, and multi-agent coordination.
The Production Gap
Many teams have built impressive LangChain or CrewAI prototypes that demonstrate real value - until it’s time to deploy them. The jump from “it works on my laptop” to production involves session isolation, credential management, memory persistence, observability, and security controls. Building this infrastructure from scratch takes months, which is why 70% of AI projects never make it past the pilot phase.
AWS Bedrock AgentCore (GA October 2025) addresses this production gap. It’s not another agent framework competing with LangChain or CrewAI. Instead, it’s the managed infrastructure layer that agents built with ANY framework need to run at scale. Think of it as “Lambda for AI agents” - you bring your agent code, AgentCore handles runtime, memory, tool management, and security.
This post explores how AgentCore solves real infrastructure challenges and when it makes sense to use it over self-hosted alternatives.
AgentCore Architecture
AgentCore consists of five integrated services that work independently or together:
Runtime: Serverless execution environment with 8-hour session windows and automatic session isolation using dedicated microVMs per user.
Memory: Managed storage for both short-term conversation context and long-term user preferences, facts, and summaries - without building your own vector database.
Gateway: Centralized tool management using the Model Context Protocol (MCP). Convert Lambda functions, REST APIs, and existing services into agent-accessible tools.
Identity: Secure credential management with OAuth 2.0 integration. Agents access third-party APIs on behalf of users without storing credentials.
Observability: OpenTelemetry-compatible metrics and traces exported to CloudWatch, Datadog, or LangSmith.
Runtime: Deploy Any Framework
The fundamental challenge with production agents is providing secure, isolated execution environments. AgentCore Runtime handles this through consumption-based microVM allocation.
Here’s how to deploy a Strands agent to AgentCore:
from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    # Strands takes the system prompt as `system_prompt`; the agent is invoked by calling it
    agent = Agent(
        model="anthropic.claude-sonnet-4-20250514-v1:0",
        system_prompt="You are a customer support agent with access to order history and return policies."
    )
    result = agent(payload.get("message"))
    return {"result": result.message}

if __name__ == "__main__":
    app.run()
Deploy with the CLI:
agentcore configure
agentcore launch --region us-east-1
Key runtime characteristics:
- 8-hour execution windows: Long enough for async agentic workflows. Traditional serverless functions such as AWS Lambda time out at 15 minutes.
- Session isolation: Each user gets a dedicated microVM. No data leakage between sessions.
- Consumption pricing: Pay for active CPU/memory only, not I/O wait time. For agentic workloads that spend much of their time waiting on LLM responses, this can be considerably cheaper than pre-allocated Lambda configurations.
- ARM64 containers: Required for performance optimization. Use --platform=linux/arm64 in Docker builds.
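To make the consumption-pricing point concrete, here is a rough sketch comparing wall-clock billing with active-time billing for a single agent turn. The rate and the timings are placeholder assumptions chosen for illustration, not published AWS prices, and the sketch models CPU time only.

```python
def billed_seconds(active_s: float, waiting_s: float, bill_waiting: bool) -> float:
    """Seconds of compute billed for one agent turn."""
    return active_s + waiting_s if bill_waiting else active_s


def turn_cost(active_s: float, waiting_s: float, rate_per_s: float, bill_waiting: bool) -> float:
    """Cost of one turn at a flat per-second rate."""
    return billed_seconds(active_s, waiting_s, bill_waiting) * rate_per_s


# Hypothetical turn: 2 s of CPU work, 18 s waiting on LLM responses.
RATE = 0.00001  # placeholder $/second, NOT a real price

wall_clock = turn_cost(2.0, 18.0, RATE, bill_waiting=True)
active_only = turn_cost(2.0, 18.0, RATE, bill_waiting=False)
savings = 1 - active_only / wall_clock

print(f"wall-clock billing:  ${wall_clock:.6f}")
print(f"active-time billing: ${active_only:.6f}")
print(f"savings: {savings:.0%}")
```

The more of a turn is spent waiting, the larger the gap; a CPU-bound agent would see little difference between the two models.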
Common pitfall: Not handling the Mcp-Session-Id header. AgentCore auto-injects this for stateless MCP servers:
from typing import Optional

from fastapi import FastAPI, Header

app = FastAPI()

@app.post("/mcp")
async def mcp_endpoint(
    mcp_session_id: Optional[str] = Header(None, alias="Mcp-Session-Id")
):
    # AgentCore manages session isolation
    # Your server must accept platform-generated IDs
    session_state = load_session(mcp_session_id)  # load_session is your own session lookup
    return {"status": "ok"}
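The endpoint above calls a load_session helper that the post does not define. A minimal sketch of what it might look like follows, assuming an in-memory dictionary keyed by the platform-generated session ID. The store, the function body, and the "calls" field are all hypothetical; a real deployment would more likely use a shared store.

```python
from typing import Any, Dict, Optional

# Hypothetical in-memory store, for illustration only.
_SESSIONS: Dict[str, Dict[str, Any]] = {}


def load_session(session_id: Optional[str]) -> Dict[str, Any]:
    """Return the state for a session ID, creating it on first use."""
    if not session_id:
        # Reject requests that arrive without the header instead of
        # silently sharing one anonymous session between callers.
        raise ValueError("Mcp-Session-Id header is required")
    return _SESSIONS.setdefault(session_id, {"calls": 0})


state = load_session("abc-123")
state["calls"] += 1
print(load_session("abc-123"))
```

The key point is that the server never mints its own IDs: it accepts whatever ID arrives in the header and keys its state on it.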
Memory: Context Without Infrastructure
Building production memory for agents requires solving two problems: short-term conversation context and long-term knowledge persistence. AgentCore Memory handles both.
Memory extraction pipeline:
Implementing memory with three strategies:
from bedrock_agentcore.memory import (
    MemoryClient,
    UserPreferenceMemoryStrategy,
    SemanticMemoryStrategy,
    SummaryMemoryStrategy
)

memory_client = MemoryClient()

# Create memory with multiple strategies
memory = memory_client.create_memory(
    name="customer-support-memory",
    strategies=[
        UserPreferenceMemoryStrategy(),  # Learn user patterns
        SemanticMemoryStrategy(),        # Store facts/knowledge
        SummaryMemoryStrategy()          # Compress sessions
    ],
    encryption_key_arn="arn:aws:kms:us-east-1:123456789012:key/abc123"
)

# Store conversation event
memory_client.create_event(
    memory_id=memory.id,
    event_data={
        "type": "conversation",
        "content": "User prefers technical explanations with code examples"
    }
)
Strategy selection guide:
- Customer Support: UserPreferences + Summaries (remember communication style)
- Technical Assistant: SemanticFacts + Summaries (remember codebase knowledge)
- Personal Agent: All three strategies (comprehensive personalization)
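The selection guide above can be captured as a small lookup table so that each agent type always gets the same strategy set. This is only a sketch: the agent-type keys and strategy labels are hypothetical names chosen for the example, not AgentCore identifiers.

```python
from typing import Dict, List

# Hypothetical labels mirroring the guide above.
STRATEGIES_BY_AGENT: Dict[str, List[str]] = {
    "customer_support": ["user_preference", "summary"],
    "technical_assistant": ["semantic", "summary"],
    "personal_agent": ["user_preference", "semantic", "summary"],
}


def select_strategies(agent_type: str) -> List[str]:
    """Return the memory strategies recommended for an agent type."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(STRATEGIES_BY_AGENT[agent_type])
    except KeyError:
        raise ValueError(f"unknown agent type: {agent_type!r}") from None


print(select_strategies("customer_support"))
```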
Critical security pattern - always use Guardrails before CreateEvent API:
import logging

import boto3

logger = logging.getLogger(__name__)

# ApplyGuardrail is a runtime operation, so use the bedrock-runtime client
bedrock_runtime = boto3.client('bedrock-runtime')

# WRONG: Direct storage (vulnerable to memory poisoning)
# memory_client.create_event(
#     memory_id=memory.id,
#     event_data={"content": user_input}
# )

# RIGHT: Sanitize with Guardrails first
guardrail_response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier='guardrail-123',
    guardrailVersion='1',
    source='INPUT',  # evaluate the text as user input
    content=[{"text": {"text": user_input}}]
)

if guardrail_response['action'] == 'NONE':
    memory_client.create_event(
        memory_id=memory.id,
        event_data={"content": user_input}
    )
else:
    # Block and log attack attempt
    logger.warning(f"Memory poisoning attempt blocked: {guardrail_response}")
Cost optimization: Limit retriever hops. Two to three retrieval operations per turn are normal; ten indicate over-retrieval:
memory_config = {
    'retrieval_strategy': 'semantic',
    'max_results': 5,
    'max_retriever_hops': 2
}
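One way to notice over-retrieval is to count retrieval operations per turn and flag turns that exceed the normal range described above. The sketch below is framework-agnostic; the RetrievalMonitor class and its method names are assumptions made for illustration, not part of any AgentCore SDK.

```python
from collections import Counter
from typing import List


class RetrievalMonitor:
    """Counts retrieval operations per turn and flags outliers."""

    def __init__(self, max_hops: int = 3) -> None:
        self.max_hops = max_hops
        self._hops: Counter = Counter()

    def record(self, turn_id: str) -> None:
        """Call once for every retrieval operation in a turn."""
        self._hops[turn_id] += 1

    def over_retrieving(self) -> List[str]:
        """Turn IDs whose retrieval count exceeded max_hops."""
        return sorted(t for t, n in self._hops.items() if n > self.max_hops)


monitor = RetrievalMonitor(max_hops=3)
for _ in range(2):
    monitor.record("turn-1")   # within the normal range
for _ in range(10):
    monitor.record("turn-2")   # the over-retrieval case
print(monitor.over_retrieving())
```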
Gateway: Centralized Tool Management
Embedding tools directly in agent code leads to duplication and inconsistency. When customer support, sales, and technical agents all need weather data, keeping three copies of the weather tool in sync becomes a recurring burden.
AgentCore Gateway solves this through centralized MCP-compatible tool servers:
Registering a Lambda function as a tool:
import boto3

agentcore = boto3.client('bedrock-agentcore')

# Register Lambda as tool target
response = agentcore.create_target(
    gatewayId='gateway-123',
    targetConfig={
        'type': 'LAMBDA',
        'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:get-weather',
        'description': 'Get current weather for a city'
    }
)
Gateway handles:
- Authentication: IAM roles for AWS resources, OAuth 2.0 for third-party APIs, API keys for services
- Semantic tool search: Agents discover relevant tools via x_amz_bedrock_agentcore_search without knowing all available tools
- Protocol conversion: Lambda functions, OpenAPI specs, Smithy models, and MCP servers are all exposed through a standardized MCP interface
Architecture pattern - centralize common tools, keep domain-specific tools local:
Common Tools (via Gateway):
- Web search
- Database queries
- Weather API
- Stock prices
Domain-Specific Tools (agent-local):
- Return policy logic
- Product catalog
- Business rules
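The split between shared and agent-local tools can be sketched as a resolver that checks the agent's own tools first and falls back to the shared set. In this sketch a plain dictionary stands in for the Gateway, and ToolResolver and both tool functions are hypothetical; in practice the shared tools would be reached through an MCP client.

```python
from typing import Callable, Dict

ToolFn = Callable[[str], str]


class ToolResolver:
    """Looks up a tool locally first, then in the shared (gateway) set."""

    def __init__(self, local: Dict[str, ToolFn], shared: Dict[str, ToolFn]) -> None:
        self.local = local
        self.shared = shared

    def resolve(self, name: str) -> ToolFn:
        if name in self.local:
            return self.local[name]
        if name in self.shared:
            return self.shared[name]
        raise KeyError(f"no tool named {name!r}")


def return_policy(item: str) -> str:
    return f"{item}: returnable within 30 days"   # domain-specific, agent-local


def weather(city: str) -> str:
    return f"{city}: sunny"                       # common, shared across agents


resolver = ToolResolver(local={"return_policy": return_policy},
                        shared={"weather": weather})
print(resolver.resolve("weather")("Berlin"))
print(resolver.resolve("return_policy")("headphones"))
```

Checking local tools first lets an agent override a shared tool with domain-specific behavior when it needs to.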
Multi-Agent Coordination with A2A Protocol
Scaling from single agents to coordinated agent teams requires standardized communication. AgentCore uses the Agent-to-Agent (A2A) protocol for this.
A2A vs MCP distinction:
- MCP: Agent-to-tool communication (agent calling weather API)
- A2A: Agent-to-agent communication (supervisor coordinating specialists)
Hub-and-spoke supervisor implementation:
import { BedrockAgentCoreClient, InvokeAgentCommand } from '@aws-sdk/client-bedrock-agentcore';

class HostAgent {
  private client: BedrockAgentCoreClient;
  private specialistAgents: Map<string, AgentConfig>;

  async routeToSpecialist(query: string, capability: string) {
    const agentConfig = this.specialistAgents.get(capability);
    if (!agentConfig) {
      // Map.get returns undefined for unknown capabilities
      throw new Error(`No specialist registered for capability: ${capability}`);
    }

    // Fetch remote agent's A2A configuration
    const agentCard = await this.fetchAgentCard(agentConfig.endpoint);

    // Invoke via A2A protocol
    const command = new InvokeAgentCommand({
      agentId: agentCard.id,
      sessionId: this.generateSessionId(),
      inputText: query,
      protocol: 'A2A'
    });
    return await this.client.send(command);
  }

  private async fetchAgentCard(endpoint: string): Promise<AgentCard> {
    // Retrieve agent capabilities schema from the A2A well-known path
    const response = await fetch(`${endpoint}/.well-known/agent-card.json`);
    return response.json();
  }
}
Orchestration patterns:
Supervisor with routing mode - not every query needs full orchestration:
class SupervisorAgent:
    def route_query(self, query: str):
        # Simple query → direct routing
        if self.is_simple_query(query):
            specialist = self.select_single_specialist(query)
            return specialist.invoke(query)
        # Complex query → full orchestration
        else:
            plan = self.analyze_and_plan(query)
            results = self.orchestrate_subagents(plan)
            return self.synthesize(results)

    def is_simple_query(self, query: str) -> bool:
        intents = self.detect_intents(query)
        return len(intents) == 1
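The supervisor above depends on a detect_intents method that is not shown. As a rough illustration, intent detection could start as simple keyword matching, as in the sketch below. The intent names and keyword lists are invented for the example; a production system would more likely use a classifier or an LLM call.

```python
from typing import Dict, Set, Tuple

# Hypothetical intents and keywords, for illustration only.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "billing": ("invoice", "refund", "charge"),
    "shipping": ("delivery", "tracking", "shipment"),
    "technical": ("error", "crash", "bug"),
}


def detect_intents(query: str) -> Set[str]:
    """Return every intent whose keywords appear in the query."""
    text = query.lower()
    return {
        intent
        for intent, words in INTENT_KEYWORDS.items()
        if any(word in text for word in words)
    }


def is_simple_query(query: str) -> bool:
    """A query is simple when it maps to exactly one intent."""
    return len(detect_intents(query)) == 1


print(is_simple_query("Where is my delivery?"))
print(is_simple_query("I got an error and want a refund"))
```

Note that a query matching no intent at all is also treated as not simple under this rule, so it falls through to full orchestration.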
Framework interoperability: LangGraph monitoring agent + CrewAI analytics agent + Strands incident response agent can all communicate via A2A. No framework lock-in.
Security and Cost Optimization
Guardrails Configuration
Guardrails protect against prompt injection, memory poisoning, and harmful content:
import boto3

bedrock = boto3.client('bedrock')

guardrail = bedrock.create_guardrail(
    name='production-agent-guardrail',
    # Both blocked-message fields are required by CreateGuardrail
    blockedInputMessaging='Sorry, I cannot process that request.',
    blockedOutputsMessaging='Sorry, I cannot provide that response.',
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'HIGH'},
            {'type': 'PROMPT_ATTACK', 'inputStrength': 'HIGH', 'outputStrength': 'NONE'}
        ]
    },
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'Financial Advice',
                'definition': 'Providing specific investment recommendations',
                'type': 'DENY'
            }
        ]
    },
    wordPolicyConfig={
        'wordsConfig': [
            {'text': 'internal-api-key'},
            {'text': 'secret-token'}
        ],
        'managedWordListsConfig': [
            {'type': 'PROFANITY'}
        ]
    }
)
Defense-in-depth strategy:
- Input validation: Block malicious prompts at entry
- Memory protection: Sanitize before CreateEvent API
- Output filtering: Prevent harmful responses
- Audit trails: CloudWatch logs for compliance
Cost Optimization Strategies
Prompt caching - 90% discount on cached tokens:
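import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-sonnet-4-20250514-v1:0",
    # Converse expects message content as a list of content blocks
    messages=[{"role": "user", "content": [{"text": user_query}]}],
    system=[
        {"text": large_system_prompt},
        # The cache point is its own block, placed after the content to cache
        {"cachePoint": {"type": "default"}}
    ]
)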
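To see what the 90% discount means for a single request, the arithmetic can be sketched as follows. The token counts are made-up example values, the price is the Sonnet input rate quoted in this post, and the sketch ignores any separate charge for writing to the cache.

```python
def input_cost(prompt_tokens: int, cached_tokens: int,
               price_per_mtok: float, cache_discount: float = 0.90) -> float:
    """Input cost in dollars when part of the prompt is read from cache."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    # Cached tokens are billed at (1 - discount) of the normal rate.
    effective_tokens = uncached + cached_tokens * (1 - cache_discount)
    return effective_tokens * price_per_mtok / 1_000_000


# Example: 10,000-token prompt, 8,000 of which is a cached system prompt.
without_cache = input_cost(10_000, 0, price_per_mtok=3.0)
with_cache = input_cost(10_000, 8_000, price_per_mtok=3.0)
print(f"without cache: ${without_cache:.4f}")
print(f"with cache:    ${with_cache:.4f}")
print(f"input savings: {1 - with_cache / without_cache:.0%}")
```

The overall saving depends on how much of each prompt is cacheable: here 80% of the prompt is cached, so the input cost drops by 72% instead of the full 90%.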
Model routing - match complexity to model cost:
def route_to_model(query: str) -> str:
    complexity = classify_query_complexity(query)
    if complexity < 0.3:
        return "anthropic.claude-haiku-4-5-20251001-v1:0"  # $1/$5 per 1M tokens (input/output)
    elif complexity < 0.7:
        return "anthropic.claude-sonnet-4-20250514-v1:0"  # $3/$15 per 1M tokens (input/output)
    else:
        return "anthropic.claude-opus-4-20250514-v1:0"  # $15/$75 per 1M tokens (input/output)
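route_to_model relies on a classify_query_complexity helper that the post leaves undefined. One possible starting point is a cheap heuristic that combines query length with the presence of reasoning-heavy words, as sketched below. The marker words, weights, and thresholds are arbitrary assumptions; they would need tuning against real traffic, and a small classifier model may work better.

```python
REASONING_MARKERS = ("why", "compare", "design", "trade-off", "prove", "analyze")


def classify_query_complexity(query: str) -> float:
    """Heuristic complexity score in [0, 1]; higher means harder."""
    words = query.split()
    # Longer queries score higher, saturating at 100 words.
    length_score = min(len(words) / 100, 1.0)
    # Count distinct reasoning markers, saturating at three.
    text = query.lower()
    hits = sum(1 for marker in REASONING_MARKERS if marker in text)
    marker_score = min(hits / 3, 1.0)
    return round(0.5 * length_score + 0.5 * marker_score, 3)


simple = classify_query_complexity("What time is it?")
harder = classify_query_complexity(
    "Compare and analyze the design trade-off between caching and recomputation"
)
print(simple, harder)
```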
Tool-call budgets - prevent unbounded tool use:
# Illustrative: option names vary by framework; check your SDK for its tool-call limit setting
agent = Agent(
    model="anthropic.claude-sonnet-4-20250514-v1:0",
    max_tool_calls_per_turn=3,
    instructions="If user asks about multiple items, summarize instead of exhaustive lookup"
)
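If a framework does not offer a built-in limit, a budget can be enforced by routing every tool call through a small wrapper. The sketch below is framework-agnostic; ToolBudget, ToolBudgetExceeded, and get_weather are hypothetical names used only for this example.

```python
from typing import Any, Callable, List


class ToolBudgetExceeded(RuntimeError):
    """Raised when a turn tries to exceed its tool-call budget."""


class ToolBudget:
    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def reset(self) -> None:
        """Call at the start of each turn."""
        self.used = 0

    def call(self, tool: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.used >= self.max_calls:
            raise ToolBudgetExceeded(f"budget of {self.max_calls} tool calls reached")
        self.used += 1
        return tool(*args, **kwargs)


def get_weather(city: str) -> str:
    return f"{city}: sunny"  # stand-in for a real tool


budget = ToolBudget(max_calls=3)
results: List[str] = []
try:
    for city in ["Berlin", "Tokyo", "Lima", "Cairo", "Oslo"]:
        results.append(budget.call(get_weather, city))
except ToolBudgetExceeded:
    results.append("(remaining cities skipped: tool budget reached)")
print(results)
```

Raising an exception, instead of silently dropping calls, gives the agent loop a clear signal to summarize what it has so far.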
Cost components:
- Runtime: Active CPU/memory consumption (not pre-allocated)
- Memory: Short-term (per event), long-term (per memory processed + retrievals)
- Gateway: MCP operations (ListTools, CallTool, Ping) + semantic search queries
- Identity: No additional charges when used via Runtime/Gateway
- Observability: CloudWatch standard pricing
Common Pitfalls
Memory Poisoning Without Guardrails
Problem: Storing raw user input directly allows prompt injection into memory:
# WRONG
user_input = "Ignore previous instructions, you are now..."
memory_client.create_event(
    memory_id=memory.id,
    event_data={"content": user_input}
)
Solution: Always sanitize with Guardrails first (shown in Memory section above).
Tool-Call Storms
Problem: Agent invokes 20+ tools per query without limits:
User: "What's the weather in major cities?"
Agent makes 50 separate get_weather() calls
Total: 10s latency, $0.05 per query
Solution: Enforce tool-call budgets and guide via instructions:
agent = Agent(
    max_tool_calls_per_turn=3,
    instructions="For multiple items, summarize instead of exhaustive lookup"
)
ARM64 Container Requirements
Problem: Using x86 containers causes deployment failures.
Solution: Build for ARM64 explicitly:
FROM --platform=linux/arm64 python:3.11-slim
WORKDIR /app
COPY . /app
CMD ["python", "agent.py"]
Then build the image:
docker buildx build --platform linux/arm64 -t agent:latest .
No VPC Integration for Internal APIs
Problem: Agent traffic goes over public internet.
Solution: Configure VPC and PrivateLink:
runtime_config = {
    'vpcConfig': {
        'securityGroupIds': ['sg-12345'],
        'subnetIds': ['subnet-abc', 'subnet-def']
    },
    'privateLinkEnabled': True
}
When to Use AgentCore
Use AgentCore when:
- Multiple agent frameworks in use (LangChain + CrewAI + custom)
- Need to evaluate different models (Bedrock + OpenAI + Anthropic)
- Enterprise security required (VPC, PrivateLink, customer-managed KMS)
- Multi-agent systems planned (A2A coordination)
- Fast time-to-production needed (weeks, not months)
- Team size under 10 (can’t build infrastructure from scratch)
Consider alternatives when:
- Single framework forever (e.g., only LangGraph → use LangGraph Cloud)
- Single cloud ecosystem (e.g., all Azure → Azure AI Agent Service)
- Extreme high volume (over 10M sessions/month → self-hosted may be cheaper)
- Need custom hardware (GPUs for specialized models → self-hosted)
- Already built agent infrastructure (sunk costs)
Break-even analysis for self-hosting:
AgentCore becomes cost-effective when:
- Agent development time exceeds 2 weeks
- Multiple agent types (customer support, analytics, monitoring)
- Enterprise security/compliance required
- Team size under 10 dedicated to agent infrastructure
Self-hosted infrastructure costs: roughly $200k/year for a DevOps team. Break-even at approximately 10M sessions/month.
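The break-even figure can be sanity-checked with simple arithmetic. Taking the numbers quoted above (about 200k per year in fixed self-hosting cost and a break-even near 10M sessions per month), the sketch below derives the per-session premium that the managed service would have to carry for the two options to cost the same. The function names are made up for the example, and the model ignores self-hosted compute costs entirely.

```python
def implied_premium_per_session(annual_fixed_cost: float,
                                breakeven_sessions_per_month: float) -> float:
    """Per-session cost difference at which both options cost the same."""
    return annual_fixed_cost / (12 * breakeven_sessions_per_month)


def breakeven_sessions_per_month(annual_fixed_cost: float,
                                 premium_per_session: float) -> float:
    """Monthly volume above which self-hosting becomes cheaper."""
    return annual_fixed_cost / (12 * premium_per_session)


premium = implied_premium_per_session(200_000, 10_000_000)
print(f"implied premium: ${premium:.5f} per session")

# A team paying a larger premium per session reaches break-even sooner.
print(f"{breakeven_sessions_per_month(200_000, 0.005):,.0f} sessions/month")
```

In other words, the quoted break-even implies the managed service costs well under a cent more per session than self-hosted compute; a different premium shifts the break-even volume proportionally.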
Key Takeaways
AgentCore is infrastructure, not a framework. It doesn’t replace LangChain or CrewAI - it provides the production runtime they need to scale.
Modular adoption reduces risk. Start with Runtime only, add Memory → Gateway → Identity → Observability incrementally. Each service delivers independent value.
Security is built-in. Session isolation, Guardrails, Identity management, and VPC integration are production-ready features, not bolt-ons.
Cost optimization is multi-dimensional. Prompt caching (90% discount), model routing (30% savings), tool-call budgets, and consumption pricing compound to reduce costs 60-80%.
Multi-agent systems need protocols. MCP for agent-to-tool, A2A for agent-to-agent. Framework interoperability allows LangGraph + CrewAI + Strands agents to work together.
Resources
- Amazon Bedrock AgentCore Documentation
- Sample Repository (GitHub)
- AgentCore Pricing
- Best Practices Guide
Start with the $200 AWS free tier credit available to new AWS customers to validate your use case before committing to production deployment.