2025-12-01

Building Production-Ready AI Agents with AWS Bedrock AgentCore

Learn how AWS Bedrock AgentCore solves the infrastructure challenges of deploying agentic AI at scale - from prototype to production with runtime, memory, gateway, and multi-agent coordination.

The Production Gap

Many teams have built impressive LangChain or CrewAI prototypes that demonstrate real value - until it’s time to deploy them. The jump from “it works on my laptop” to production involves session isolation, credential management, memory persistence, observability, and security controls. Building this infrastructure from scratch can take months, and it is one reason an often-cited 70% of AI projects never make it past the pilot phase.

AWS Bedrock AgentCore (GA October 2025) addresses this production gap. It’s not another agent framework competing with LangChain or CrewAI. Instead, it’s the managed infrastructure layer that agents built with ANY framework need to run at scale. Think of it as “Lambda for AI agents” - you bring your agent code, AgentCore handles runtime, memory, tool management, and security.

This post explores how AgentCore solves real infrastructure challenges and when it makes sense to use it over self-hosted alternatives.

AgentCore Architecture

AgentCore is built from modular services that work independently or together. This post covers five of them:

Runtime: Serverless execution environment with sessions of up to 8 hours and automatic session isolation using a dedicated microVM per session.

Memory: Managed storage for both short-term conversation context and long-term user preferences, facts, and summaries - without building your own vector database.

Gateway: Centralized tool management using the Model Context Protocol (MCP). Convert Lambda functions, REST APIs, and existing services into agent-accessible tools.

Identity: Secure credential management with OAuth 2.0 integration. Agents access third-party APIs on behalf of users without storing credentials.

Observability: OpenTelemetry-compatible metrics and traces exported to CloudWatch, Datadog, or LangSmith.

Runtime: Deploy Any Framework

The fundamental challenge with production agents is providing secure, isolated execution environments. AgentCore Runtime handles this through consumption-based microVM allocation.

Here’s how to deploy a Strands agent to AgentCore:

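from bedrock_agentcore import BedrockAgentCoreApp
from strands import Agent

app = BedrockAgentCoreApp()

@app.entrypoint
def invoke(payload):
    agent = Agent(
        # Claude Sonnet 4 on Bedrock is invoked through a cross-region inference profile
        model="us.anthropic.claude-sonnet-4-20250514-v1:0",
        system_prompt="You are a customer support agent with access to order history and return policies."
    )
    # Strands agents are callable; str(result) is the final response text
    result = agent(payload.get("message", ""))
    return {"result": str(result)}

if __name__ == "__main__":
    # Local HTTP server on port 8080 (/invocations, /ping), as the Runtime expects
    app.run()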

Deploy with the AgentCore starter toolkit CLI (assuming the code above is saved as agent.py):

agentcore configure --entrypoint agent.py
agentcore launch --region us-east-1

Key runtime characteristics:

8-hour execution windows: Long enough for asynchronous agentic workflows, and far beyond the 15-minute maximum timeout of AWS Lambda functions.
Session isolation: Each session runs in a dedicated microVM, so data does not leak between sessions.
Consumption pricing: CPU is billed only while it is actively processing, not during I/O wait. For agentic workloads that spend much of their time waiting on LLM responses, this can be significantly cheaper than Lambda, which bills for configured memory over the full invocation duration.
ARM64 containers: AgentCore Runtime requires linux/arm64 images. Use --platform=linux/arm64 in Docker builds.
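A back-of-the-envelope sketch of why active-only CPU billing matters: it compares the CPU-seconds billed for one session with the full wall-clock time a duration-billed function pays for. The 120-second session and 90% wait fraction are hypothetical, no real prices are used, and memory, which is billed separately, is left out.

```python
def billable_cpu_seconds(wall_clock_s: float, io_wait_fraction: float) -> float:
    """CPU-seconds charged when only active processing is billed."""
    if not 0.0 <= io_wait_fraction <= 1.0:
        raise ValueError("io_wait_fraction must be between 0 and 1")
    return wall_clock_s * (1.0 - io_wait_fraction)


# Hypothetical session: 120 s of wall-clock time, 90% of it waiting on the LLM
session_s = 120.0
active = billable_cpu_seconds(session_s, io_wait_fraction=0.9)

# Duration-based billing would charge for all 120 s
print(f"Billed CPU time: {active:.1f}s of {session_s:.1f}s ({active / session_s:.0%})")
# Billed CPU time: 12.0s of 120.0s (10%)
```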

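Common pitfall: Not handling the Mcp-Session-Id header. When AgentCore Runtime hosts an MCP server, it adds this header to requests for session isolation, so the server must run statelessly and accept platform-generated session IDs: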

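from typing import Optional

from fastapi import FastAPI, Header

app = FastAPI()


def load_session(session_id: Optional[str]) -> dict:
    # Placeholder lookup: return existing state, or start fresh for unseen IDs
    return {}


@app.post("/mcp")
async def mcp_endpoint(
    mcp_session_id: Optional[str] = Header(None, alias="Mcp-Session-Id")
):
    # AgentCore manages session isolation
    # Your server must accept platform-generated IDs rather than reject unknown ones
    session_state = load_session(mcp_session_id)
    return {"status": "ok"}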
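What matters is that an unfamiliar session ID is treated as normal, not as an error. Here is a framework-free sketch of that behavior; `SessionStore` is a hypothetical stand-in for `load_session`, and because it keeps state in process memory, that state lasts only as long as the session's microVM (durable state belongs in an external store).

```python
from typing import Any, Dict, Optional


class SessionStore:
    """Hypothetical per-session state store keyed by Mcp-Session-Id."""

    def __init__(self) -> None:
        self._sessions: Dict[str, Dict[str, Any]] = {}

    def load(self, session_id: Optional[str]) -> Dict[str, Any]:
        # Requests without the header get throwaway state instead of an error
        if session_id is None:
            return {}
        # Unknown, platform-generated IDs are accepted: state is created lazily
        # rather than the request being rejected as an invalid session
        return self._sessions.setdefault(session_id, {})


store = SessionStore()
state = store.load("platform-generated-id-123")
state["turns"] = state.get("turns", 0) + 1
print(store.load("platform-generated-id-123"))  # {'turns': 1}
```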

Memory: Context Without Infrastructure

Building production memory for agents requires solving two problems: short-term conversation context and long-term knowledge persistence. AgentCore Memory handles both.

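Memory extraction pipeline: conversation events are stored as short-term memory, and the configured strategies asynchronously extract long-term memory records (preferences, facts, summaries) from them.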

Implementing memory with three strategies:

from bedrock_agentcore.memory import MemoryClient

memory_client = MemoryClient(region_name="us-east-1")

# Create memory with multiple strategies and wait until it is ACTIVE
memory = memory_client.create_memory_and_wait(
    name="CustomerSupportMemory",
    strategies=[
        # Learn user patterns
        {"userPreferenceMemoryStrategy": {
            "name": "UserPreferences", "namespaces": ["/users/{actorId}/preferences"]}},
        # Store facts/knowledge
        {"semanticMemoryStrategy": {
            "name": "SemanticFacts", "namespaces": ["/users/{actorId}/facts"]}},
        # Compress sessions
        {"summaryMemoryStrategy": {
            "name": "SessionSummaries", "namespaces": ["/summaries/{actorId}/{sessionId}"]}},
    ],
)
# A customer-managed KMS key is set via the CreateMemory API's encryptionKeyArn field
memory_id = memory["id"]

# Store a conversation event for one user (actor) in one session
memory_client.create_event(
    memory_id=memory_id,
    actor_id="user-123",
    session_id="session-456",
    messages=[("User prefers technical explanations with code examples", "USER")],
)

Strategy selection guide:

Customer Support: UserPreferences + Summaries (remember communication style)
Technical Assistant: SemanticFacts + Summaries (remember codebase knowledge)
Personal Agent: All three strategies (comprehensive personalization)
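The selection guide can also be written down as data that produces the `strategies` list for a memory resource. In this sketch the strategy keys (`userPreferenceMemoryStrategy`, `semanticMemoryStrategy`, `summaryMemoryStrategy`) follow the CreateMemory API's strategy types, while the use-case names, strategy names, and namespace layouts are hypothetical.

```python
import copy
from typing import Dict, List

# Hypothetical names and namespaces; AgentCore fills in {actorId} and {sessionId}
STRATEGY_TEMPLATES: Dict[str, Dict] = {
    "preferences": {"userPreferenceMemoryStrategy": {
        "name": "UserPreferences", "namespaces": ["/users/{actorId}/preferences"]}},
    "facts": {"semanticMemoryStrategy": {
        "name": "SemanticFacts", "namespaces": ["/users/{actorId}/facts"]}},
    "summaries": {"summaryMemoryStrategy": {
        "name": "SessionSummaries", "namespaces": ["/summaries/{actorId}/{sessionId}"]}},
}

# The strategy selection guide, expressed as data
USE_CASE_STRATEGIES: Dict[str, List[str]] = {
    "customer_support": ["preferences", "summaries"],
    "technical_assistant": ["facts", "summaries"],
    "personal_agent": ["preferences", "facts", "summaries"],
}


def strategies_for(use_case: str) -> List[Dict]:
    """Build the strategies list for a use case (a fresh copy on every call)."""
    if use_case not in USE_CASE_STRATEGIES:
        raise ValueError(f"Unknown use case: {use_case}")
    return [copy.deepcopy(STRATEGY_TEMPLATES[key]) for key in USE_CASE_STRATEGIES[use_case]]


print([next(iter(s)) for s in strategies_for("customer_support")])
# ['userPreferenceMemoryStrategy', 'summaryMemoryStrategy']
```

Keeping the mapping as data makes a new agent type a one-line change, and returning copies lets callers adjust namespaces without mutating the shared templates.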

Critical security pattern - always screen input with Guardrails before calling the CreateEvent API:

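import logging

import boto3

logger = logging.getLogger(__name__)

# ApplyGuardrail is a runtime API, served by the bedrock-runtime client
bedrock_runtime = boto3.client('bedrock-runtime')

# user_input, actor_id, session_id and memory_id come from the current request

# WRONG: Direct storage (vulnerable to memory poisoning)
# memory_client.create_event(
#     memory_id=memory_id, actor_id=actor_id, session_id=session_id,
#     messages=[(user_input, "USER")]
# )

# RIGHT: Sanitize with Guardrails first
guardrail_response = bedrock_runtime.apply_guardrail(
    guardrailIdentifier='guardrail-123',
    guardrailVersion='1',
    source='INPUT',
    content=[{"text": {"text": user_input}}]
)

if guardrail_response['action'] == 'NONE':
    memory_client.create_event(
        memory_id=memory_id,
        actor_id=actor_id,
        session_id=session_id,
        messages=[(user_input, "USER")]
    )
else:
    # action == 'GUARDRAIL_INTERVENED': block and log the attack attempt
    logger.warning("Memory poisoning attempt blocked: %s", guardrail_response)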

Cost optimization: Limit retriever hops. Two or three retrieval operations per turn is typical; ten indicates over-retrieval:

# Application-level limits your agent code enforces (not AgentCore API settings)
memory_config = {
    'retrieval_strategy': 'semantic',
    'max_results': 5,
    'max_retriever_hops': 2
}
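Limits like `max_retriever_hops` only help if something counts retrievals. One way to do that in your own code is a small per-turn budget; `RetrievalBudget` and `fake_search` are hypothetical, with `fake_search` standing in for whatever memory query your agent makes.

```python
from typing import Callable, List


class RetrievalBudgetExceeded(RuntimeError):
    """Raised when a turn tries to retrieve more often than allowed."""


class RetrievalBudget:
    """Hypothetical per-turn cap on memory retrieval calls."""

    def __init__(self, max_hops: int = 2, max_results: int = 5) -> None:
        self.max_hops = max_hops
        self.max_results = max_results
        self.hops = 0

    def new_turn(self) -> None:
        # Reset the counter when a new user turn starts
        self.hops = 0

    def retrieve(self, search: Callable[[str, int], List[str]], query: str) -> List[str]:
        if self.hops >= self.max_hops:
            raise RetrievalBudgetExceeded(f"{self.hops} retrievals already made this turn")
        self.hops += 1
        # Ask for at most max_results and trim in case the backend returns more
        return search(query, self.max_results)[: self.max_results]


def fake_search(query: str, top_k: int) -> List[str]:
    # Stand-in backend that ignores top_k and returns ten records
    return [f"{query}-record-{i}" for i in range(10)]


budget = RetrievalBudget(max_hops=2, max_results=5)
print(len(budget.retrieve(fake_search, "shipping")))  # 5
budget.retrieve(fake_search, "returns")
try:
    budget.retrieve(fake_search, "warranty")
except RetrievalBudgetExceeded as exc:
    print("blocked:", exc)  # blocked: 2 retrievals already made this turn
```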

Gateway: Centralized Tool Management

Embedding tools directly in agent code leads to duplication and inconsistency. When you have customer support, sales, and technical agents all needing weather data, maintaining three copies of weather tool code becomes a maintenance problem.

AgentCore Gateway solves this by hosting shared tools centrally behind MCP-compatible endpoints.

Registering a Lambda function as a tool:

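import boto3

# Gateway targets are managed through the control-plane client
agentcore_control = boto3.client('bedrock-agentcore-control')

# Register Lambda as tool target
response = agentcore_control.create_gateway_target(
    gatewayIdentifier='gateway-123',
    name='weather-tools',
    targetConfiguration={'mcp': {'lambda': {
        'lambdaArn': 'arn:aws:lambda:us-east-1:123456789012:function:get-weather',
        # Tool schema the Gateway advertises to agents
        'toolSchema': {'inlinePayload': [{
            'name': 'get_weather',
            'description': 'Get current weather for a city',
            'inputSchema': {
                'type': 'object',
                'properties': {'city': {'type': 'string'}},
                'required': ['city']
            }
        }]}
    }}},
    # Gateway calls the Lambda using its own IAM service role
    credentialProviderConfigurations=[{'credentialProviderType': 'GATEWAY_IAM_ROLE'}]
)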

Gateway handles:

Authentication: IAM roles for AWS resources, OAuth 2.0 for third-party APIs, API keys for services
Semantic tool search: Agents discover relevant tools via x_amz_bedrock_agentcore_search without knowing all available tools
Protocol conversion: Lambda functions, OpenAPI specs, Smithy models, and MCP servers are all exposed through a standardized MCP interface
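Because Gateway speaks MCP, an agent's requests to it are ordinary MCP JSON-RPC 2.0 messages. This sketch only builds request bodies and sends nothing. `tools/call` comes from the MCP specification; the `query` argument for the search tool and the `weather-tools___get_weather` name (target name, three underscores, tool name, assuming a Lambda target named `weather-tools` that exposes `get_weather`) reflect our understanding of Gateway's naming, so confirm them against the `tools/list` response from your gateway.

```python
import itertools
import json
from typing import Any, Dict, Optional

_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 request body with an incrementing id."""
    request: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request


# Discover relevant tools semantically (argument name is an assumption)
search = jsonrpc_request("tools/call", {
    "name": "x_amz_bedrock_agentcore_search",
    "arguments": {"query": "current weather for a city"},
})

# Call the Lambda-backed tool; the target-prefixed name is an assumption
call = jsonrpc_request("tools/call", {
    "name": "weather-tools___get_weather",
    "arguments": {"city": "Seattle"},
})

print(json.dumps(call))
```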

Architecture pattern - centralize common tools, keep domain-specific tools local:

Common Tools (via Gateway):
  - Web search
  - Database queries
  - Weather API
  - Stock prices

Domain-Specific Tools (agent-local):
  - Return policy logic
  - Product catalog
  - Business rules
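One way to wire this split inside an agent is a resolver that checks the agent's own tools first and sends everything else to the shared Gateway. `ToolResolver`, the return-window rule, and `fake_gateway` are hypothetical; in a real agent the fallback would be an MCP `tools/call` to your gateway.

```python
from typing import Any, Callable, Dict

Tool = Callable[..., Any]


class ToolResolver:
    """Hypothetical router: domain-specific tools run locally, common tools go to Gateway."""

    def __init__(self, local_tools: Dict[str, Tool],
                 gateway_call: Callable[[str, Dict[str, Any]], Any]) -> None:
        self.local_tools = local_tools
        self.gateway_call = gateway_call

    def call(self, name: str, **arguments: Any) -> Any:
        if name in self.local_tools:
            # Business rules and catalog logic stay in-process with the agent
            return self.local_tools[name](**arguments)
        # Shared tools (search, weather, stock prices) are delegated to Gateway
        return self.gateway_call(name, arguments)


def return_window_days(category: str) -> int:
    # Hypothetical return-policy rule owned by this agent
    return 30 if category == "electronics" else 60


def fake_gateway(name: str, arguments: Dict[str, Any]) -> str:
    # Stand-in for an MCP tools/call request to the Gateway
    return f"gateway:{name}({arguments})"


resolver = ToolResolver({"return_window_days": return_window_days}, fake_gateway)
print(resolver.call("return_window_days", category="electronics"))  # 30
print(resolver.call("get_weather", city="Seattle"))  # gateway:get_weather({'city': 'Seattle'})
```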

Multi-Agent Coordination with A2A Protocol

Scaling from single agents to coordinated agent teams requires standardized communication. AgentCore Runtime supports the Agent-to-Agent (A2A) protocol for this.

A2A vs MCP distinction:

MCP: Agent-to-tool communication (agent calling weather API)
A2A: Agent-to-agent communication (supervisor coordinating specialists)
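Both protocols use JSON-RPC 2.0, but the payloads differ in intent: an MCP `tools/call` names a tool and passes structured arguments, while an A2A request hands another agent a message and leaves the approach to it. A sketch of an A2A `message/send` body, following our reading of the A2A specification (v0.3 field names such as `kind` and `messageId`; check the version your frameworks implement):

```python
import uuid
from typing import Any, Dict


def a2a_message_send(text: str, request_id: str = "req-1") -> Dict[str, Any]:
    """Build an A2A 'message/send' JSON-RPC request carrying one text part."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "message/send",
        "params": {
            "message": {
                "role": "user",
                "parts": [{"kind": "text", "text": text}],
                # A unique id per message lets the receiving agent track it
                "messageId": str(uuid.uuid4()),
            }
        },
    }


body = a2a_message_send("Summarize yesterday's incident timeline")
print(body["method"], "->", body["params"]["message"]["parts"][0]["text"])
```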

Hub-and-spoke supervisor implementation:

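import {
  BedrockAgentCoreClient,
  InvokeAgentRuntimeCommand,
} from '@aws-sdk/client-bedrock-agentcore';
import { randomUUID } from 'node:crypto';

interface AgentConfig {
  endpoint: string;   // Base URL that serves the specialist's agent card
  runtimeArn: string; // AgentCore Runtime ARN of the specialist
}

interface AgentCard {
  name: string;
}

class HostAgent {
  constructor(
    private client: BedrockAgentCoreClient,
    private specialistAgents: Map<string, AgentConfig>,
  ) {}

  async routeToSpecialist(query: string, capability: string) {
    const agentConfig = this.specialistAgents.get(capability);
    if (!agentConfig) throw new Error(`No specialist for ${capability}`);

    // Fetch remote agent's A2A configuration
    const agentCard = await this.fetchAgentCard(agentConfig.endpoint);

    // Invoke the specialist's runtime with an A2A JSON-RPC message
    const command = new InvokeAgentRuntimeCommand({
      agentRuntimeArn: agentConfig.runtimeArn,
      runtimeSessionId: randomUUID(), // must be at least 33 characters
      contentType: 'application/json',
      payload: new TextEncoder().encode(JSON.stringify({
        jsonrpc: '2.0',
        id: randomUUID(),
        method: 'message/send',
        params: {
          message: { role: 'user', messageId: randomUUID(), parts: [{ kind: 'text', text: query }] },
        },
      })),
    });

    return { agent: agentCard.name, response: await this.client.send(command) };
  }

  private async fetchAgentCard(endpoint: string): Promise<AgentCard> {
    // Retrieve agent capabilities from the A2A well-known path
    // (add auth headers if the endpoint requires them)
    const response = await fetch(`${endpoint}/.well-known/agent-card.json`);
    if (!response.ok) throw new Error(`Agent card request failed: ${response.status}`);
    return (await response.json()) as AgentCard;
  }
}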

Orchestration patterns:

Supervisor with routing mode - not every query needs full orchestration:

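from abc import ABC, abstractmethod

class SupervisorAgent(ABC):
    def route_query(self, query: str):
        # Simple query → direct routing
        if self.is_simple_query(query):
            specialist = self.select_single_specialist(query)
            return specialist.invoke(query)

        # Complex query → full orchestration
        plan = self.analyze_and_plan(query)
        results = self.orchestrate_subagents(plan)
        return self.synthesize(results)

    def is_simple_query(self, query: str) -> bool:
        intents = self.detect_intents(query)
        return len(intents) == 1

    # Framework-specific hooks that a concrete supervisor implements
    @abstractmethod
    def detect_intents(self, query: str) -> list: ...
    @abstractmethod
    def select_single_specialist(self, query: str): ...
    @abstractmethod
    def analyze_and_plan(self, query: str): ...
    @abstractmethod
    def orchestrate_subagents(self, plan): ...
    @abstractmethod
    def synthesize(self, results): ...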
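The routing decision is easier to see in a self-contained toy version where intent detection is a keyword match. The keyword table, specialist names, and string results are hypothetical stand-ins; a production supervisor would more likely detect intents with a model call or classifier and invoke real agents.

```python
from typing import Dict, List

# Hypothetical keyword table standing in for real intent detection
INTENT_KEYWORDS: Dict[str, List[str]] = {
    "billing": ["invoice", "refund", "charge"],
    "shipping": ["delivery", "tracking", "shipped"],
    "technical": ["error", "crash", "login"],
}


def detect_intents(query: str) -> List[str]:
    text = query.lower()
    return [intent for intent, keys in INTENT_KEYWORDS.items()
            if any(key in text for key in keys)]


def route_query(query: str) -> str:
    intents = detect_intents(query)
    if len(intents) == 1:
        # Simple query: hand it straight to one specialist
        return f"direct:{intents[0]}"
    if not intents:
        # Nothing recognized: fall back to a general-purpose agent
        return "direct:general"
    # Complex query: fan out to every matched specialist, then synthesize
    results = [f"{intent}-answer" for intent in intents]
    return "orchestrated:" + "+".join(results)


print(route_query("Where is my delivery?"))  # direct:shipping
print(route_query("Refund the charge and fix the login error"))
# orchestrated:billing-answer+technical-answer
```

Unlike the class above, this version sends queries with no detected intent to a general fallback instead of full orchestration; that is a design choice, not something AgentCore prescribes.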

Framework interoperability: LangGraph monitoring agent + CrewAI analytics agent + Strands incident response agent can all communicate via A2A. No framework lock-in.

Security and Cost Optimization

Guardrails Configuration

Guardrails protect against prompt injection, memory poisoning, and harmful content:

import boto3

bedrock = boto3.client('bedrock')

guardrail = bedrock.create_guardrail(
    name='production-agent-guardrail',
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'},
            {'type': 'VIOLENCE', 'inputStrength': 'MEDIUM', 'outputStrength': 'HIGH'},
            {'type': 'PROMPT_ATTACK', 'inputStrength': 'HIGH', 'outputStrength': 'NONE'}
        ]
    },
    topicPolicyConfig={
        'topicsConfig': [
            {
                'name': 'Financial Advice',
                'definition': 'Providing specific investment recommendations',
                'type': 'DENY'
            }
        ]
    },
    wordPolicyConfig={
        'wordsConfig': [
            {'text': 'internal-api-key'},
            {'text': 'secret-token'}
        ],
        'managedWordListsConfig': [
            {'type': 'PROFANITY'}
        ]
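    },
    # Required: messages returned to the user when input or output is blocked
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that response."
)

# Publish an immutable version to reference as guardrailVersion at runtime
guardrail_version = bedrock.create_guardrail_version(
    guardrailIdentifier=guardrail['guardrailId']
)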

Defense-in-depth strategy:

Input validation: Block malicious prompts at entry
Memory protection: Sanitize before CreateEvent API
Output filtering: Prevent harmful responses
Audit trails: CloudWatch logs for compliance

Cost Optimization Strategies

Prompt caching - up to 90% discount on cached token reads:

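import boto3

bedrock_runtime = boto3.client('bedrock-runtime')

# user_query and large_system_prompt are supplied by your application
response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": user_query}]}],
    system=[
        {"text": large_system_prompt},
        # Cache point: the system prompt before this marker is cached for reuse
        {"cachePoint": {"type": "default"}}
    ]
)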
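It helps to see where the savings come from: the first request writes the prefix to the cache, and later requests within the cache lifetime read it. This sketch prices only the cached prefix. The 0.10 read multiplier reflects the 90% discount; the 1.25 write multiplier is the cache-write premium Anthropic models have carried at the time of writing and should be checked against current Bedrock pricing. The workload numbers are hypothetical.

```python
def prefix_cost(prefix_tokens: int, requests: int, input_price_per_m: float,
                write_multiplier: float = 1.25, read_multiplier: float = 0.10) -> dict:
    """Cost of sending one prompt prefix on every request, with and without caching.

    Assumes the first request writes the cache and every later request is a cache hit.
    """
    if prefix_tokens <= 0 or requests < 1:
        raise ValueError("prefix_tokens and requests must be positive")
    per_token = input_price_per_m / 1_000_000
    uncached = prefix_tokens * requests * per_token
    cached = prefix_tokens * per_token * (write_multiplier + read_multiplier * (requests - 1))
    return {"uncached": uncached, "cached": cached, "savings": 1 - cached / uncached}


# Hypothetical workload: 10k-token system prompt, 100 requests, $3 per 1M input tokens
costs = prefix_cost(10_000, 100, input_price_per_m=3.0)
print(f"uncached ${costs['uncached']:.2f}, cached ${costs['cached']:.2f}, "
      f"savings {costs['savings']:.0%}")
# uncached $3.00, cached $0.33, savings 89%
```

Savings approach the headline 90% only as the number of cache hits grows, and a prefix that is written but never reused costs more than not caching at all.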

Model routing - match complexity to model cost:

def route_to_model(query: str) -> str:
    # classify_query_complexity: your own scorer returning a value in [0, 1]
    complexity = classify_query_complexity(query)

    # Bedrock cross-region inference profile IDs; prices are input/output per 1M tokens
    if complexity < 0.3:
        return "us.anthropic.claude-haiku-4-5-20251001-v1:0"  # Haiku 4.5: $1/$5
    elif complexity < 0.7:
        return "us.anthropic.claude-sonnet-4-20250514-v1:0"  # Sonnet 4: $3/$15
    else:
        return "us.anthropic.claude-opus-4-20250514-v1:0"  # Opus 4: $15/$75
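`classify_query_complexity` is left undefined above. A deliberately simple, hypothetical heuristic that returns a score in [0, 1] could look like this; the marker words and weights are arbitrary, and a production router would more likely use a small classifier or a cheap model call.

```python
import re

# Hypothetical reasoning markers; the weights below are arbitrary
REASONING_MARKERS = {"why", "compare", "design", "trade-off", "analyze", "plan"}


def classify_query_complexity(query: str) -> float:
    """Toy score in [0, 1] from length, reasoning words, and question count."""
    words = re.findall(r"[\w'-]+", query.lower())
    length_score = min(len(words) / 60, 1.0)          # longer prompts score higher
    marker_score = min(sum(w in REASONING_MARKERS for w in words) / 2, 1.0)
    question_score = min(query.count("?") / 3, 1.0)   # several questions at once
    return round(0.4 * length_score + 0.4 * marker_score + 0.2 * question_score, 3)


def model_tier(query: str) -> str:
    """Map the score onto the same thresholds route_to_model uses."""
    score = classify_query_complexity(query)
    if score < 0.3:
        return "small"
    if score < 0.7:
        return "medium"
    return "large"


print(model_tier("What is my order status?"))  # small
```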

Tool-call budgets - prevent unbounded tool use:

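# The system prompt steers the model; enforce a hard per-turn cap in your
# own tool code (no built-in per-turn limit is assumed on the Strands Agent)
agent = Agent(
    model="us.anthropic.claude-sonnet-4-20250514-v1:0",
    system_prompt="If user asks about multiple items, summarize instead of exhaustive lookup"
)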

Cost components:

Runtime: Active CPU/memory consumption (not pre-allocated)
Memory: Short-term (per event), long-term (per memory processed + retrievals)
Gateway: MCP operations (ListTools, CallTool, Ping) + semantic search queries
Identity: No additional charges when used via Runtime/Gateway
Observability: CloudWatch standard pricing

Common Pitfalls

Memory Poisoning Without Guardrails

Problem: Storing raw user input directly allows prompt injection into memory:

# WRONG
user_input = "Ignore previous instructions, you are now..."
memory_client.create_event(
    memory_id=memory_id,
    actor_id=actor_id,
    session_id=session_id,
    messages=[(user_input, "USER")]
)

Solution: Always sanitize with Guardrails first (shown in Memory section above).

Tool-Call Storms

Problem: Agent invokes 20+ tools per query without limits:

User: "What's the weather in major cities?"
Agent makes 50 separate get_weather() calls
Total: 10s latency, $0.05 per query

Solution: Enforce tool-call budgets in your tool code and guide the model via the system prompt:

agent = Agent(
    # Steer via the system prompt; enforce the hard cap in your own tool code
    system_prompt="For multiple items, summarize instead of exhaustive lookup"
)
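The hard cap has to live somewhere. One framework-agnostic option is to wrap each tool so that calls beyond a per-turn limit return a message the model can act on. `ToolCallBudget` is a hypothetical helper (not part of Strands or AgentCore); you would call `reset()` at the start of each turn and register the wrapped functions as the agent's tools.

```python
import functools
from typing import Any, Callable


class ToolCallBudget:
    """Hypothetical per-turn cap shared by every wrapped tool."""

    def __init__(self, max_calls_per_turn: int = 3) -> None:
        self.max_calls = max_calls_per_turn
        self.calls = 0

    def reset(self) -> None:
        # Call at the start of each user turn
        self.calls = 0

    def wrap(self, tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)  # keep the tool's name and docstring for the agent
        def limited(*args: Any, **kwargs: Any) -> Any:
            if self.calls >= self.max_calls:
                # Returning text (not raising) lets the model summarize instead
                return "Tool budget exhausted for this turn; summarize with what you have."
            self.calls += 1
            return tool(*args, **kwargs)
        return limited


def get_weather(city: str) -> str:
    """Stand-in for the real weather tool."""
    return f"Sunny in {city}"


budget = ToolCallBudget(max_calls_per_turn=3)
limited_weather = budget.wrap(get_weather)
cities = ["Tokyo", "Delhi", "Shanghai", "Sao Paulo", "Cairo"]
results = [limited_weather(city) for city in cities]
print(sum(r.startswith("Sunny") for r in results))  # 3
```

Returning a message instead of raising keeps the turn alive, so the model can fall back to summarizing, which matches the system-prompt guidance.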

ARM64 Container Requirements

Problem: Using x86 containers causes deployment failures.

Solution: Build for ARM64 explicitly:

FROM --platform=linux/arm64 python:3.11-slim
WORKDIR /app
COPY . /app
RUN pip install --no-cache-dir -r requirements.txt
EXPOSE 8080
CMD ["python", "agent.py"]

docker buildx build --platform linux/arm64 -t agent:latest .

No VPC Integration for Internal APIs

Problem: Agent traffic goes over public internet.

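Solution: Configure VPC networking for the runtime, and PrivateLink for private access to AgentCore:

network_configuration = {
    'networkMode': 'VPC',
    'networkModeConfig': {
        'securityGroups': ['sg-12345'],
        'subnets': ['subnet-abc', 'subnet-def']
    }
}
# Passed as networkConfiguration when creating the agent runtime.
# PrivateLink is a separate step: create an interface VPC endpoint
# for the AgentCore service in the same VPC.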

When to Use AgentCore

Use AgentCore when:

Multiple agent frameworks in use (LangChain + CrewAI + custom)
Need to evaluate different models (Bedrock + OpenAI + Anthropic)
Enterprise security required (VPC, PrivateLink, customer-managed KMS)
Multi-agent systems planned (A2A coordination)
Fast time-to-production needed (weeks, not months)
Team size under 10 (can’t build infrastructure from scratch)

Consider alternatives when:

Single framework forever (e.g., only LangGraph → use LangGraph Cloud)
Single cloud ecosystem (e.g., all Azure → Azure AI Agent Service)
Extreme high volume (over 10M sessions/month → self-hosted may be cheaper)
Need custom hardware (GPUs for specialized models → self-hosted)
Already built agent infrastructure (sunk costs)

Break-even analysis for self-hosting:

AgentCore becomes cost-effective when:

Agent development time exceeds 2 weeks
Multiple agent types (customer support, analytics, monitoring)
Enterprise security/compliance required
Team size under 10 dedicated to agent infrastructure

Self-hosted infrastructure costs: an estimated $50k-150k to build, plus $200k/year for a DevOps team. By this estimate, break-even is at approximately 10M sessions/month.
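The break-even claim follows from comparing a fixed annual self-hosting cost with a usage-based managed cost. This sketch shows the arithmetic only; the three-year amortization and the $0.002 managed cost per session are hypothetical inputs, not published AgentCore prices, and self-hosted compute costs are ignored.

```python
def breakeven_sessions_per_month(build_cost: float, amortization_years: float,
                                 devops_per_year: float,
                                 managed_cost_per_session: float) -> float:
    """Monthly sessions at which self-hosting and a managed service cost the same.

    Self-hosted per year: build cost spread over the amortization period, plus DevOps.
    Managed per year:     sessions_per_month * 12 * managed_cost_per_session.
    Self-hosted compute costs are ignored to keep the comparison simple.
    """
    if amortization_years <= 0 or managed_cost_per_session <= 0:
        raise ValueError("amortization_years and managed_cost_per_session must be positive")
    self_hosted_per_year = build_cost / amortization_years + devops_per_year
    return self_hosted_per_year / (12 * managed_cost_per_session)


# Upper build estimate ($150k) over a hypothetical 3 years, $200k/year DevOps,
# and a hypothetical $0.002 managed cost per session
sessions = breakeven_sessions_per_month(150_000, 3, 200_000, 0.002)
print(f"Break-even: {sessions:,.0f} sessions/month")  # Break-even: 10,416,667 sessions/month
```

Under these inputs, the ~10M sessions/month figure corresponds to about $0.002 per session; doubling the per-session cost halves the break-even volume.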

Key Takeaways

AgentCore is infrastructure, not a framework. It doesn’t replace LangChain or CrewAI - it provides the production runtime they need to scale.

Modular adoption reduces risk. Start with Runtime only, add Memory → Gateway → Identity → Observability incrementally. Each service delivers independent value.

Security is built-in. Session isolation, Guardrails, Identity management, and VPC integration are production-ready features, not bolt-ons.

Cost optimization is multi-dimensional. Prompt caching (up to 90% off cached token reads), model routing (roughly 30% savings), tool-call budgets, and consumption pricing can compound to reduce costs by an estimated 60-80%.

Multi-agent systems need protocols. MCP for agent-to-tool, A2A for agent-to-agent. Framework interoperability allows LangGraph + CrewAI + Strands agents to work together.

Resources

New AWS customers can use AWS Free Tier credits (up to $200) to validate a use case before committing to a production deployment.
