2025-12-08
From Chatbots to Autonomous Agents: Architecture Patterns
Explore the architectural evolution from rule-based chatbots to autonomous AI agents. Learn ReAct, Plan-and-Execute, and multi-agent patterns with TypeScript implementations and practical migration strategies.
Abstract
The evolution from rule-based chatbots to autonomous AI agents represents a fundamental architectural shift, not just a capability upgrade. While chatbots follow scripted conversations and respond to predefined intents, AI agents possess memory, planning capabilities, and tool access that enable them to autonomously decompose complex tasks, make decisions, and execute multi-step workflows across systems.
This post explores the architectural journey from simple chatbot systems to sophisticated agent architectures, focusing on design patterns (ReAct, Plan-and-Execute, multi-agent coordination), infrastructure decisions, and practical trade-offs. Rather than treating agents as “better chatbots,” we examine the distinct architectural patterns and when each makes sense for production systems.
The Architecture Evolution Spectrum
Rather than a binary choice, think of chatbot-to-agent evolution as a spectrum:
Level 0: Rule-Based Chatbots - Decision trees and regex patterns. Completely deterministic. Example: “Type 1 for hours, 2 for location”
Level 1: Intent-Driven Chatbots - NLU for intent classification with predefined flows per intent. Example: Customer support FAQ bots
Level 2: Context-Aware Assistants - Conversation memory within session with limited API integrations. Example: Voice assistants (Siri, Alexa)
Level 3: Tool-Using Agents - Dynamic tool selection with single-agent ReAct pattern. Example: Claude Code, GitHub Copilot
Level 4: Planning Agents - Multi-step task decomposition with long-term memory. Example: Research assistants, code generation agents
Level 5: Multi-Agent Systems - Specialized sub-agents with agent coordination patterns. Example: Software development teams, autonomous operations
Understanding Traditional Chatbot Limitations
The Classic Support Bot Scenario
Consider a support chatbot handling: “Why was I charged twice?”
The chatbot needs to:
- Check payment history (Stripe API)
- Verify order status (database)
- Review support tickets (Zendesk)
- Check for known issues (Confluence)
Traditional approach: Hardcode the exact sequence, or ask the user multiple clarifying questions through a decision tree.
Agent approach: Autonomously gather context from all systems, synthesize the findings, and propose a resolution.
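To make the "gather context from all systems" step concrete, here is a minimal sketch of concurrent context gathering. The `ContextSource` shape and `gatherContext` name are assumptions for illustration; real Stripe, Zendesk, or Confluence clients would sit behind each `lookup`.

```typescript
// Hypothetical shape: each backend is wrapped as a labeled async lookup.
interface ContextSource {
  label: string;
  lookup: (customerId: string) => Promise<unknown>;
}

interface GatheredContext {
  findings: Record<string, unknown>;
  failures: Record<string, string>;
}

// Query every source concurrently; a failing source is recorded, not fatal.
async function gatherContext(
  customerId: string,
  sources: ContextSource[]
): Promise<GatheredContext> {
  const gathered: GatheredContext = { findings: {}, failures: {} };
  await Promise.all(
    sources.map(async source => {
      try {
        gathered.findings[source.label] = await source.lookup(customerId);
      } catch (err) {
        gathered.failures[source.label] =
          err instanceof Error ? err.message : String(err);
      }
    })
  );
  return gathered;
}
```

Recording failures separately lets the agent reason about what it could not check instead of silently answering from partial data.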
The Integration Explosion Problem
With traditional chatbots: 5 chatbots × 10 backend systems = 50 hardcoded integrations
Each new feature requires updating multiple chatbot flows. No shared learning across chatbots. Maintenance becomes increasingly difficult as systems evolve.
Core Architectural Distinctions
Chatbot Architecture: Input → Intent Classification → Scripted Response → Output
Agent Architecture: Input → Reasoning Loop (Observe → Plan → Act → Reflect) → Tool Execution → Memory Update → Output
Key differences:
- Memory Systems: Long-term knowledge graphs vs. conversation buffers
- Planning Mechanisms: Task decomposition and multi-step reasoning vs. single-turn responses
- Tool Orchestration: Dynamic tool selection and composition vs. fixed API calls
- Autonomy Levels: Self-directed execution vs. user-driven interactions
- Error Recovery: Adaptive retry strategies vs. “I don’t understand” fallbacks
Pattern 1: Traditional Intent-Based Chatbot
Let’s examine a traditional chatbot architecture to understand its limitations:
interface ChatbotMessage {
role: "user" | "assistant";
content: string;
}
interface Intent {
name: string;
confidence: number;
entities: Record<string, any>;
}
class TraditionalChatbot {
private conversationHistory: ChatbotMessage[] = [];
async processMessage(userMessage: string): Promise<string> {
// Add to history (limited to last N messages)
this.conversationHistory.push({ role: "user", content: userMessage });
if (this.conversationHistory.length > 10) {
this.conversationHistory.shift(); // Drop oldest
}
// Intent classification
const intent = await this.classifyIntent(userMessage);
// Route to handler based on intent
switch (intent.name) {
case "check_order":
return await this.handleOrderCheck(intent.entities);
case "return_request":
return await this.handleReturnRequest(intent.entities);
case "product_question":
return await this.handleProductQuestion(intent.entities);
default:
return "I'm not sure how to help with that. Can you rephrase?";
}
}
private async classifyIntent(message: string): Promise<Intent> {
// Call to NLU service or LLM for intent classification
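const response = await fetch("https://api.nlp-service.com/classify", {
method: "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify({ text: message })
});
if (!response.ok) {
throw new Error(`Intent classification failed: ${response.status}`);
}
return response.json();
}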
private async handleOrderCheck(entities: Record<string, any>): Promise<string> {
// Fixed flow: extract order ID → query database → format response
const orderId = entities.order_id;
if (!orderId) {
return "What's your order number?";
}
const order = await this.fetchOrder(orderId);
return `Your order ${orderId} is ${order.status}. Estimated delivery: ${order.eta}`;
}
private async fetchOrder(orderId: string): Promise<any> {
// Database query implementation
return { status: "shipped", eta: "2025-12-05" };
}
}
Limitations highlighted:
- No task decomposition (can’t handle “check all my orders from last month”)
- Memory lost after 10 messages
- Hardcoded intent → handler mapping
- Can’t combine multiple data sources without explicit programming
- No ability to adapt to new scenarios
Pattern 2: ReAct Agent (Reasoning and Acting)
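The ReAct pattern enables iterative reasoning with tool use: the agent alternates between a thought, a tool call, and the resulting observation until it can answer.
Here’s a reference implementation; the LLM call is left as a stub: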
interface Tool {
name: string;
description: string;
parameters: Record<string, any>;
execute: (params: any) => Promise<any>;
}
interface AgentStep {
thought: string;
action?: { tool: string; input: any };
observation?: any;
}
class ReActAgent {
private tools: Map<string, Tool>;
private memory: ConversationMemory;
private maxIterations = 10;
constructor(tools: Tool[], memorySystem: ConversationMemory) {
this.tools = new Map(tools.map(t => [t.name, t]));
this.memory = memorySystem;
}
async processTask(task: string): Promise<string> {
const steps: AgentStep[] = [];
let finalAnswer: string | null = null;
// Retrieve relevant context from memory
const context = await this.memory.retrieve(task);
for (let i = 0; i < this.maxIterations; i++) {
// Generate next step: thought + action
const step = await this.generateNextStep(task, steps, context);
steps.push(step);
// Check if we have a final answer
if (!step.action) {
finalAnswer = step.thought;
break;
}
// Execute the action
const tool = this.tools.get(step.action.tool);
if (!tool) {
step.observation = { error: `Tool ${step.action.tool} not found` };
continue;
}
try {
const result = await tool.execute(step.action.input);
step.observation = result;
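} catch (error) {
// Caught values are `unknown` in strict TypeScript, so narrow before reading .message
step.observation = { error: error instanceof Error ? error.message : String(error) };
}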
}
// Store conversation in long-term memory
await this.memory.store(task, steps, finalAnswer);
return finalAnswer || "I couldn't complete this task within the iteration limit.";
}
private async generateNextStep(
task: string,
previousSteps: AgentStep[],
context: any
): Promise<AgentStep> {
// Build prompt with ReAct pattern
const prompt = this.buildReActPrompt(task, previousSteps, context);
// Call LLM to generate thought and action
const response = await this.callLLM(prompt);
// Parse response into structured step
return this.parseReActResponse(response);
}
private buildReActPrompt(task: string, steps: AgentStep[], context: any): string {
const toolDescriptions = Array.from(this.tools.values())
.map(t => `${t.name}: ${t.description}`)
.join("\n");
const stepHistory = steps.map((s, i) =>
`Step ${i + 1}:\nThought: ${s.thought}\n` +
(s.action ? `Action: ${s.action.tool}(${JSON.stringify(s.action.input)})\n` : "") +
(s.observation ? `Observation: ${JSON.stringify(s.observation)}\n` : "")
).join("\n");
return `You are an AI agent solving tasks by reasoning and using tools.
Task: ${task}
Available Tools:
${toolDescriptions}
Relevant Context from Memory:
${JSON.stringify(context, null, 2)}
Previous Steps:
${stepHistory || "None yet"}
Generate the next step by thinking about what to do, then choosing a tool to use.
If you have enough information to answer, provide the final answer instead of an action.
Format:
Thought: [your reasoning about what to do next]
Action: [tool_name]
Input: [tool input as JSON]
OR if ready to answer:
Thought: [final reasoning]
Answer: [final answer to the task]`;
}
private parseReActResponse(response: string): AgentStep {
// Parse LLM output into structured step
// Thought and Action are single-line; Input and Answer may span several lines
const thoughtMatch = response.match(/Thought: (.+?)(?=\n|$)/);
const actionMatch = response.match(/Action: (.+?)(?=\n|$)/);
const inputMatch = response.match(/Input: ([\s\S]+?)(?=\n(?:Thought|Action|Answer):|$)/);
const answerMatch = response.match(/Answer: ([\s\S]+)$/);
const thought = thoughtMatch?.[1].trim() || "";
if (answerMatch) {
// Final answer, no action
return { thought: answerMatch[1].trim() };
}
if (actionMatch && inputMatch) {
return {
thought,
action: {
tool: actionMatch[1].trim(),
input: JSON.parse(inputMatch[1].trim())
}
};
}
return { thought };
}
private async callLLM(prompt: string): Promise<string> {
// Call to LLM API (Anthropic, OpenAI, etc.)
// Implementation would use actual API client
throw new Error("Implement LLM integration");
}
}
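For reference, a concrete tool passed to the agent might look like the sketch below. The `lookup_order` tool and the in-memory `sampleOrders` table are hypothetical stand-ins for a real order service; only the `Tool` shape comes from the code above.

```typescript
// Same shape as the Tool interface used by ReActAgent above.
interface Tool {
  name: string;
  description: string;
  parameters: Record<string, any>;
  execute: (params: any) => Promise<any>;
}

// Hypothetical in-memory data standing in for a real order database.
const sampleOrders: Record<string, { status: string; eta: string }> = {
  "A-100": { status: "shipped", eta: "2025-12-05" },
};

const lookupOrderTool: Tool = {
  name: "lookup_order",
  description: "Look up an order's status and ETA by order ID.",
  parameters: { orderId: { type: "string", required: true } },
  execute: async (params: { orderId?: string }) => {
    if (!params.orderId) {
      // Return a structured error so the agent can read it as an observation.
      return { error: "orderId is required" };
    }
    const order = sampleOrders[params.orderId];
    return order ?? { error: `Order ${params.orderId} not found` };
  },
};
```

Returning errors as data rather than throwing keeps them visible to the model as observations, which matches how the loop above records failed tool calls.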
Key patterns demonstrated:
- Iterative reasoning loop with configurable max iterations
- Tool descriptions provided in context
- Memory retrieval for long-term context
- Observation feedback incorporated into next step
- Graceful handling of tool errors
- Structured parsing of LLM responses
When to use ReAct:
- Dynamic environments where plans can’t be predetermined
- Tasks requiring step-by-step verification
- Situations where the agent needs to adapt based on observations
- Budget allows $0.01-0.05 per task
Production considerations:
- Implement iteration limits to prevent infinite loops
- Log all thoughts and actions for debugging
- Monitor token consumption (can be 5-10x that of a simple completion)
- Consider streaming thoughts to users for transparency
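The token-monitoring point can be sketched as a small budget tracker that the loop consults after each LLM call. `TokenBudget` and `LlmUsage` are illustrative names, and real clients report usage in their own response shapes.

```typescript
// Hypothetical usage record; map your LLM client's usage fields onto it.
interface LlmUsage {
  inputTokens: number;
  outputTokens: number;
}

class TokenBudget {
  private used = 0;
  private readonly log: Array<{ step: number; tokens: number }> = [];
  private readonly maxTokens: number;

  constructor(maxTokens: number) {
    this.maxTokens = maxTokens;
  }

  // Record one LLM call; returns false once the budget is exceeded.
  record(step: number, usage: LlmUsage): boolean {
    const tokens = usage.inputTokens + usage.outputTokens;
    this.used += tokens;
    this.log.push({ step, tokens });
    return this.used <= this.maxTokens;
  }

  get totalUsed(): number {
    return this.used;
  }

  get perStep(): ReadonlyArray<{ step: number; tokens: number }> {
    return this.log;
  }
}
```

An agent loop could stop iterating when `record` returns false, in the same way it stops at the iteration limit.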
Pattern 3: Plan-and-Execute
For complex tasks with clear structure, Plan-and-Execute offers better cost efficiency:
Implementation:
interface Task {
id: string;
description: string;
status: "pending" | "in-progress" | "completed" | "failed";
dependencies: string[];
result?: any;
error?: string;
metadata?: any;
}
interface ExecutionPlan {
goal: string;
tasks: Task[];
strategy: string;
}
class PlanAndExecuteAgent {
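private tools: Map<string, Tool>;
private memory: ConversationMemory;
constructor(tools: Tool[], memory: ConversationMemory) {
this.tools = new Map(tools.map(t => [t.name, t]));
this.memory = memory;
}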
async execute(goal: string): Promise<any> {
// Phase 1: Planning
console.error("[Planning Phase] Decomposing goal into tasks...");
const plan = await this.createPlan(goal);
console.error(`[Planning Phase] Created plan with ${plan.tasks.length} tasks`);
// Phase 2: Execution
console.error("[Execution Phase] Executing tasks...");
const results = await this.executePlan(plan);
// Phase 3: Synthesis
console.error("[Synthesis Phase] Combining results...");
const finalResult = await this.synthesizeResults(goal, plan, results);
return finalResult;
}
private async createPlan(goal: string): Promise<ExecutionPlan> {
// Retrieve relevant past plans from memory
const pastExperiences = await this.memory.retrieve(goal);
const planningPrompt = `You are a planning agent. Decompose this goal into executable tasks.
Goal: ${goal}
Available Tools:
${Array.from(this.tools.values()).map(t => `- ${t.name}: ${t.description}`).join("\n")}
Past Similar Tasks:
${JSON.stringify(pastExperiences, null, 2)}
Create a plan with tasks that:
1. Are independent where possible (for parallel execution)
2. Explicitly state dependencies
3. Map to available tools
4. Include verification steps
Return format:
{
"strategy": "explanation of approach",
"tasks": [
{
"id": "task-1",
"description": "what to do",
"tool": "tool_name",
"dependencies": [],
"params": {}
}
]
}`;
const planResponse = await this.callLLM(planningPrompt);
const planData = JSON.parse(planResponse);
return {
goal,
strategy: planData.strategy,
tasks: planData.tasks.map((t: any) => ({
id: t.id,
description: t.description,
status: "pending" as const,
dependencies: t.dependencies || [],
metadata: { tool: t.tool, params: t.params }
}))
};
}
private async executePlan(plan: ExecutionPlan): Promise<Map<string, any>> {
const results = new Map<string, any>();
const taskMap = new Map(plan.tasks.map(t => [t.id, t]));
// Execute tasks respecting dependencies
while (results.size < plan.tasks.length) {
// Find tasks ready to execute (no pending dependencies)
const readyTasks = plan.tasks.filter(task => {
if (task.status !== "pending") return false;
return task.dependencies.every(depId => {
const depTask = taskMap.get(depId);
return depTask?.status === "completed";
});
});
if (readyTasks.length === 0) {
// Nothing is runnable: the remaining tasks depend on a failed task or on each other
console.error("[Execution Phase] Stuck - remaining tasks have failed or circular dependencies");
break;
}
// Execute ready tasks in parallel
console.error(`[Execution Phase] Executing ${readyTasks.length} tasks in parallel`);
await Promise.all(
readyTasks.map(task => this.executeTask(task, results))
);
}
return results;
}
private async executeTask(task: Task, results: Map<string, any>): Promise<void> {
task.status = "in-progress";
console.error(`[Task ${task.id}] Starting: ${task.description}`);
try {
// Get dependency results
const depResults = task.dependencies.reduce((acc, depId) => {
acc[depId] = results.get(depId);
return acc;
}, {} as Record<string, any>);
// Execute tool with parameters and dependency results
const tool = this.tools.get(task.metadata.tool);
if (!tool) {
throw new Error(`Tool ${task.metadata.tool} not found`);
}
const params = {
...task.metadata.params,
dependencyResults: depResults
};
const result = await tool.execute(params);
task.status = "completed";
task.result = result;
results.set(task.id, result);
console.error(`[Task ${task.id}] Completed successfully`);
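} catch (error) {
// Caught values are `unknown` in strict TypeScript, so narrow before reading .message
const message = error instanceof Error ? error.message : String(error);
task.status = "failed";
task.error = message;
results.set(task.id, { error: message });
console.error(`[Task ${task.id}] Failed: ${message}`);
}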
}
private async synthesizeResults(
goal: string,
plan: ExecutionPlan,
results: Map<string, any>
): Promise<any> {
const synthesisPrompt = `You executed a plan to achieve a goal. Synthesize the results into a coherent answer.
Goal: ${goal}
Plan Strategy: ${plan.strategy}
Task Results:
${Array.from(results.entries()).map(([id, result]) =>
`${id}: ${JSON.stringify(result)}`
).join("\n")}
Provide a comprehensive answer to the original goal, incorporating insights from all tasks.`;
const synthesis = await this.callLLM(synthesisPrompt);
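// Store the plan for future reference only if every task completed
// (failed tasks also appear in `results`, so its size alone is not a success signal)
if (plan.tasks.every(t => t.status === "completed")) {
// Tasks carry their own status and result, so they serve as the stored steps
await this.memory.store(goal, plan.tasks, synthesis);
}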
return synthesis;
}
private async callLLM(prompt: string): Promise<string> {
throw new Error("Implement LLM integration");
}
}
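The dependency handling in `executePlan` can be easier to reason about as a pure function. The sketch below groups tasks into waves that could run in parallel and fails fast on unresolvable dependencies; `planWaves` and `PlannedTask` are names introduced here for illustration.

```typescript
interface PlannedTask {
  id: string;
  dependencies: string[];
}

// Group tasks into waves: every task in a wave depends only on earlier waves.
// Throws if some tasks can never become ready (cycle or unknown dependency).
function planWaves(tasks: PlannedTask[]): string[][] {
  const done = new Set<string>();
  let remaining = tasks.slice();
  const waves: string[][] = [];

  while (remaining.length > 0) {
    const ready = remaining.filter(t => t.dependencies.every(d => done.has(d)));
    if (ready.length === 0) {
      const stuck = remaining.map(t => t.id).join(", ");
      throw new Error(`Unresolvable dependencies among: ${stuck}`);
    }
    ready.forEach(t => done.add(t.id));
    waves.push(ready.map(t => t.id));
    remaining = remaining.filter(t => !done.has(t.id));
  }
  return waves;
}
```

Running this check on the LLM's plan before execution catches circular or dangling dependencies without spending any tool calls.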
Trade-offs:
- Pros: Fewer LLM calls (plan once, execute), parallel execution, predictable costs
- Cons: Brittle when environment changes mid-execution, harder to adapt to unexpected results
Best practices:
- Store successful plans in memory for reuse
- Include verification tasks in the plan
- Allow re-planning if execution fails
- Use timeouts for individual tasks
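For the per-task timeout, one possible helper is sketched below. `withTimeout` is a name introduced here, and the approach only stops waiting; it does not cancel the underlying tool call.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
// Note: this stops waiting; it does not cancel the underlying operation.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
    work.then(
      value => {
        clearTimeout(timer);
        resolve(value);
      },
      err => {
        clearTimeout(timer);
        reject(err);
      }
    );
  });
}
```

Inside `executeTask`, the tool call could be wrapped as `withTimeout(tool.execute(params), 30000, task.id)`, so a hung tool marks the task as failed rather than stalling the whole plan. The 30-second value is only a placeholder.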
Memory Architecture: Short-Term vs Long-Term
One of the most significant differences between chatbots and agents is memory architecture:
Implementation comparison:
interface MemoryEntry {
timestamp: Date;
content: any;
metadata: Record<string, any>;
embedding?: number[];
}
// Simple buffer memory (chatbot style)
class BufferMemory {
private buffer: MemoryEntry[] = [];
private maxSize = 10;
async store(content: any, metadata: Record<string, any> = {}): Promise<void> {
this.buffer.push({ timestamp: new Date(), content, metadata });
if (this.buffer.length > this.maxSize) {
this.buffer.shift(); // FIFO eviction
}
}
async retrieve(query: string): Promise<any[]> {
// Return all buffer contents (no filtering)
return this.buffer.map(e => e.content);
}
async clear(): Promise<void> {
this.buffer = [];
}
}
// Vector-based long-term memory (agent style)
class VectorMemory {
private vectorStore: VectorDatabase;
private embeddingModel: EmbeddingModel;
constructor(vectorStore: VectorDatabase, embeddingModel: EmbeddingModel) {
this.vectorStore = vectorStore;
this.embeddingModel = embeddingModel;
}
async store(content: any, metadata: Record<string, any> = {}): Promise<void> {
// Generate embedding for semantic search
const text = this.contentToText(content);
const embedding = await this.embeddingModel.embed(text);
await this.vectorStore.insert({
timestamp: new Date(),
content,
metadata: {
...metadata,
importance: this.calculateImportance(content, metadata)
},
embedding
});
}
async retrieve(query: string, options: { limit?: number; threshold?: number } = {}): Promise<any[]> {
// Semantic search using embeddings
const queryEmbedding = await this.embeddingModel.embed(query);
const results = await this.vectorStore.search({
embedding: queryEmbedding,
limit: options.limit ?? 5,
threshold: options.threshold ?? 0.7 // ?? keeps an explicit threshold of 0
});
// Return most relevant memories, weighted by recency and importance
return results
.map(r => ({
content: r.content,
relevance: r.similarity,
recency: this.calculateRecency(r.timestamp),
importance: r.metadata.importance
}))
.sort((a, b) => {
const scoreA = a.relevance * 0.6 + a.recency * 0.2 + a.importance * 0.2;
const scoreB = b.relevance * 0.6 + b.recency * 0.2 + b.importance * 0.2;
return scoreB - scoreA;
})
.map(r => r.content);
}
async forget(criteria: { olderThan?: Date; importance?: number }): Promise<void> {
// Selective forgetting based on time and importance
const deleteFilter: any = {};
if (criteria.olderThan) {
deleteFilter.timestamp = { $lt: criteria.olderThan };
}
if (criteria.importance !== undefined) {
deleteFilter["metadata.importance"] = { $lt: criteria.importance };
}
await this.vectorStore.delete(deleteFilter);
}
private calculateImportance(content: any, metadata: Record<string, any>): number {
// Heuristic scoring: user corrections, explicit feedback, task outcomes
let score = 0.5; // baseline
if (metadata.userCorrection) score += 0.3;
if (metadata.explicitFeedback) score += 0.2;
if (metadata.taskSuccess === false) score += 0.15; // Learn from failures
if (metadata.toolError) score += 0.1; // Remember issues
return Math.min(score, 1.0);
}
private calculateRecency(timestamp: Date): number {
const ageMs = Date.now() - timestamp.getTime();
const ageDays = ageMs / (1000 * 60 * 60 * 24);
// Exponential decay: fresh memories score higher
return Math.exp(-ageDays / 30); // 30-day time constant (half-life ≈ 21 days)
}
private contentToText(content: any): string {
if (typeof content === "string") return content;
return JSON.stringify(content);
}
}
// Hybrid memory system for production agents
class HybridMemory implements ConversationMemory {
private shortTerm: BufferMemory;
private longTerm: VectorMemory;
constructor(vectorStore: VectorDatabase, embeddingModel: EmbeddingModel) {
this.shortTerm = new BufferMemory();
this.longTerm = new VectorMemory(vectorStore, embeddingModel);
}
async store(task: string, steps: any[], result: any): Promise<void> {
// Store in short-term for immediate recall
await this.shortTerm.store({ task, steps, result });
// Store in long-term for semantic retrieval
await this.longTerm.store(
{ task, steps, result },
{
taskSuccess: result !== null,
stepCount: steps.length,
timestamp: new Date()
}
);
}
async retrieve(query: string): Promise<any> {
// Combine both memory systems
const recent = await this.shortTerm.retrieve(query);
const relevant = await this.longTerm.retrieve(query, { limit: 3 });
return {
recentContext: recent,
relevantExperiences: relevant
};
}
}
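The memory classes above reference `ConversationMemory`, `VectorDatabase`, and `EmbeddingModel` without defining them. The shapes below are assumptions inferred from how those classes call them, together with a toy in-memory index (`InMemorySearchIndex`, a name introduced here) that is enough for local experiments.

```typescript
// Assumed shapes, inferred from the call sites in the classes above.
interface ConversationMemory {
  store(task: string, steps: any[], result: any): Promise<void>;
  retrieve(query: string): Promise<any>;
}

interface EmbeddingModel {
  embed(text: string): Promise<number[]>;
}

interface VectorRecord {
  timestamp: Date;
  content: any;
  metadata: Record<string, any>;
  embedding: number[];
}

interface VectorQuery {
  embedding: number[];
  limit: number;
  threshold: number;
}

interface VectorDatabase {
  insert(record: VectorRecord): Promise<void>;
  search(query: VectorQuery): Promise<Array<VectorRecord & { similarity: number }>>;
  delete(filter: Record<string, any>): Promise<void>;
}

// Cosine similarity; assumes both vectors have the same length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Toy in-memory index covering insert and search only;
// filtered deletes are left to a real vector database.
class InMemorySearchIndex implements Pick<VectorDatabase, "insert" | "search"> {
  private records: VectorRecord[] = [];

  async insert(record: VectorRecord): Promise<void> {
    this.records.push(record);
  }

  async search(query: VectorQuery): Promise<Array<VectorRecord & { similarity: number }>> {
    return this.records
      .map(r => ({ ...r, similarity: cosineSimilarity(query.embedding, r.embedding) }))
      .filter(r => r.similarity >= query.threshold)
      .sort((x, y) => y.similarity - x.similarity)
      .slice(0, query.limit);
  }
}
```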
Memory comparison insights:
- Buffer memory: Fast, simple, no semantic understanding
- Vector memory: Semantic search, importance-weighted, selective forgetting
- Hybrid approach: Best of both for production agents
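To see how the ranking weights in `VectorMemory.retrieve` behave, here is the same formula as standalone functions. The function names are introduced for illustration; the 0.6 / 0.2 / 0.2 weights and the `exp(-ageDays / 30)` decay are taken from the code above.

```typescript
// Recency as used above: exponential decay with a 30-day time constant.
function recencyScore(ageDays: number): number {
  return Math.exp(-ageDays / 30);
}

// Combined ranking score: 60% relevance, 20% recency, 20% importance.
function memoryScore(relevance: number, ageDays: number, importance: number): number {
  return relevance * 0.6 + recencyScore(ageDays) * 0.2 + importance * 0.2;
}

// A fresh, moderately relevant memory versus an older, more relevant one.
const freshScore = memoryScore(0.75, 0, 0.5);
const staleScore = memoryScore(0.9, 90, 0.5);
```

With these inputs the fresh memory scores about 0.75 and the 90-day-old one about 0.65, so recency can outweigh a sizeable relevance gap. Whether that is desirable depends on the workload, and the weights are worth tuning against real retrieval logs.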
Multi-Agent Coordination Patterns
For complex systems requiring specialized expertise:
Orchestrator pattern (recommended for production):
- Clear control flow
- Easier to debug
- Predictable costs
- Single point of failure (mitigated with retries)
Peer-to-peer pattern (experimental):
- Decentralized
- Fault-tolerant
- Hard to debug
- Unpredictable costs
Implementation:
interface AgentCapability {
domain: string;
description: string;
tools: string[];
}
interface SubAgent {
id: string;
capability: AgentCapability;
execute: (task: string) => Promise<any>;
}
class OrchestratorAgent {
private subAgents: Map<string, SubAgent>;
private memory: ConversationMemory;
constructor(subAgents: SubAgent[], memory: ConversationMemory) {
this.subAgents = new Map(subAgents.map(a => [a.id, a]));
this.memory = memory;
}
async handleRequest(userRequest: string): Promise<any> {
console.error("[Orchestrator] Analyzing request...");
// Step 1: Analyze request and determine required agents
const analysis = await this.analyzeRequest(userRequest);
console.error(`[Orchestrator] Routing to ${analysis.requiredAgents.length} agents`);
// Step 2: Route to appropriate subagents
const subResults = await this.coordinateSubAgents(analysis);
// Step 3: Synthesize results
console.error("[Orchestrator] Synthesizing results...");
const finalAnswer = await this.synthesize(userRequest, analysis, subResults);
return finalAnswer;
}
private async analyzeRequest(request: string): Promise<{
intent: string;
requiredAgents: string[];
executionStrategy: "sequential" | "parallel" | "iterative";
}> {
const agentDescriptions = Array.from(this.subAgents.values())
.map(a => `${a.id}: ${a.capability.description}`)
.join("\n");
const analysisPrompt = `You are an orchestrator analyzing which specialized agents to use.
User Request: ${request}
Available Agents:
${agentDescriptions}
Determine:
1. What is the user trying to accomplish (intent)?
2. Which agents are needed?
3. Should they work sequentially (one after another) or in parallel?
Return JSON:
{
"intent": "description",
"requiredAgents": ["agent-id-1", "agent-id-2"],
"executionStrategy": "sequential" | "parallel"
}`;
const response = await this.callLLM(analysisPrompt);
return JSON.parse(response);
}
private async coordinateSubAgents(analysis: {
intent: string;
requiredAgents: string[];
executionStrategy: "sequential" | "parallel" | "iterative";
}): Promise<Map<string, any>> {
const results = new Map<string, any>();
if (analysis.executionStrategy === "parallel") {
// Run all agents simultaneously
const agentPromises = analysis.requiredAgents.map(async agentId => {
const agent = this.subAgents.get(agentId);
if (!agent) return null;
console.error(`[SubAgent ${agentId}] Starting parallel execution`);
const result = await agent.execute(analysis.intent);
results.set(agentId, result);
return result;
});
await Promise.all(agentPromises);
} else {
// Sequential, which is also the fallback for "iterative" since it has no dedicated handling here:
// run agents one after another, passing context
let context = analysis.intent;
for (const agentId of analysis.requiredAgents) {
const agent = this.subAgents.get(agentId);
if (!agent) continue;
console.error(`[SubAgent ${agentId}] Starting sequential execution`);
const result = await agent.execute(context);
results.set(agentId, result);
// Next agent gets previous results as context
context = `${analysis.intent}\n\nPrevious agent results: ${JSON.stringify(result)}`;
}
}
return results;
}
private async synthesize(
request: string,
analysis: any,
results: Map<string, any>
): Promise<any> {
const synthesisPrompt = `Combine results from multiple specialized agents into a coherent response.
User Request: ${request}
Agent Results:
${Array.from(results.entries()).map(([id, result]) =>
`${id}:\n${JSON.stringify(result, null, 2)}`
).join("\n\n")}
Provide a comprehensive, natural response that addresses the user's request.`;
return await this.callLLM(synthesisPrompt);
}
private async callLLM(prompt: string): Promise<string> {
throw new Error("Implement LLM integration");
}
}
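The "mitigated with retries" note for the orchestrator pattern can be sketched as a generic retry helper with exponential backoff. `withRetry` and its default values are assumptions for illustration, not tuned settings.

```typescript
// Retry an async operation with exponential backoff.
// `attempts` and `baseDelayMs` are illustrative defaults, not tuned values.
async function withRetry<T>(
  operation: () => Promise<T>,
  attempts: number = 3,
  baseDelayMs: number = 200
): Promise<T> {
  let lastError: unknown = new Error("withRetry called with attempts < 1");
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        // Delay doubles each time: base, 2x base, 4x base, ...
        const delay = baseDelayMs * Math.pow(2, attempt);
        await new Promise<void>(resolve => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

In `coordinateSubAgents`, each call could become `withRetry(() => agent.execute(context))`. Retries suit transient failures such as timeouts or rate limits; a sub-agent that fails deterministically will just fail more slowly.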
Safety and Guardrails
Production agents require multiple layers of safety:
Implementation:
class GuardrailSystem {
async validateInput(input: string): Promise<{ safe: boolean; reason?: string }> {
// Check for prompt injection patterns
const injectionPatterns = [
/ignore previous instructions/i,
/new instructions:/i,
/you are now/i,
/system prompt/i
];
for (const pattern of injectionPatterns) {
if (pattern.test(input)) {
return { safe: false, reason: "Potential prompt injection detected" };
}
}
// Call content moderation API
const moderation = await this.callModerationAPI(input);
if (!moderation.safe) {
return { safe: false, reason: moderation.reason };
}
return { safe: true };
}
async authorizeToolUse(
agentId: string,
toolName: string,
params: any
): Promise<{ authorized: boolean; reason?: string }> {
// Check against permission matrix
const permissions = await this.getAgentPermissions(agentId);
if (!permissions.tools.includes(toolName)) {
return { authorized: false, reason: `Agent lacks permission for tool: ${toolName}` };
}
// Check for sensitive operations requiring elevated permissions
if (this.isSensitiveTool(toolName) && !permissions.elevated) {
return { authorized: false, reason: "Sensitive tool requires elevated permissions" };
}
// Rate limiting
const withinRateLimit = await this.checkRateLimit(agentId, toolName);
if (!withinRateLimit) {
return { authorized: false, reason: "Rate limit exceeded" };
}
return { authorized: true };
}
async filterOutput(output: string): Promise<{ filtered: string; blocked: boolean }> {
// PII detection and redaction
const piiRedacted = this.redactPII(output);
// Content policy check
const policyCheck = await this.checkContentPolicy(piiRedacted);
if (!policyCheck.compliant) {
return { filtered: "", blocked: true };
}
return { filtered: piiRedacted, blocked: false };
}
private redactPII(text: string): string {
// Email redaction
text = text.replace(/\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, "[EMAIL_REDACTED]");
// Phone number redaction (US format)
text = text.replace(/\b\d{3}[-.]?\d{3}[-.]?\d{4}\b/g, "[PHONE_REDACTED]");
// Credit card redaction
text = text.replace(/\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g, "[CARD_REDACTED]");
return text;
}
private async callModerationAPI(input: string): Promise<{ safe: boolean; reason?: string }> {
// Implementation with moderation service
return { safe: true };
}
private async getAgentPermissions(agentId: string): Promise<any> {
// Fetch from permission store
return { tools: [], elevated: false };
}
private isSensitiveTool(toolName: string): boolean {
const sensitiveTools = ["delete-data", "modify-permissions", "send-money"];
return sensitiveTools.includes(toolName);
}
private async checkRateLimit(agentId: string, toolName: string): Promise<boolean> {
// Rate limiting logic
return true;
}
private async checkContentPolicy(text: string): Promise<{ compliant: boolean }> {
// Policy checking
return { compliant: true };
}
}
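`checkRateLimit` is a stub in the class above. One simple way to back it is a fixed-window counter keyed by agent and tool, sketched below. `FixedWindowRateLimiter` is a name introduced here, and a production system would more likely keep the counters in a shared store than in process memory.

```typescript
// Fixed-window limiter keyed by agent and tool. The clock is injectable for testing.
class FixedWindowRateLimiter {
  private readonly windows = new Map<string, { start: number; count: number }>();
  private readonly maxCalls: number;
  private readonly windowMs: number;
  private readonly now: () => number;

  constructor(maxCalls: number, windowMs: number, now: () => number = () => Date.now()) {
    this.maxCalls = maxCalls;
    this.windowMs = windowMs;
    this.now = now;
  }

  // Returns true and counts the call if it fits in the current window.
  tryAcquire(agentId: string, toolName: string): boolean {
    const key = `${agentId}:${toolName}`;
    const t = this.now();
    let current = this.windows.get(key);
    if (!current || t - current.start >= this.windowMs) {
      // Start a fresh window for this agent/tool pair.
      current = { start: t, count: 0 };
      this.windows.set(key, current);
    }
    if (current.count >= this.maxCalls) {
      return false;
    }
    current.count++;
    return true;
  }
}
```

Fixed windows allow short bursts at window boundaries; a sliding window or token bucket is smoother if that matters for the tools being protected.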
Cost Analysis and Trade-offs
Token Consumption Comparison
For a typical task like “Check order status and process refund”:
| Architecture | LLM Calls | Avg Tokens | Cost per Task |
|---|---|---|---|
| Chatbot | 2-3 | 1,000 | $0.002 |
| ReAct Agent | 5-8 | 8,000 | $0.016 |
| Plan-Execute Agent | 3-4 | 4,000 | $0.008 |
| Multi-Agent | 6-10 | 10,000 | $0.020 |
Costs based on Claude Sonnet pricing: $3/M input tokens, $15/M output tokens. Note: Prompt caching and batch processing can reduce costs by 50-90%.
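Per-task cost follows directly from token counts and per-million prices. The sketch below shows the arithmetic; the prices and the input/output split in the example are placeholders to replace with your provider's rates and your own measurements.

```typescript
interface TokenPricing {
  inputPerMillion: number;  // USD per million input tokens
  outputPerMillion: number; // USD per million output tokens
}

// Cost of one task in USD, given how many tokens it consumed.
function costPerTask(inputTokens: number, outputTokens: number, pricing: TokenPricing): number {
  return (
    (inputTokens * pricing.inputPerMillion + outputTokens * pricing.outputPerMillion) / 1e6
  );
}

// Placeholder example: 3,000 input and 1,000 output tokens at assumed rates.
const examplePricing: TokenPricing = { inputPerMillion: 3, outputPerMillion: 15 };
const exampleCost = costPerTask(3000, 1000, examplePricing);
```

Because output tokens are usually priced higher than input tokens, the split between the two matters as much as the total, which is worth tracking separately per architecture.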
Infrastructure Costs
- Chatbot: Minimal (stateless API)
- Single Agent: Moderate (vector DB for memory: $50-200/month)
- Multi-Agent: Higher (coordination layer, multiple DBs: $200-500/month)
Performance Characteristics
Latency:
- Chatbot: 500ms - 2s (single LLM call)
- ReAct Agent: 5s - 30s (multiple iterations)
- Plan-Execute: 3s - 15s (planning overhead, parallel execution)
- Multi-Agent: 10s - 60s (coordination + multiple agents)
Accuracy (for complex multi-step tasks):
- Chatbot: 40-60% (limited by predefined flows)
- ReAct Agent: 70-85% (adaptive, but can get stuck)
- Plan-Execute: 75-90% (structured approach)
- Multi-Agent: 80-95% (specialized expertise)
When to Use What
Use Chatbot when:
- Tasks are well-defined with clear intents (< 20 intents)
- Responses can be scripted or template-based
- Budget is tight ($0.001-0.005 per interaction)
- Latency must be < 2 seconds
- Minimal maintenance staff
Use ReAct Agent when:
- Tasks require dynamic adaptation
- Can’t predict all scenarios upfront
- Need transparency (audit trail of reasoning)
- Budget allows $0.01-0.05 per task
- Have LLM expertise on team
Use Plan-Execute Agent when:
- Complex tasks with clear structure
- Can benefit from parallel execution
- Need predictable costs
- Quality matters more than speed
- Tasks can be decomposed logically
Use Multi-Agent System when:
- Require specialized expertise across domains
- Need highest accuracy
- Can justify 5-10x cost vs chatbot
- Have team to maintain coordination logic
- Failure cost is high (healthcare, finance)
Common Pitfalls and Solutions
Pitfall 1: Infinite Loops in ReAct Agents
The agent gets stuck repeating the same tool calls.
Solution: Detect and break loops
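// maxIterations, generateStep, and executeStep are assumed to be provided by the surrounding agent
async function reactLoopWithDetection(task: string) {
const actionHistory = new Set<string>();
for (let i = 0; i < maxIterations; i++) {
const step = await generateStep();
// No action means the agent produced its final answer
if (!step.action) {
return { answer: step.thought };
}
// Create signature of this action
const actionSignature = `${step.action.tool}:${JSON.stringify(step.action.input)}`;
if (actionHistory.has(actionSignature)) {
console.error("[Loop Detected] Breaking out of repeated action");
return { error: "Agent stuck in loop, terminating" };
}
actionHistory.add(actionSignature);
await executeStep(step);
}
return { error: "Iteration limit reached without a final answer" };
}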
Pitfall 2: Context Window Overflow
Conversation history grows beyond the context limit.
Solution: Implement sliding window with summarization
class ManagedConversationHistory {
private messages: Message[] = [];
private maxMessages = 20;
private summaries: string[] = [];
async add(message: Message) {
this.messages.push(message);
if (this.messages.length > this.maxMessages) {
// Summarize oldest 10 messages
const toSummarize = this.messages.splice(0, 10);
const summary = await this.summarize(toSummarize);
this.summaries.push(summary);
}
}
getContext(): string {
return [
...this.summaries.map(s => `[Summary] ${s}`),
...this.messages.map(m => `${m.role}: ${m.content}`)
].join("\n");
}
}
Pitfall 3: Tool Description Bloat
Providing too many tools or verbose descriptions.
Solution: Load tools dynamically based on task context
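class ContextualToolLoader {
// Assumed: a vector index over tool metadata, populated elsewhere
private vectorStore: any;
constructor(vectorStore: any) {
this.vectorStore = vectorStore;
}
async getRelevantTools(task: string): Promise<Tool[]> {
// Use semantic search to find relevant tools
const taskEmbedding = await embed(task); // embed() is an assumed embedding helper
const relevantTools = await this.vectorStore.search({
embedding: taskEmbedding,
limit: 8, // Max 8 tools at a time
threshold: 0.6
});
return relevantTools.map((t: any) => ({
name: t.name,
description: t.shortDescription, // Use concise version
parameters: t.parameters,
execute: t.execute // Required by the Tool interface
}));
}
}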
Progressive Migration Strategy
Start with a chatbot and add agent capabilities incrementally:
class HybridChatbotAgent {
private intentClassifier: IntentClassifier;
private agentMode: boolean = false;
async process(message: string): Promise<string> {
// Try intent-based handling first (fast, cheap)
const intent = await this.intentClassifier.classify(message);
if (intent.confidence > 0.85 && !intent.requiresToolUse) {
// Use traditional chatbot flow
return await this.handleIntent(intent);
}
// Fall back to agent mode for complex queries
console.error("[Hybrid] Switching to agent mode for complex query");
this.agentMode = true;
return await this.agentProcess(message);
}
}
Success metrics to aim for: roughly 80% of queries handled by the fast chatbot path and 20% by the agent, for about a 40% cost reduction compared with a pure agent approach.
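The saving from a hybrid setup depends on how much cheaper the fast path really is. A small sketch of the arithmetic follows; `hybridSavings` is a name introduced here, and the example costs are illustrative rather than measured.

```typescript
// Fraction saved versus sending every query down the agent path.
// Costs are per query and should include any classification overhead.
function hybridSavings(fastPathShare: number, fastPathCost: number, agentCost: number): number {
  const blended = fastPathShare * fastPathCost + (1 - fastPathShare) * agentCost;
  return 1 - blended / agentCost;
}

// With 80% of traffic on a fast path that costs half as much as the agent path:
const halfCostSaving = hybridSavings(0.8, 0.008, 0.016);
```

Algebraically the saving is `fastPathShare * (1 - fastPathCost / agentCost)`, so an 80/20 split yields 40% when the fast path costs half as much as the agent path, and more as the fast path gets cheaper.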
Tools and Technologies
Agent Frameworks
LangGraph (LangChain):
- Language: Python, TypeScript
- Strengths: State management, graph-based workflows, production-ready
- Use Case: Structured agent workflows with complex state
AutoGen (Microsoft):
- Language: Python
- Strengths: Multi-agent conversations, built-in patterns
- Use Case: Collaborative multi-agent systems
- Note: AutoGen is in maintenance mode, being superseded by Microsoft’s Agent Framework
CrewAI:
- Language: Python
- Strengths: Role-based agents, lightweight
- Use Case: Team-like agent coordination
Memory Systems
Vector Databases:
- Pinecone: Managed, serverless
- Qdrant: Open-source, self-hosted
- Weaviate: GraphQL interface, hybrid search
- Chroma: Lightweight, embedded option
Specialized Memory:
- Mem0: Intelligent memory layer with priority scoring (recently raised Series A, AWS partnership)
- Letta (formerly MemGPT): Memory blocks for context management
Observability
LangSmith: Trace agent executions, debug reasoning chains, A/B testing for prompts
Langfuse: Open-source LLM observability, cost tracking, latency monitoring
Helicone: LLM request monitoring, cost analytics, caching
Key Takeaways
- Architecture Evolution: Chatbots and agents sit on a continuum; choose based on task complexity, budget, and team expertise
- Pattern Selection Matters: ReAct for dynamic adaptation, Plan-Execute for structured tasks, multi-agent for specialization
- Memory is Critical: Long-term memory differentiates agents from chatbots; invest in vector databases and retrieval strategies
- Guardrails are Non-Negotiable: Implement input validation, tool authorization, output filtering, and human-in-the-loop for production systems
- Cost vs Quality Trade-off: Agents can be 5-10x more expensive than chatbots but deliver 2-3x higher accuracy on complex tasks
- Tool Design Principles: Small, composable tools beat monolithic ones; easier to test, debug, and reuse
- Progressive Enhancement: Start with chatbot, add agent capabilities incrementally as needs grow
- Evaluation is Essential: Track completion rate, tokens per task, latency, and user satisfaction; iterate based on data
- Error Recovery Wins: Intelligent retry logic with fallback strategies separates production agents from prototypes
- Context Window Management: Summarization, structured notes, and sub-agents prevent context overflow in long conversations
This architectural journey from chatbots to autonomous agents represents more than adding capabilities; it’s a fundamental shift in how we design AI systems. The patterns and practices outlined here provide a foundation for building production-ready agent systems that balance autonomy with control.