2025-09-04
CQRS with Serverless: How I Cut DynamoDB Costs by 70% and Improved Performance
Real-world CQRS implementation with AWS Lambda, EventBridge, and DynamoDB. Learn from my mistakes implementing event sourcing, handling eventual consistency, and debugging distributed systems in production.
What is CQRS and Why Should You Care?
CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates your write operations (commands) from your read operations (queries). Instead of using the same model for both reading and writing data, you optimize each side for its specific purpose.
The Core Principle
In traditional architectures, you typically use the same data model for both reading and writing:
// Traditional approach - same model for everything
class OrderService {
async createOrder(orderData) {
// Write to the same table
return await db.orders.insert(orderData);
}
async getOrderHistory(customerId) {
// Read from the same table with complex joins
return await db.orders.find()
.join('customers')
.join('products')
.where('customerId', customerId);
}
}
With CQRS, you split this into two optimized models:
// CQRS approach - separate optimized models
class OrderCommandService {
async createOrder(orderData) {
// Write-optimized: Simple, fast inserts
await writeDb.orders.insert(orderData);
// Publish event for read model updates
await eventBus.publish('OrderCreated', orderData);
}
}
class OrderQueryService {
async getOrderHistory(customerId) {
// Read-optimized: Pre-computed, denormalized data
return await readDb.customerOrderHistory.find(customerId);
}
}
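To make the split concrete, here is a minimal in-memory sketch of the same idea. Everything in it (`InMemoryEventBus`, the `Map`-based stores, the event shape) is an illustrative assumption, not the production setup described later. The bus here delivers events synchronously, so it does not show the eventual consistency a real event bus introduces.

```typescript
// In-memory sketch of the command/query split. All names are illustrative.
type OrderCreated = { orderId: string; customerId: string; total: number };
type Listener = (event: OrderCreated) => void;

class InMemoryEventBus {
  private listeners: Listener[] = [];

  subscribe(listener: Listener): void {
    this.listeners.push(listener);
  }

  publish(event: OrderCreated): void {
    for (const listener of this.listeners) listener(event);
  }
}

class OrderCommandService {
  // Write model: one record per order, keyed by order ID
  readonly orders = new Map<string, OrderCreated>();
  private bus: InMemoryEventBus;

  constructor(bus: InMemoryEventBus) {
    this.bus = bus;
  }

  createOrder(order: OrderCreated): void {
    this.orders.set(order.orderId, order);
    this.bus.publish(order); // Notify read models about the change
  }
}

class OrderQueryService {
  // Read model: orders pre-grouped by customer, so reads need no joins
  private history = new Map<string, OrderCreated[]>();

  constructor(bus: InMemoryEventBus) {
    bus.subscribe((event) => {
      const existing = this.history.get(event.customerId) ?? [];
      this.history.set(event.customerId, [...existing, event]);
    });
  }

  getOrderHistory(customerId: string): OrderCreated[] {
    return this.history.get(customerId) ?? [];
  }
}

const bus = new InMemoryEventBus();
const commands = new OrderCommandService(bus);
const queries = new OrderQueryService(bus);

commands.createOrder({ orderId: 'o-1', customerId: 'c-1', total: 40 });
commands.createOrder({ orderId: 'o-2', customerId: 'c-1', total: 15 });
console.log(queries.getOrderHistory('c-1').length); // 2
```

The query service never touches the write store. It only sees events, which is what lets each side be shaped and scaled on its own.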
Why CQRS Solves Real Problems
CQRS isn’t just theoretical - it addresses specific, measurable problems:
- Performance Mismatch: Writes need validation and consistency, reads need speed
- Scale Mismatch: Read-heavy systems often see read-to-write ratios of 10:1 or even 100:1
- Model Complexity: Optimizing for writes makes reads complex, and vice versa
- Team Parallelization: Different teams can work on read and write sides independently
When CQRS Makes Sense
Use CQRS when you have:
- High read-to-write ratios (10:1 or higher)
- Different performance requirements for reads vs writes
- Complex reporting or analytics needs
- Need to scale reads and writes independently
- Multiple data representation needs (APIs, reports, dashboards)
Avoid CQRS when you have:
- Simple CRUD applications
- Low traffic applications
- Strong consistency requirements everywhere
- Small team that can’t handle the complexity
- Similar read and write patterns
The Real-World Impact
Working with an e-commerce platform, I ran into DynamoDB throttling errors during flash sales, when order writes and catalog reads competed for the same table's throughput. Separating the two with CQRS reduced costs and eliminated the throttling errors during high-traffic events.
But here’s the key insight: CQRS isn’t about the tools you use, it’s about recognizing when your read and write needs are fundamentally different.
The Problem That Led Me to CQRS
Let me show you exactly why CQRS became necessary with a real example. A monolithic Lambda function was handling everything - product catalog reads, order processing, inventory updates. During a flash sale, several issues emerged:
- DynamoDB throttling: High write operations from orders competing with read operations from browsing users
- Lambda timeouts: Complex aggregation queries taking significant time
- Cost challenges: Provisioned capacity needed only during peak periods
- Data inconsistency: Inventory counts affected by concurrent updates
The worst part? Product detail pages (the majority of traffic) were slow because they shared the same data model optimized for order processing.
This is the classic scenario where CQRS shines: when your read and write workloads have completely different characteristics and requirements.
Our Architecture Evolution
Before CQRS (The Monolith):
// Single Lambda handling everything - seemed simple at first
export const handler = async (event: APIGatewayEvent) => {
const { httpMethod, path } = event;
if (httpMethod === 'GET' && path === '/products') {
// Query every product through a GSI on the shared single table
const products = await dynamoClient.query({
TableName: 'MainTable',
IndexName: 'GSI1',
KeyConditionExpression: 'GSI1PK = :pk',
ExpressionAttributeValues: { ':pk': 'PRODUCT' }
}).promise();
// Then fetch inventory for each product (N+1 query problem)
for (const product of products.Items) {
const inventory = await getInventory(product.id);
product.availableQuantity = inventory.quantity;
}
return { statusCode: 200, body: JSON.stringify(products) };
}
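if (httpMethod === 'POST' && path === '/orders') {
// Write to same table, competing for throughput
await createOrder(JSON.parse(event.body ?? '{}'));
return { statusCode: 201, body: JSON.stringify({ status: 'CREATED' }) };
}
return { statusCode: 404, body: JSON.stringify({ message: 'Not found' }) };
};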
After CQRS (Separated Concerns):
The Command Side: Handling Writes
// commands/create-order.ts - Focused solely on order processing
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
import { z } from 'zod';
import { ulid } from 'ulid';
const dynamoClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const eventBridge = new EventBridgeClient({});
// Input validation with Zod - caught so many bugs in production
const CreateOrderSchema = z.object({
customerId: z.string().uuid(),
items: z.array(z.object({
productId: z.string(),
quantity: z.number().positive(),
price: z.number().positive()
})).min(1),
shippingAddress: z.object({
street: z.string(),
city: z.string(),
country: z.string(),
postalCode: z.string()
})
});
export const handler = async (event: any) => {
// Parse and validate input
const input = CreateOrderSchema.parse(JSON.parse(event.body));
const orderId = ulid(); // Time-sortable IDs - game changer for debugging
const timestamp = Date.now();
// Write to command store (write-optimized table)
const order = {
PK: `ORDER#${orderId}`,
SK: `ORDER#${orderId}`,
id: orderId,
customerId: input.customerId,
items: input.items,
total: input.items.reduce((sum, item) => sum + (item.price * item.quantity), 0),
status: 'PENDING',
createdAt: timestamp,
updatedAt: timestamp,
version: 1 // Optimistic locking - saved us from race conditions
};
try {
await dynamoClient.send(new PutCommand({
TableName: process.env.WRITE_TABLE_NAME!,
Item: order,
ConditionExpression: 'attribute_not_exists(PK)' // Prevent duplicates
}));
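// Publish event for read model updates
const publishResult = await eventBridge.send(new PutEventsCommand({
Entries: [{
Source: 'orders.service',
DetailType: 'OrderCreated',
Detail: JSON.stringify({
orderId,
customerId: input.customerId,
items: input.items,
total: order.total,
timestamp
}),
EventBusName: process.env.EVENT_BUS_NAME
}]
}));
// PutEvents reports per-entry failures in the response instead of throwing
if (publishResult.FailedEntryCount && publishResult.FailedEntryCount > 0) {
throw new Error(`Failed to publish OrderCreated event for order ${orderId}`);
}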
return {
statusCode: 201,
body: JSON.stringify({ orderId, status: 'CREATED' })
};
} catch (error) {
console.error('Order creation failed:', error);
// Implement proper error handling and compensation
throw error;
}
};
The Query Side: Optimized Reads
// queries/get-product-catalog.ts - Read-optimized for performance
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, GetCommand, QueryCommand } from '@aws-sdk/lib-dynamodb';
const dynamoClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
// Pre-computed, denormalized data for fast reads
export const handler = async (event: any) => {
const { category, limit = 20, lastKey } = event.queryStringParameters || {};
// Read from read-optimized table with pre-computed aggregations
const response = await dynamoClient.send(new QueryCommand({
TableName: process.env.READ_TABLE_NAME!,
IndexName: 'CategoryIndex',
KeyConditionExpression: 'category = :category',
ExpressionAttributeValues: {
':category': category || 'ALL'
},
Limit: Number.parseInt(String(limit), 10), // Query string values arrive as strings
ExclusiveStartKey: lastKey ? JSON.parse(Buffer.from(lastKey, 'base64').toString()) : undefined,
// Only fetch what we need for listing
ProjectionExpression: 'id, #n, price, imageUrl, averageRating, reviewCount, inStock',
ExpressionAttributeNames: {
'#n': 'name' // 'name' is a reserved word in DynamoDB
}
}));
return {
statusCode: 200,
headers: {
'Cache-Control': 'public, max-age=300', // 5-minute cache for product listings
},
body: JSON.stringify({
products: response.Items,
nextKey: response.LastEvaluatedKey
? Buffer.from(JSON.stringify(response.LastEvaluatedKey)).toString('base64')
: null
})
};
};
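The handler above decodes `lastKey` with a bare `JSON.parse`, so a malformed cursor fails the whole request. A small sketch of defensive helpers for the cursor and the page size follows. The names `encodeCursor`, `decodeCursor`, and `parseLimit`, and the cap of 100 items per page, are assumptions for illustration rather than part of the original handler.

```typescript
// Hypothetical helpers for the `lastKey` / `nextKey` pagination cursor.
type PageKey = Record<string, unknown>;

// Serialize DynamoDB's LastEvaluatedKey into an opaque base64 cursor
function encodeCursor(key: PageKey | undefined): string | null {
  return key ? Buffer.from(JSON.stringify(key)).toString('base64') : null;
}

// Returns undefined for a missing or malformed cursor instead of throwing
function decodeCursor(cursor: string | undefined): PageKey | undefined {
  if (!cursor) return undefined;
  try {
    const parsed: unknown = JSON.parse(Buffer.from(cursor, 'base64').toString('utf8'));
    const isPlainObject =
      typeof parsed === 'object' && parsed !== null && !Array.isArray(parsed);
    return isPlainObject ? (parsed as PageKey) : undefined;
  } catch {
    return undefined;
  }
}

// Parse the page size and clamp it to a sane range (assumed cap: 100)
function parseLimit(raw: string | undefined, fallback = 20, max = 100): number {
  const n = Number.parseInt(raw ?? '', 10);
  return Number.isNaN(n) || n < 1 ? fallback : Math.min(n, max);
}
```

In the handler these would replace the inline expressions: `Limit: parseLimit(limit)`, `ExclusiveStartKey: decodeCursor(lastKey)`, and `nextKey: encodeCursor(response.LastEvaluatedKey)`. Treating a bad cursor as "start from the first page" is one possible policy; returning a 400 is equally reasonable.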
The Event Processor: Keeping Models in Sync
This is where the magic happens - and where most CQRS implementations fail:
// processors/sync-read-models.ts - The critical synchronization layer
import { EventBridgeEvent } from 'aws-lambda';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand, BatchWriteCommand } from '@aws-sdk/lib-dynamodb';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';
const dynamoClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const sqsClient = new SQSClient({});
interface OrderCreatedEvent {
orderId: string;
customerId: string;
items: Array<{ productId: string; quantity: number; price: number }>;
total: number;
timestamp: number;
}
export const handler = async (event: EventBridgeEvent<'OrderCreated', OrderCreatedEvent>) => {
const { detail } = event;
// Update multiple read models in parallel
const updatePromises = [];
// 1. Update customer order history (optimized for customer queries)
updatePromises.push(
dynamoClient.send(new UpdateCommand({
TableName: process.env.READ_TABLE_NAME!,
Key: {
PK: `CUSTOMER#${detail.customerId}`,
SK: `ORDER#${detail.timestamp}#${detail.orderId}`
},
UpdateExpression: 'SET orderId = :orderId, #total = :total, #items = :items, createdAt = :timestamp',
ExpressionAttributeNames: {
'#items': 'items', // ITEMS and TOTAL are DynamoDB reserved words
'#total': 'total'
},
ExpressionAttributeValues: {
':orderId': detail.orderId,
':total': detail.total,
':items': detail.items,
':timestamp': detail.timestamp
}
}))
);
// 2. Update product statistics (for popular products, bestsellers, etc.)
for (const item of detail.items) {
updatePromises.push(
dynamoClient.send(new UpdateCommand({
TableName: process.env.READ_TABLE_NAME!,
Key: {
PK: `PRODUCT#${item.productId}`,
SK: 'STATS'
},
UpdateExpression: `
ADD salesCount :quantity, revenue :revenue
SET lastSoldAt = :timestamp
`,
ExpressionAttributeValues: {
':quantity': item.quantity,
':revenue': item.price * item.quantity,
':timestamp': detail.timestamp
}
}))
);
}
// 3. Update daily sales aggregations (for dashboards)
const dateKey = new Date(detail.timestamp).toISOString().split('T')[0];
updatePromises.push(
dynamoClient.send(new UpdateCommand({
TableName: process.env.READ_TABLE_NAME!,
Key: {
PK: `SALES#${dateKey}`,
SK: 'AGGREGATE'
},
UpdateExpression: 'ADD orderCount :one, totalRevenue :total',
ExpressionAttributeValues: {
':one': 1,
':total': detail.total
}
}))
);
try {
await Promise.all(updatePromises);
} catch (error) {
console.error('Failed to update read models:', error);
// Send to DLQ for manual intervention
await sqsClient.send(new SendMessageCommand({
QueueUrl: process.env.DLQ_URL!,
MessageBody: JSON.stringify({
event: 'OrderCreated',
detail,
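error: error instanceof Error ? error.message : String(error),
timestamp: Date.now()
})
}));
throw error; // Rethrow so the invocation is retried; the ADD updates above need an idempotency guard to avoid double counting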
}
};
Infrastructure as Code with CDK
Here’s the complete serverless CQRS setup:
// infrastructure/cqrs-stack.ts
import { Stack, StackProps, Duration, RemovalPolicy } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda-nodejs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Runtime } from 'aws-cdk-lib/aws-lambda';
export class CQRSServerlessStack extends Stack {
constructor(scope: Construct, id: string, props?: StackProps) {
super(scope, id, props);
// Write model table - optimized for writes
const writeTable = new dynamodb.Table(this, 'WriteTable', {
partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST, // On-demand capacity absorbs spikes without pre-provisioning
stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES, // For change data capture
pointInTimeRecovery: true,
removalPolicy: RemovalPolicy.RETAIN
});
// Read model table - optimized for queries
const readTable = new dynamodb.Table(this, 'ReadTable', {
partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
pointInTimeRecovery: true,
removalPolicy: RemovalPolicy.RETAIN
});
// Add GSIs for different query patterns
readTable.addGlobalSecondaryIndex({
indexName: 'CategoryIndex',
partitionKey: { name: 'category', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'popularity', type: dynamodb.AttributeType.NUMBER },
projectionType: dynamodb.ProjectionType.ALL
});
readTable.addGlobalSecondaryIndex({
indexName: 'CustomerIndex',
partitionKey: { name: 'customerId', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'createdAt', type: dynamodb.AttributeType.NUMBER },
projectionType: dynamodb.ProjectionType.ALL
});
// Event bus for CQRS events
const eventBus = new events.EventBus(this, 'CQRSEventBus', {
eventBusName: 'cqrs-events'
});
// Dead letter queue for failed events
const dlq = new sqs.Queue(this, 'EventDLQ', {
queueName: 'cqrs-event-dlq',
retentionPeriod: Duration.days(14)
});
// Command handlers
const createOrderHandler = new lambda.NodejsFunction(this, 'CreateOrderHandler', {
entry: 'src/commands/create-order.ts',
runtime: Runtime.NODEJS_22_X, // Updated to latest LTS runtime
memorySize: 1024,
timeout: Duration.seconds(10),
environment: {
WRITE_TABLE_NAME: writeTable.tableName,
EVENT_BUS_NAME: eventBus.eventBusName,
AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1'
},
bundling: {
minify: true,
target: 'node22', // Match the Lambda runtime
externalModules: ['@aws-sdk/*']
}
});
writeTable.grantWriteData(createOrderHandler);
eventBus.grantPutEventsTo(createOrderHandler);
// Query handlers
const getProductsHandler = new lambda.NodejsFunction(this, 'GetProductsHandler', {
entry: 'src/queries/get-product-catalog.ts',
runtime: Runtime.NODEJS_22_X, // Updated to latest LTS runtime
memorySize: 512, // Read-only, needs less memory
timeout: Duration.seconds(5),
environment: {
READ_TABLE_NAME: readTable.tableName,
AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1'
}
});
readTable.grantReadData(getProductsHandler);
// Event processor for syncing read models
const syncProcessor = new lambda.NodejsFunction(this, 'SyncProcessor', {
entry: 'src/processors/sync-read-models.ts',
runtime: Runtime.NODEJS_22_X, // Updated to latest LTS runtime
memorySize: 2048, // Handles batch updates
timeout: Duration.seconds(30),
reservedConcurrentExecutions: 10, // Prevent overwhelming downstream services
environment: {
READ_TABLE_NAME: readTable.tableName,
DLQ_URL: dlq.queueUrl,
AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1'
},
deadLetterQueue: dlq,
retryAttempts: 2
});
readTable.grantWriteData(syncProcessor);
dlq.grantSendMessages(syncProcessor);
// Event rules
new events.Rule(this, 'OrderCreatedRule', {
eventBus,
eventPattern: {
source: ['orders.service'],
detailType: ['OrderCreated']
},
targets: [new targets.LambdaFunction(syncProcessor, {
retryAttempts: 2,
maxEventAge: Duration.hours(2)
})]
});
// API Gateway
const api = new apigateway.RestApi(this, 'CQRSAPI', {
restApiName: 'cqrs-api',
defaultCorsPreflightOptions: {
allowOrigins: apigateway.Cors.ALL_ORIGINS,
allowMethods: apigateway.Cors.ALL_METHODS
}
});
// Command endpoints
const orders = api.root.addResource('orders');
orders.addMethod('POST', new apigateway.LambdaIntegration(createOrderHandler));
// Query endpoints
const products = api.root.addResource('products');
products.addMethod('GET', new apigateway.LambdaIntegration(getProductsHandler));
}
}
Handling Eventual Consistency (The Hard Part)
CQRS means accepting eventual consistency. Here’s how we handle it without confusing users:
// strategies/consistency-handling.ts
export class ConsistencyStrategy {
// Strategy 1: Optimistic UI updates
async createOrderWithOptimisticUpdate(orderData: any) {
// Immediately show success to user
const tempOrderId = `temp_${Date.now()}`;
updateUI({ orderId: tempOrderId, status: 'processing' });
try {
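const response = await fetch('/api/orders', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify(orderData)
});
// fetch only rejects on network errors, so check the HTTP status explicitly
if (!response.ok) {
throw new Error(`Order request failed with status ${response.status}`);
}
const { orderId } = await response.json();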
// Replace temp ID with real ID
updateUI({ oldId: tempOrderId, newId: orderId, status: 'confirmed' });
// Poll for read model update
await this.waitForReadModelSync(orderId);
} catch (error) {
// Rollback optimistic update
removeFromUI(tempOrderId);
showError('Order failed');
}
}
// Strategy 2: Polling with exponential backoff
async waitForReadModelSync(orderId: string, maxAttempts = 5) {
let attempts = 0;
let delay = 100; // Start with 100ms
while (attempts < maxAttempts) {
const order = await this.checkReadModel(orderId);
if (order) {
return order;
}
await new Promise(resolve => setTimeout(resolve, delay));
delay *= 2; // Exponential backoff
attempts++;
}
// Fall back to command model query
return this.queryCommandModel(orderId);
}
// Strategy 3: WebSocket notifications
subscribeToOrderUpdates(customerId: string) {
const ws = new WebSocket(`wss://api.example.com/orders/${customerId}`);
ws.onmessage = (event) => {
const update = JSON.parse(event.data);
if (update.type === 'READ_MODEL_SYNCED') {
refreshOrderList();
}
};
}
}
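It helps to know how long the polling strategy can block before it falls back to the command model. The sketch below computes the delay schedule that `waitForReadModelSync` produces with its defaults (100 ms base, 5 attempts). The `backoffDelays` helper and its `maxDelayMs` cap are assumptions added for illustration; the original loop has no cap.

```typescript
// Delay before each retry: base * 2^attempt, capped at maxDelayMs (assumed cap)
function backoffDelays(baseMs: number, maxAttempts: number, maxDelayMs = 5000): number[] {
  return Array.from({ length: maxAttempts }, (_, attempt) =>
    Math.min(baseMs * 2 ** attempt, maxDelayMs)
  );
}

const delays = backoffDelays(100, 5);
const worstCaseWaitMs = delays.reduce((sum, delay) => sum + delay, 0);

// delays: 100, 200, 400, 800, 1600; worst-case wait: 3100 ms
console.log(delays, worstCaseWaitMs);
```

With the defaults, a client waits at most about 3.1 seconds (plus request time) before querying the command model directly, which stays under the 5-second lag threshold used for alerting.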
Testing CQRS in Serverless
Testing distributed systems is hard. Here’s our approach:
// tests/cqrs-integration.test.ts
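import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
import { DynamoDBDocumentClient, UpdateCommand } from '@aws-sdk/lib-dynamodb';
import { SQSClient } from '@aws-sdk/client-sqs';
import { mockClient } from 'aws-sdk-client-mock';
// Handlers under test; paths assume the test file lives in tests/ next to src/
import { handler } from '../src/commands/create-order';
import { handler as syncProcessor } from '../src/processors/sync-read-models';
describe('CQRS Event Flow', () => {
const eventBridgeMock = mockClient(EventBridgeClient);
// Mock the document client, since the handlers send lib-dynamodb commands
const dynamoMock = mockClient(DynamoDBDocumentClient);
const sqsMock = mockClient(SQSClient);
beforeEach(() => {
eventBridgeMock.reset();
dynamoMock.reset();
sqsMock.reset();
// The handlers read table names from the environment
process.env.WRITE_TABLE_NAME = 'WriteTable';
process.env.READ_TABLE_NAME = 'ReadTable';
});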
test('Order creation triggers read model update', async () => {
// Arrange
const orderId = 'test-order-123';
eventBridgeMock.on(PutEventsCommand).resolves({
FailedEntryCount: 0,
Entries: [{ EventId: 'event-123' }]
});
// Act - Create order
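const response = await handler({
body: JSON.stringify({
customerId: '3f2b8c1e-4d5a-4b6c-8e7f-9a0b1c2d3e4f', // Schema requires a UUID
items: [{ productId: 'prod-1', quantity: 2, price: 99.99 }],
shippingAddress: { street: '1 Main St', city: 'Springfield', country: 'US', postalCode: '12345' }
})
});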
// Assert - Event was published
expect(eventBridgeMock.calls()).toHaveLength(1);
const eventCall = eventBridgeMock.call(0);
expect(eventCall.args[0].input.Entries[0].DetailType).toBe('OrderCreated');
// Simulate event processor
await syncProcessor({
detail: JSON.parse(eventCall.args[0].input.Entries[0].Detail)
});
// Assert - Read models updated
const readModelCalls = dynamoMock.calls().filter(
call => call.args[0].input.TableName === 'ReadTable'
);
expect(readModelCalls).toHaveLength(3); // Customer, Product, Daily stats
});
test('Failed event processing sends to DLQ', async () => {
// Simulate DynamoDB failure
dynamoMock.on(UpdateCommand).rejects(new Error('Throttled'));
const event = {
detail: {
orderId: 'order-123',
customerId: 'customer-123',
items: [],
total: 100,
timestamp: Date.now()
}
};
await expect(syncProcessor(event)).rejects.toThrow('Throttled');
// Verify DLQ message
const sqsCalls = sqsMock.calls();
expect(sqsCalls).toHaveLength(1);
expect(JSON.parse(sqsCalls[0].args[0].input.MessageBody))
.toHaveProperty('error', 'Throttled');
});
});
Monitoring and Debugging CQRS
The distributed nature of CQRS makes debugging challenging. Here’s our monitoring setup:
// monitoring/cqrs-metrics.ts
import { MetricUnit, Metrics } from '@aws-lambda-powertools/metrics';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { Logger } from '@aws-lambda-powertools/logger';
import { ulid } from 'ulid';
const metrics = new Metrics({ namespace: 'CQRS', serviceName: 'orders' });
const tracer = new Tracer({ serviceName: 'orders' });
const logger = new Logger({ serviceName: 'orders' });
// measureSyncLag and processRequest are application helpers defined elsewhere
export const instrumentedHandler = async (event: any) => {
// Open a subsegment for this invocation so it can carry annotations
const parentSegment = tracer.getSegment();
const subsegment = parentSegment?.addNewSubsegment('## instrumentedHandler');
if (subsegment) tracer.setSegment(subsegment);
// Track command/query separation
const operationType = event.httpMethod === 'GET' ? 'QUERY' : 'COMMAND';
metrics.addMetric(`${operationType}_REQUEST`, MetricUnit.Count, 1);
const startTime = Date.now();
try {
// Add correlation ID for tracing across services
const correlationId = event.headers?.['x-correlation-id'] || ulid();
tracer.putAnnotation('correlationId', correlationId);
logger.appendKeys({ correlationId });
// Track read/write model sync lag
if (operationType === 'QUERY') {
const syncLag = await measureSyncLag();
metrics.addMetric('READ_MODEL_LAG_MS', MetricUnit.Milliseconds, syncLag);
if (syncLag > 5000) {
logger.warn('High read model lag detected', { syncLag });
}
}
const result = await processRequest(event);
metrics.addMetric(`${operationType}_SUCCESS`, MetricUnit.Count, 1);
metrics.addMetric(`${operationType}_DURATION`, MetricUnit.Milliseconds,
Date.now() - startTime);
return result;
} catch (error) {
metrics.addMetric(`${operationType}_ERROR`, MetricUnit.Count, 1);
logger.error('Request failed', { error, event });
throw error;
} finally {
// Flush buffered metrics and restore the parent segment
metrics.publishStoredMetrics();
subsegment?.close();
if (parentSegment) tracer.setSegment(parentSegment);
}
};
// Custom CloudWatch dashboard
export const dashboardConfig = {
widgets: [
{
type: 'metric',
properties: {
metrics: [
['CQRS', 'COMMAND_REQUEST', { stat: 'Sum' }],
['.', 'QUERY_REQUEST', { stat: 'Sum' }],
['.', 'READ_MODEL_LAG_MS', { stat: 'Average' }]
],
period: 300,
stat: 'Average',
region: 'us-east-1',
title: 'CQRS Operations'
}
}
]
};
Cost Analysis: The 70% Reduction
Here’s the actual cost breakdown from our production system:
Before CQRS (March 2024):
- DynamoDB Provisioned: $2,100/month (provisioned for peak)
- Lambda Compute: $450/month (complex queries, high memory)
- API Gateway: $180/month
- Total: $2,730/month
After CQRS (June 2024):
- DynamoDB On-Demand (Write): $180/month
- DynamoDB On-Demand (Read): $320/month
- EventBridge: $12/month
- Lambda Compute: $180/month (simpler, focused functions)
- API Gateway: $180/month
- Total: $872/month (68% reduction)
The real savings came from:
- No over-provisioning for peak loads
- Cached read models reducing database hits
- Simpler Lambda functions using less memory
- Better query optimization with purpose-built indexes
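As a sanity check on the headline number, the arithmetic can be reproduced directly from the line items listed in the two breakdowns. This is just the sums and the percentage, with the object keys chosen here for illustration.

```typescript
// Monthly costs in USD, copied from the line items in the two breakdowns
const before = { dynamoProvisioned: 2100, lambda: 450, apiGateway: 180 };
const after = { dynamoWrite: 180, dynamoRead: 320, eventBridge: 12, lambda: 180, apiGateway: 180 };

const sum = (items: Record<string, number>): number =>
  Object.values(items).reduce((total, cost) => total + cost, 0);

const beforeTotal = sum(before); // 2730
const afterTotal = sum(after); // 872
const reductionPct = ((beforeTotal - afterTotal) / beforeTotal) * 100;

console.log(beforeTotal, afterTotal, reductionPct.toFixed(1)); // 2730 872 68.1
```

The line items put the reduction at roughly 68%, which rounds to the 70% in the title. DynamoDB accounts for $1,600 of the $1,858 monthly saving, so moving off peak-sized provisioned capacity is where most of the money went.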
Lessons Learned (The Hard Way)
1. Start Simple, Really Simple
Early CQRS implementations often start with too many read models. Keep it to three or fewer at first; ideally, begin with one read model and add more only when query patterns demand it.
2. Event Versioning is Critical
We didn’t version our events initially. When we needed to add a field to OrderCreated, we broke every consumer. Now:
interface OrderCreatedV1 {
version: 1;
orderId: string;
customerId: string;
total: number;
}
interface OrderCreatedV2 {
version: 2;
orderId: string;
customerId: string;
total: number;
currency: string; // New field
}
// Handler supports both versions
export const handler = async (event: OrderCreatedV1 | OrderCreatedV2) => {
const currency = 'version' in event && event.version >= 2
? (event as OrderCreatedV2).currency
: 'USD'; // Default for V1
};
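One way to keep version checks out of the business logic is an "upcaster" that converts every incoming event to the latest shape before the handler sees it. The sketch below assumes the two event versions shown above; the `upcastOrderCreated` name and the `'USD'` default for V1 events are illustrative, mirroring the default in the handler.

```typescript
interface OrderCreatedV1 {
  version: 1;
  orderId: string;
  customerId: string;
  total: number;
}

interface OrderCreatedV2 {
  version: 2;
  orderId: string;
  customerId: string;
  total: number;
  currency: string;
}

type AnyOrderCreated = OrderCreatedV1 | OrderCreatedV2;

// Convert any known version to the latest one; reject versions we do not know
function upcastOrderCreated(event: AnyOrderCreated): OrderCreatedV2 {
  switch (event.version) {
    case 1:
      // V1 predates multi-currency support, so assume USD
      return { ...event, version: 2, currency: 'USD' };
    case 2:
      return event;
    default:
      throw new Error(
        `Unsupported OrderCreated version: ${String((event as { version?: unknown }).version)}`
      );
  }
}

const legacy = upcastOrderCreated({ version: 1, orderId: 'o-1', customerId: 'c-1', total: 25 });
console.log(legacy.currency); // USD
```

When a V3 arrives, only the upcaster grows a new case. Handlers keep working against a single type.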
3. Idempotency Everywhere
Events can be delivered multiple times. Every handler must be idempotent:
// Use conditional writes to ensure idempotency
await dynamoClient.send(new PutCommand({
TableName: TABLE_NAME,
Item: processedEvent,
ConditionExpression: 'attribute_not_exists(eventId)'
}));
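A conditional put on its own only protects the marker item. Counter updates such as `ADD salesCount` still double count if the handler is retried. One possible approach is to write the "already processed" marker and the counter update in a single DynamoDB transaction, so either both happen or neither does. The sketch below only builds the request object, in the shape accepted by `TransactWriteCommand` from `@aws-sdk/lib-dynamodb`. The `buildIdempotentStatsUpdate` name and the `EVENT#...` / `PROCESSED` key layout are assumptions for illustration.

```typescript
interface SaleItem {
  productId: string;
  quantity: number;
  price: number;
}

// Builds the input for a TransactWriteCommand (hypothetical key layout)
function buildIdempotentStatsUpdate(
  tableName: string,
  eventId: string,
  item: SaleItem,
  timestamp: number
) {
  return {
    TransactItems: [
      {
        // Marker item: the whole transaction is cancelled if it already exists
        Put: {
          TableName: tableName,
          Item: {
            PK: `EVENT#${eventId}#${item.productId}`,
            SK: 'PROCESSED',
            processedAt: timestamp,
          },
          ConditionExpression: 'attribute_not_exists(PK)',
        },
      },
      {
        // Counter update that is only applied together with the marker
        Update: {
          TableName: tableName,
          Key: { PK: `PRODUCT#${item.productId}`, SK: 'STATS' },
          UpdateExpression:
            'ADD salesCount :quantity, revenue :revenue SET lastSoldAt = :timestamp',
          ExpressionAttributeValues: {
            ':quantity': item.quantity,
            ':revenue': item.price * item.quantity,
            ':timestamp': timestamp,
          },
        },
      },
    ],
  };
}
```

The result would be sent with `dynamoClient.send(new TransactWriteCommand(input))`, using a stable identifier such as the EventBridge event `id` as `eventId`. A `TransactionCanceledException` caused by the failed condition then means "already applied" and can be treated as success rather than retried.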
4. Monitor the Sync Lag
The time between command execution and read model update is your most important metric. We alert if it exceeds 5 seconds.
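A simple way to get this number is to compare the `timestamp` carried in the event with the time the read model update is applied. The sketch below assumes that approach; `syncLagMs`, `shouldAlert`, and the constant name are illustrative, and the 5000 ms threshold is the one mentioned above.

```typescript
const LAG_ALERT_THRESHOLD_MS = 5000; // Alert threshold from the text above

// Lag between the command (event timestamp) and the read model update
function syncLagMs(eventTimestamp: number, appliedAt: number = Date.now()): number {
  // Clamp at zero so clock skew between producer and consumer never yields a negative lag
  return Math.max(0, appliedAt - eventTimestamp);
}

function shouldAlert(lagMs: number): boolean {
  return lagMs > LAG_ALERT_THRESHOLD_MS;
}

const lag = syncLagMs(1_700_000_000_000, 1_700_000_006_500);
console.log(lag, shouldAlert(lag)); // 6500 true
```

Measured inside the event processor, this value can be emitted as the `READ_MODEL_LAG_MS` metric, so the alarm reflects real events instead of a synthetic probe.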
5. Plan for Reconciliation
Read models will drift. We run a nightly job that compares command and query models, fixing discrepancies:
// Runs during low-traffic periods daily
export const reconciliationJob = async () => {
const commandRecords = await scanCommandTable();
const readRecords = await scanReadTable();
const discrepancies = findDiscrepancies(commandRecords, readRecords);
for (const issue of discrepancies) {
await republishEvent(issue.originalEvent);
logger.warn('Reconciliation required', { issue });
}
metrics.addMetric('RECONCILIATION_FIXES', MetricUnit.Count, discrepancies.length);
};
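The job above leaves `findDiscrepancies` undefined. A minimal sketch of what it could look like follows, assuming both models expose an `id` and a monotonically increasing `version` per record. Storing the version on the read side is an assumption here, not something the read model shown earlier does.

```typescript
interface VersionedRecord {
  id: string;
  version: number;
}

interface Discrepancy {
  id: string;
  reason: 'missing' | 'stale';
}

// Report command-side records that the read model lacks or holds at an older version
function findDiscrepancies(
  commandRecords: VersionedRecord[],
  readRecords: VersionedRecord[]
): Discrepancy[] {
  const readById = new Map(readRecords.map((record) => [record.id, record] as const));
  const issues: Discrepancy[] = [];

  for (const command of commandRecords) {
    const read = readById.get(command.id);
    if (!read) {
      issues.push({ id: command.id, reason: 'missing' });
    } else if (read.version < command.version) {
      issues.push({ id: command.id, reason: 'stale' });
    }
  }
  return issues;
}

const issues = findDiscrepancies(
  [{ id: 'o-1', version: 2 }, { id: 'o-2', version: 1 }, { id: 'o-3', version: 1 }],
  [{ id: 'o-1', version: 1 }, { id: 'o-2', version: 1 }]
);
console.log(issues); // o-1 is stale, o-3 is missing
```

Indexing the read side in a `Map` keeps the comparison linear in the number of records, which matters once the nightly scan covers more than a few thousand items.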
When NOT to Use CQRS
CQRS added complexity to our system. It was worth it for us, but avoid it if:
- Your read/write patterns are similar
- You have simple CRUD operations
- Strong consistency is required everywhere
- Your team isn’t comfortable with eventual consistency
- You’re not experiencing performance issues
We tried CQRS on our admin panel (50 users, simple CRUD). It was a failure: too much complexity for no benefit.
The Debugging Horror Story
Two weeks after deploying CQRS, customers reported seeing old order statuses. The issue? The event processor was failing silently for certain product categories. The Lambda was timing out, but the Dead Letter Queue (DLQ) wasn’t configured with proper error handling and alerting mechanisms.
The debugging process revealed several critical configuration issues:
- DLQ visibility timeout was too short for processing retry attempts
- No CloudWatch alarms configured for DLQ message arrival
- Missing exponential backoff in the event processor retry logic
- No fallback mechanism when read models were inconsistent
This debugging experience taught me to:
- Always configure DLQs with CloudWatch alerts and proper visibility timeouts
- Add circuit breakers to event processors with exponential backoff
- Implement read-after-write consistency checks for critical user-facing operations
- Keep a “fallback to command model” option for when read models lag behind
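For the circuit breaker point, here is a minimal sketch of the pattern, not the implementation we run in production. The class name, the default threshold of 3 failures, and the 30-second cool-down are assumptions. The clock is injectable so the behaviour can be tested without waiting.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;
  private state: BreakerState = 'closed';
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 3, resetTimeoutMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): BreakerState {
    // After the cool-down, let a single trial call through
    if (this.state === 'open' && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = 'half-open';
    }
    return this.state;
  }

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.getState() === 'open') {
      throw new Error('Circuit open: skipping call');
    }
    try {
      const result = await operation();
      // Any success closes the circuit and clears the failure count
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (error) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, opens the circuit
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw error;
    }
  }
}
```

In the event processor, the read model writes would go through `breaker.execute(() => dynamoClient.send(...))`. While the circuit is open, events fail fast and land in the retry path or the DLQ instead of piling more load onto a table that is already throttling.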
Moving Forward
CQRS with serverless works beautifully when you need it. The combination of Lambda’s auto-scaling, EventBridge’s routing, and DynamoDB’s flexible schemas makes implementation straightforward.
But remember: CQRS is a solution to specific problems - high read/write disparity, complex query requirements, or scalability issues. If you don’t have these problems, you don’t need CQRS.
Start with a monolith, measure your bottlenecks, and only then consider CQRS. When you do implement it, start with one read model and grow from there.
The cost reduction was significant, but the real validation came during Black Friday 2024: zero downtime, consistent low latency, and smooth customer experience. That’s when the architectural decision proved its worth.