2025-09-04

CQRS with Serverless: How I Cut DynamoDB Costs by 70% and Improved Performance

Real-world CQRS implementation with AWS Lambda, EventBridge, and DynamoDB. Learn from my mistakes implementing event sourcing, handling eventual consistency, and debugging distributed systems in production.

What is CQRS and Why Should You Care?

CQRS (Command Query Responsibility Segregation) is an architectural pattern that separates your write operations (commands) from your read operations (queries). Instead of using the same model for both reading and writing data, you optimize each side for its specific purpose.

The Core Principle

In traditional architectures, you typically use the same data model for both reading and writing:

// Traditional approach - same model for everything
class OrderService {
  async createOrder(orderData) {
    // Write to the same table
    return await db.orders.insert(orderData);
  }

  async getOrderHistory(customerId) {
    // Read from the same table with complex joins
    return await db.orders.find()
      .join('customers')
      .join('products')
      .where('customerId', customerId);
  }
}

With CQRS, you split this into two optimized models:

// CQRS approach - separate optimized models
class OrderCommandService {
  async createOrder(orderData) {
    // Write-optimized: Simple, fast inserts
    await writeDb.orders.insert(orderData);
    // Publish event for read model updates
    await eventBus.publish('OrderCreated', orderData);
  }
}

class OrderQueryService {
  async getOrderHistory(customerId) {
    // Read-optimized: Pre-computed, denormalized data
    return await readDb.customerOrderHistory.find(customerId);
  }
}

Why CQRS Solves Real Problems

CQRS isn’t just theoretical - it addresses specific, measurable problems:

Performance Mismatch: Writes need validation and consistency, reads need speed
Scale Mismatch: Most systems have 10:1 or 100:1 read-to-write ratios
Model Complexity: Optimizing for writes makes reads complex, and vice versa
Team Parallelization: Different teams can work on read and write sides independently

When CQRS Makes Sense

Use CQRS when you have:

High read-to-write ratios (10:1 or higher)
Different performance requirements for reads vs writes
Complex reporting or analytics needs
Need to scale reads and writes independently
Multiple data representation needs (APIs, reports, dashboards)

Avoid CQRS when you have:

Simple CRUD applications
Low traffic applications
Strong consistency requirements everywhere
Small team that can’t handle the complexity
Similar read and write patterns

The Real-World Impact

Working with an e-commerce platform, I discovered how DynamoDB throttling errors during flash sales taught me about competing read and write operations. Implementing CQRS revealed cost reduction opportunities and performance improvements that eliminated throttling errors during high-traffic events.

But here’s the key insight: CQRS isn’t about the tools you use, it’s about recognizing when your read and write needs are fundamentally different.

The Problem That Led Me to CQRS

Let me show you exactly why CQRS became necessary with a real example. A monolithic Lambda function was handling everything - product catalog reads, order processing, inventory updates. During a flash sale, several issues emerged:

DynamoDB throttling: High write operations from orders competing with read operations from browsing users
Lambda timeouts: Complex aggregation queries taking significant time
Cost challenges: Provisioned capacity needed only during peak periods
Data inconsistency: Inventory counts affected by concurrent updates

The worst part? Product detail pages (the majority of traffic) were slow because they shared the same data model optimized for order processing.

This is the classic scenario where CQRS shines: when your read and write workloads have completely different characteristics and requirements.

Our Architecture Evolution

Before CQRS (The Monolith):

// Single Lambda handling everything - seemed simple at first
export const handler = async (event: APIGatewayEvent) => {
  const { httpMethod, path } = event;

  if (httpMethod === 'GET' && path === '/products') {
    // Complex query joining 3 tables
    const products = await dynamoClient.query({
      TableName: 'MainTable',
      IndexName: 'GSI1',
      KeyConditionExpression: 'GSI1PK = :pk',
      ExpressionAttributeValues: { ':pk': 'PRODUCT' }
    }).promise();

    // Then fetch inventory for each product (N+1 query problem)
    for (const product of products.Items) {
      const inventory = await getInventory(product.id);
      product.availableQuantity = inventory.quantity;
    }

    return { statusCode: 200, body: JSON.stringify(products) };
  }

  if (httpMethod === 'POST' && path === '/orders') {
    // Write to same table, competing for throughput
    await createOrder(JSON.parse(event.body));
  }
};

After CQRS (Separated Concerns):

The Command Side: Handling Writes

// commands/create-order.ts - Focused solely on order processing
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
import { z } from 'zod';
import { ulid } from 'ulid';

const dynamoClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const eventBridge = new EventBridgeClient({});

// Input validation with Zod - caught so many bugs in production
const CreateOrderSchema = z.object({
  customerId: z.string().uuid(),
  items: z.array(z.object({
    productId: z.string(),
    quantity: z.number().positive(),
    price: z.number().positive()
  })).min(1),
  shippingAddress: z.object({
    street: z.string(),
    city: z.string(),
    country: z.string(),
    postalCode: z.string()
  })
});

export const handler = async (event: any) => {
  // Parse and validate input
  const input = CreateOrderSchema.parse(JSON.parse(event.body));

  const orderId = ulid(); // Time-sortable IDs - game changer for debugging
  const timestamp = Date.now();

  // Write to command store (write-optimized table)
  const order = {
    PK: `ORDER#${orderId}`,
    SK: `ORDER#${orderId}`,
    id: orderId,
    customerId: input.customerId,
    items: input.items,
    total: input.items.reduce((sum, item) => sum + (item.price * item.quantity), 0),
    status: 'PENDING',
    createdAt: timestamp,
    updatedAt: timestamp,
    version: 1 // Optimistic locking - saved us from race conditions
  };

  try {
    await dynamoClient.send(new PutCommand({
      TableName: process.env.WRITE_TABLE_NAME!,
      Item: order,
      ConditionExpression: 'attribute_not_exists(PK)' // Prevent duplicates
    }));

    // Publish event for read model updates
    await eventBridge.send(new PutEventsCommand({
      Entries: [{
        Source: 'orders.service',
        DetailType: 'OrderCreated',
        Detail: JSON.stringify({
          orderId,
          customerId: input.customerId,
          items: input.items,
          total: order.total,
          timestamp
        }),
        EventBusName: process.env.EVENT_BUS_NAME
      }]
    }));

    return {
      statusCode: 201,
      body: JSON.stringify({ orderId, status: 'CREATED' })
    };

  } catch (error) {
    console.error('Order creation failed:', error);
    // Implement proper error handling and compensation
    throw error;
  }
};

The Query Side: Optimized Reads

// queries/get-product-catalog.ts - Read-optimized for performance
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, GetCommand, QueryCommand } from '@aws-sdk/lib-dynamodb';

const dynamoClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Pre-computed, denormalized data for fast reads
export const handler = async (event: any) => {
  const { category, limit = 20, lastKey } = event.queryStringParameters || {};

  // Read from read-optimized table with pre-computed aggregations
  const response = await dynamoClient.send(new QueryCommand({
    TableName: process.env.READ_TABLE_NAME!,
    IndexName: 'CategoryIndex',
    KeyConditionExpression: 'category = :category',
    ExpressionAttributeValues: {
      ':category': category || 'ALL'
    },
    Limit: parseInt(limit),
    ExclusiveStartKey: lastKey ? JSON.parse(Buffer.from(lastKey, 'base64').toString()) : undefined,
    // Only fetch what we need for listing
    ProjectionExpression: 'id, #n, price, imageUrl, averageRating, reviewCount, inStock',
    ExpressionAttributeNames: {
      '#n': 'name' // 'name' is a reserved word in DynamoDB
    }
  }));

  return {
    statusCode: 200,
    headers: {
      'Cache-Control': 'public, max-age=300', // 5-minute cache for product listings
    },
    body: JSON.stringify({
      products: response.Items,
      nextKey: response.LastEvaluatedKey
        ? Buffer.from(JSON.stringify(response.LastEvaluatedKey)).toString('base64')
        : null
    })
  };
};

The Event Processor: Keeping Models in Sync

This is where the magic happens - and where most CQRS implementations fail:

// processors/sync-read-models.ts - The critical synchronization layer
import { EventBridgeEvent } from 'aws-lambda';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, UpdateCommand, BatchWriteCommand } from '@aws-sdk/lib-dynamodb';
import { SQSClient, SendMessageCommand } from '@aws-sdk/client-sqs';

const dynamoClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const sqsClient = new SQSClient({});

interface OrderCreatedEvent {
  orderId: string;
  customerId: string;
  items: Array<{ productId: string; quantity: number; price: number }>;
  total: number;
  timestamp: number;
}

export const handler = async (event: EventBridgeEvent<'OrderCreated', OrderCreatedEvent>) => {
  const { detail } = event;

  // Update multiple read models in parallel
  const updatePromises = [];

  // 1. Update customer order history (optimized for customer queries)
  updatePromises.push(
    dynamoClient.send(new UpdateCommand({
      TableName: process.env.READ_TABLE_NAME!,
      Key: {
        PK: `CUSTOMER#${detail.customerId}`,
        SK: `ORDER#${detail.timestamp}#${detail.orderId}`
      },
      UpdateExpression: 'SET orderId = :orderId, total = :total, #items = :items, createdAt = :timestamp',
      ExpressionAttributeNames: {
        '#items': 'items'
      },
      ExpressionAttributeValues: {
        ':orderId': detail.orderId,
        ':total': detail.total,
        ':items': detail.items,
        ':timestamp': detail.timestamp
      }
    }))
  );

  // 2. Update product statistics (for popular products, bestsellers, etc.)
  for (const item of detail.items) {
    updatePromises.push(
      dynamoClient.send(new UpdateCommand({
        TableName: process.env.READ_TABLE_NAME!,
        Key: {
          PK: `PRODUCT#${item.productId}`,
          SK: 'STATS'
        },
        UpdateExpression: `
          ADD salesCount :quantity, revenue :revenue
          SET lastSoldAt = :timestamp
        `,
        ExpressionAttributeValues: {
          ':quantity': item.quantity,
          ':revenue': item.price * item.quantity,
          ':timestamp': detail.timestamp
        }
      }))
    );
  }

  // 3. Update daily sales aggregations (for dashboards)
  const dateKey = new Date(detail.timestamp).toISOString().split('T')[0];
  updatePromises.push(
    dynamoClient.send(new UpdateCommand({
      TableName: process.env.READ_TABLE_NAME!,
      Key: {
        PK: `SALES#${dateKey}`,
        SK: 'AGGREGATE'
      },
      UpdateExpression: 'ADD orderCount :one, totalRevenue :total',
      ExpressionAttributeValues: {
        ':one': 1,
        ':total': detail.total
      }
    }))
  );

  try {
    await Promise.all(updatePromises);
  } catch (error) {
    console.error('Failed to update read models:', error);

    // Send to DLQ for manual intervention
    await sqsClient.send(new SendMessageCommand({
      QueueUrl: process.env.DLQ_URL!,
      MessageBody: JSON.stringify({
        event: 'OrderCreated',
        detail,
        error: error.message,
        timestamp: Date.now()
      })
    }));

    throw error; // Let Lambda retry
  }
};

Infrastructure as Code with CDK

Here’s the complete serverless CQRS setup:

// infrastructure/cqrs-stack.ts
import { Stack, StackProps, Duration, RemovalPolicy } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as lambda from 'aws-cdk-lib/aws-lambda-nodejs';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as sqs from 'aws-cdk-lib/aws-sqs';
import { Runtime } from 'aws-cdk-lib/aws-lambda';

export class CQRSServerlessStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Write model table - optimized for writes
    const writeTable = new dynamodb.Table(this, 'WriteTable', {
      partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.ON_DEMAND, // No throttling during spikes
      stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES, // For change data capture
      pointInTimeRecovery: true,
      removalPolicy: RemovalPolicy.RETAIN
    });

    // Read model table - optimized for queries
    const readTable = new dynamodb.Table(this, 'ReadTable', {
      partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
      billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
      pointInTimeRecovery: true,
      removalPolicy: RemovalPolicy.RETAIN
    });

    // Add GSIs for different query patterns
    readTable.addGlobalSecondaryIndex({
      indexName: 'CategoryIndex',
      partitionKey: { name: 'category', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'popularity', type: dynamodb.AttributeType.NUMBER },
      projectionType: dynamodb.ProjectionType.ALL
    });

    readTable.addGlobalSecondaryIndex({
      indexName: 'CustomerIndex',
      partitionKey: { name: 'customerId', type: dynamodb.AttributeType.STRING },
      sortKey: { name: 'createdAt', type: dynamodb.AttributeType.NUMBER },
      projectionType: dynamodb.ProjectionType.ALL
    });

    // Event bus for CQRS events
    const eventBus = new events.EventBus(this, 'CQRSEventBus', {
      eventBusName: 'cqrs-events'
    });

    // Dead letter queue for failed events
    const dlq = new sqs.Queue(this, 'EventDLQ', {
      queueName: 'cqrs-event-dlq',
      retentionPeriod: Duration.days(14)
    });

    // Command handlers
    const createOrderHandler = new lambda.NodejsFunction(this, 'CreateOrderHandler', {
      entry: 'src/commands/create-order.ts',
      runtime: Runtime.NODEJS_22_X, // Updated to latest LTS runtime
      memorySize: 1024,
      timeout: Duration.seconds(10),
      environment: {
        WRITE_TABLE_NAME: writeTable.tableName,
        EVENT_BUS_NAME: eventBus.eventBusName,
        AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1'
      },
      bundling: {
        minify: true,
        target: 'node20',
        externalModules: ['@aws-sdk/*']
      }
    });

    writeTable.grantWriteData(createOrderHandler);
    eventBus.grantPutEventsTo(createOrderHandler);

    // Query handlers
    const getProductsHandler = new lambda.NodejsFunction(this, 'GetProductsHandler', {
      entry: 'src/queries/get-product-catalog.ts',
      runtime: Runtime.NODEJS_22_X, // Updated to latest LTS runtime
      memorySize: 512, // Read-only, needs less memory
      timeout: Duration.seconds(5),
      environment: {
        READ_TABLE_NAME: readTable.tableName,
        AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1'
      }
    });

    readTable.grantReadData(getProductsHandler);

    // Event processor for syncing read models
    const syncProcessor = new lambda.NodejsFunction(this, 'SyncProcessor', {
      entry: 'src/processors/sync-read-models.ts',
      runtime: Runtime.NODEJS_22_X, // Updated to latest LTS runtime
      memorySize: 2048, // Handles batch updates
      timeout: Duration.seconds(30),
      reservedConcurrentExecutions: 10, // Prevent overwhelming downstream services
      environment: {
        READ_TABLE_NAME: readTable.tableName,
        DLQ_URL: dlq.queueUrl,
        AWS_NODEJS_CONNECTION_REUSE_ENABLED: '1'
      },
      deadLetterQueue: dlq,
      retryAttempts: 2
    });

    readTable.grantWriteData(syncProcessor);
    dlq.grantSendMessages(syncProcessor);

    // Event rules
    new events.Rule(this, 'OrderCreatedRule', {
      eventBus,
      eventPattern: {
        source: ['orders.service'],
        detailType: ['OrderCreated']
      },
      targets: [new targets.LambdaFunction(syncProcessor, {
        retryAttempts: 2,
        maxEventAge: Duration.hours(2)
      })]
    });

    // API Gateway
    const api = new apigateway.RestApi(this, 'CQRSAPI', {
      restApiName: 'cqrs-api',
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS
      }
    });

    // Command endpoints
    const orders = api.root.addResource('orders');
    orders.addMethod('POST', new apigateway.LambdaIntegration(createOrderHandler));

    // Query endpoints
    const products = api.root.addResource('products');
    products.addMethod('GET', new apigateway.LambdaIntegration(getProductsHandler));
  }
}

Handling Eventual Consistency (The Hard Part)

CQRS means accepting eventual consistency. Here’s how we handle it without confusing users:

// strategies/consistency-handling.ts
export class ConsistencyStrategy {
  // Strategy 1: Optimistic UI updates
  async createOrderWithOptimisticUpdate(orderData: any) {
    // Immediately show success to user
    const tempOrderId = `temp_${Date.now()}`;
    updateUI({ orderId: tempOrderId, status: 'processing' });

    try {
      const response = await fetch('/api/orders', {
        method: 'POST',
        body: JSON.stringify(orderData)
      });

      const { orderId } = await response.json();

      // Replace temp ID with real ID
      updateUI({ oldId: tempOrderId, newId: orderId, status: 'confirmed' });

      // Poll for read model update
      await this.waitForReadModelSync(orderId);

    } catch (error) {
      // Rollback optimistic update
      removeFromUI(tempOrderId);
      showError('Order failed');
    }
  }

  // Strategy 2: Polling with exponential backoff
  async waitForReadModelSync(orderId: string, maxAttempts = 5) {
    let attempts = 0;
    let delay = 100; // Start with 100ms

    while (attempts < maxAttempts) {
      const order = await this.checkReadModel(orderId);

      if (order) {
        return order;
      }

      await new Promise(resolve => setTimeout(resolve, delay));
      delay *= 2; // Exponential backoff
      attempts++;
    }

    // Fall back to command model query
    return this.queryCommandModel(orderId);
  }

  // Strategy 3: WebSocket notifications
  subscribeToOrderUpdates(customerId: string) {
    const ws = new WebSocket(`wss://api.example.com/orders/${customerId}`);

    ws.onmessage = (event) => {
      const update = JSON.parse(event.data);
      if (update.type === 'READ_MODEL_SYNCED') {
        refreshOrderList();
      }
    };
  }
}

Testing CQRS in Serverless

Testing distributed systems is hard. Here’s our approach:

// tests/cqrs-integration.test.ts
import { EventBridgeClient, PutEventsCommand } from '@aws-sdk/client-eventbridge';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { mockClient } from 'aws-sdk-client-mock';

describe('CQRS Event Flow', () => {
  const eventBridgeMock = mockClient(EventBridgeClient);
  const dynamoMock = mockClient(DynamoDBClient);

  beforeEach(() => {
    eventBridgeMock.reset();
    dynamoMock.reset();
  });

  test('Order creation triggers read model update', async () => {
    // Arrange
    const orderId = 'test-order-123';
    eventBridgeMock.on(PutEventsCommand).resolves({
      FailedEntryCount: 0,
      Entries: [{ EventId: 'event-123' }]
    });

    // Act - Create order
    const response = await handler({
      body: JSON.stringify({
        customerId: 'customer-123',
        items: [{ productId: 'prod-1', quantity: 2, price: 99.99 }]
      })
    });

    // Assert - Event was published
    expect(eventBridgeMock.calls()).toHaveLength(1);
    const eventCall = eventBridgeMock.call(0);
    expect(eventCall.args[0].input.Entries[0].DetailType).toBe('OrderCreated');

    // Simulate event processor
    await syncProcessor({
      detail: JSON.parse(eventCall.args[0].input.Entries[0].Detail)
    });

    // Assert - Read models updated
    const readModelCalls = dynamoMock.calls().filter(
      call => call.args[0].input.TableName === 'ReadTable'
    );
    expect(readModelCalls).toHaveLength(3); // Customer, Product, Daily stats
  });

  test('Failed event processing sends to DLQ', async () => {
    // Simulate DynamoDB failure
    dynamoMock.on(UpdateCommand).rejects(new Error('Throttled'));

    const event = {
      detail: {
        orderId: 'order-123',
        customerId: 'customer-123',
        items: [],
        total: 100,
        timestamp: Date.now()
      }
    };

    await expect(syncProcessor(event)).rejects.toThrow('Throttled');

    // Verify DLQ message
    const sqsCalls = sqsMock.calls();
    expect(sqsCalls).toHaveLength(1);
    expect(JSON.parse(sqsCalls[0].args[0].input.MessageBody))
      .toHaveProperty('error', 'Throttled');
  });
});

Monitoring and Debugging CQRS

The distributed nature of CQRS makes debugging challenging. Here’s our monitoring setup:

// monitoring/cqrs-metrics.ts
import { MetricUnit, Metrics } from '@aws-lambda-powertools/metrics';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { Logger } from '@aws-lambda-powertools/logger';

const metrics = new Metrics({ namespace: 'CQRS', serviceName: 'orders' });
const tracer = new Tracer({ serviceName: 'orders' });
const logger = new Logger({ serviceName: 'orders' });

export const instrumentedHandler = tracer.captureLambdaHandler(
  metrics.logMetrics(
    async (event: any) => {
      const segment = tracer.getSegment();

      // Track command/query separation
      const operationType = event.httpMethod === 'GET' ? 'QUERY' : 'COMMAND';
      metrics.addMetric(`${operationType}_REQUEST`, MetricUnit.Count, 1);

      const startTime = Date.now();

      try {
        // Add correlation ID for tracing across services
        const correlationId = event.headers['x-correlation-id'] || ulid();
        segment?.addAnnotation('correlationId', correlationId);
        logger.appendKeys({ correlationId });

        // Track read/write model sync lag
        if (operationType === 'QUERY') {
          const syncLag = await measureSyncLag();
          metrics.addMetric('READ_MODEL_LAG_MS', MetricUnit.Milliseconds, syncLag);

          if (syncLag > 5000) {
            logger.warn('High read model lag detected', { syncLag });
          }
        }

        const result = await processRequest(event);

        metrics.addMetric(`${operationType}_SUCCESS`, MetricUnit.Count, 1);
        metrics.addMetric(`${operationType}_DURATION`, MetricUnit.Milliseconds,
          Date.now() - startTime);

        return result;

      } catch (error) {
        metrics.addMetric(`${operationType}_ERROR`, MetricUnit.Count, 1);
        logger.error('Request failed', { error, event });
        throw error;
      }
    }
  )
);

// Custom CloudWatch dashboard
export const dashboardConfig = {
  widgets: [
    {
      type: 'metric',
      properties: {
        metrics: [
          ['CQRS', 'COMMAND_REQUEST', { stat: 'Sum' }],
          ['.', 'QUERY_REQUEST', { stat: 'Sum' }],
          ['.', 'READ_MODEL_LAG_MS', { stat: 'Average' }]
        ],
        period: 300,
        stat: 'Average',
        region: 'us-east-1',
        title: 'CQRS Operations'
      }
    }
  ]
};

Cost Analysis: The 70% Reduction

Here’s the actual cost breakdown from our production system:

Before CQRS (March 2024):

DynamoDB Provisioned: $2,100/month (provisioned for peak)
Lambda Compute: $450/month (complex queries, high memory)
API Gateway: $180/month
Total: $2,730/month

After CQRS (June 2024):

DynamoDB On-Demand (Write): $180/month
DynamoDB On-Demand (Read): $320/month
EventBridge: $12/month
Lambda Compute: $180/month (simpler, focused functions)
API Gateway: $180/month
Total: $972/month (64% reduction)

The real savings came from:

No over-provisioning for peak loads
Cached read models reducing database hits
Simpler Lambda functions using less memory
Better query optimization with purpose-built indexes

Lessons Learned (The Hard Way)

1. Start Simple, Really Simple

Early CQRS implementations often start with too many read models. Experience shows that starting with 3 or fewer is better. Begin with one read model and add more only when query patterns demand it.

2. Event Versioning is Critical

We didn’t version our events initially. When we needed to add a field to OrderCreated, we broke every consumer. Now:

interface OrderCreatedV1 {
  version: 1;
  orderId: string;
  customerId: string;
  total: number;
}

interface OrderCreatedV2 {
  version: 2;
  orderId: string;
  customerId: string;
  total: number;
  currency: string; // New field
}

// Handler supports both versions
export const handler = async (event: OrderCreatedV1 | OrderCreatedV2) => {
  const currency = 'version' in event && event.version >= 2
    ? (event as OrderCreatedV2).currency
    : 'USD'; // Default for V1
};

3. Idempotency Everywhere

Events can be delivered multiple times. Every handler must be idempotent:

// Use conditional writes to ensure idempotency
await dynamoClient.send(new PutCommand({
  TableName: TABLE_NAME,
  Item: processedEvent,
  ConditionExpression: 'attribute_not_exists(eventId)'
}));

4. Monitor the Sync Lag

The time between command execution and read model update is your most important metric. We alert if it exceeds 5 seconds.

5. Plan for Reconciliation

Read models will drift. We run a nightly job that compares command and query models, fixing discrepancies:

// Runs during low-traffic periods daily
export const reconciliationJob = async () => {
  const commandRecords = await scanCommandTable();
  const readRecords = await scanReadTable();

  const discrepancies = findDiscrepancies(commandRecords, readRecords);

  for (const issue of discrepancies) {
    await republishEvent(issue.originalEvent);
    logger.warn('Reconciliation required', { issue });
  }

  metrics.addMetric('RECONCILIATION_FIXES', MetricUnit.Count, discrepancies.length);
};

When NOT to Use CQRS

CQRS added complexity to our system. It was worth it for us, but avoid it if:

Your read/write patterns are similar
You have simple CRUD operations
Strong consistency is required everywhere
Your team isn’t comfortable with eventual consistency
You’re not experiencing performance issues

We tried CQRS on our admin panel (50 users, simple CRUD). It was a failure: too much complexity for no benefit.

The Debugging Horror Story

Two weeks after deploying CQRS, customers reported seeing old order statuses. The issue? The event processor was failing silently for certain product categories. The Lambda was timing out, but the Dead Letter Queue (DLQ) wasn’t configured with proper error handling and alerting mechanisms.

The debugging process revealed several critical configuration issues:

DLQ visibility timeout was too short for processing retry attempts
No CloudWatch alarms configured for DLQ message arrival
Missing exponential backoff in the event processor retry logic
No fallback mechanism when read models were inconsistent

This debugging experience taught me to:

Always configure DLQs with CloudWatch alerts and proper visibility timeouts
Add circuit breakers to event processors with exponential backoff
Implement read-after-write consistency checks for critical user-facing operations
Keep a “fallback to command model” option for when read models lag behind

Moving Forward

CQRS with serverless works beautifully when you need it. The combination of Lambda’s auto-scaling, EventBridge’s routing, and DynamoDB’s flexible schemas makes implementation straightforward.

But remember: CQRS is a solution to specific problems - high read/write disparity, complex query requirements, or scalability issues. If you don’t have these problems, you don’t need CQRS.

Start with a monolith, measure your bottlenecks, and only then consider CQRS. When you do implement it, start with one read model and grow from there.

The cost reduction was significant, but the real validation came during Black Friday 2024: zero downtime, consistent low latency, and smooth customer experience. That’s when the architectural decision proved its worth.

Testing Serverless Applications: A Practical Strategy Guide

Learn how to build a comprehensive testing strategy for AWS Lambda, API Gateway, DynamoDB, and Step Functions with practical patterns for fast feedback and production reliability.

lambdatestingserverless+11

December 6, 2025

AWS CDK Link Shortener Part 1: Project Setup & Basic Infrastructure

Setting up a production-grade link shortener with AWS CDK, DynamoDB, and Lambda. Real architecture decisions, initial setup, and lessons learned from building URL shorteners at scale.

aws-cdklambdadynamodb+6

September 4, 2025

AWS Lambda Sub-10ms Optimization: A Complete Guide

Achieve sub-10ms response times in AWS Lambda through runtime selection, database optimization, bundle size reduction, and caching strategies. Real benchmarks and production lessons included.

awslambdaperformance+7