2026-01-28

DynamoDB Rate Limiting: Strategies for Single Table Design at Scale

Practical strategies to prevent and handle DynamoDB throttling in Single Table Design applications. Covers partition key design, write sharding, capacity modes, DAX caching, retry patterns, and CloudWatch monitoring for high-throughput systems.

When working with DynamoDB at scale, throttling becomes an inevitable challenge. The ProvisionedThroughputExceededException error often appears despite having adequate table-level capacity, and understanding why requires diving into DynamoDB’s internal mechanics.

This guide covers proven patterns for preventing and handling throttling in Single Table Design applications, from partition key strategies to monitoring configurations that catch issues before they impact users.

Understanding DynamoDB’s Throttling Mechanism

DynamoDB uses a token bucket algorithm for rate limiting. Each partition maintains its own bucket of read and write tokens that refill at a rate matching provisioned capacity. When tokens are depleted, requests get throttled.

The critical limits to remember:

Resource	Limit
Read Capacity per Partition	3,000 RCU
Write Capacity per Partition	1,000 WCU
Storage per Partition	10 GB
Item Size	400 KB (hard limit)

Here’s what makes this tricky: provisioned capacity is distributed across partitions. A table with 100 RCU and 3 partitions means each partition gets roughly 33 RCU. If one partition receives 80% of traffic, it will throttle even though the table has headroom.

// Conceptual model: How capacity gets distributed
interface PartitionCapacity {
  // Table-level settings
  tableRCU: 100;
  tableWCU: 50;
  partitionCount: 3;

  // Per-partition reality
  perPartitionRCU: 33;  // ~100/3
  perPartitionWCU: 17;  // ~50/3

  // The problem: uneven traffic
  actualTraffic: {
    partition1: { rcu: 80 };  // 80 > 33 = THROTTLED
    partition2: { rcu: 10 };  // Underutilized
    partition3: { rcu: 10 };  // Underutilized
  };
}
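
As a rough sketch of the even-split model described above (burst and adaptive capacity, covered later, are ignored), the following function flags partitions whose traffic exceeds their share of table capacity. The names `PartitionTraffic` and `findThrottledPartitions` are illustrative and not part of any AWS API.

```typescript
// Simplified model: table capacity is split evenly across partitions.
// Burst and adaptive capacity are deliberately ignored here.
interface PartitionTraffic {
  partition: string;
  rcu: number; // read capacity consumed per second
}

const findThrottledPartitions = (
  tableRCU: number,
  traffic: PartitionTraffic[]
): string[] => {
  const perPartitionRCU = tableRCU / traffic.length;
  return traffic
    .filter(t => t.rcu > perPartitionRCU)
    .map(t => t.partition);
};

// The example from the text: 100 RCU over 3 partitions, 80% of reads on one
const throttled = findThrottledPartitions(100, [
  { partition: 'partition1', rcu: 80 },
  { partition: 'partition2', rcu: 10 },
  { partition: 'partition3', rcu: 10 },
]);
console.log(throttled); // only partition1 exceeds its ~33 RCU share
```

The table as a whole consumes exactly its 100 RCU here, yet one partition is over budget, which is why table-level metrics alone can hide the problem.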

Partition Key Design: The Foundation

Hot partitions cause most throttling issues. Getting partition key design right prevents problems that no amount of capacity can solve.

Anti-Patterns to Avoid

// ANTI-PATTERN 1: Low cardinality partition key
interface BadDesign1 {
  PK: 'STATUS#active' | 'STATUS#inactive';  // Only 2 values
  SK: `USER#${string}`;
}
// Result: All active users in one partition
// With 100,000 active users: immediate throttling

// ANTI-PATTERN 2: Time-based partition key
interface BadDesign2 {
  PK: `DATE#${string}`;  // e.g., "DATE#2024-01-15"
  SK: `EVENT#${string}`;
}
// Result: All today's events hit one partition
// Peak hours create hot partition

// ANTI-PATTERN 3: Celebrity/Viral content problem
interface BadDesign3 {
  PK: `POST#${string}`;  // Viral post ID
  SK: `LIKE#${string}`;
}
// Result: Viral post with millions of likes
// Single partition cannot handle the load

// ANTI-PATTERN 4: Large tenant dominance
interface BadDesign4 {
  PK: `TENANT#${string}`;
  SK: `ORDER#${string}`;
}
// Result: Enterprise tenant with 80% of orders
// Their partition is always hot

High-Cardinality Patterns That Work

// PATTERN 1: User-scoped partition keys
interface GoodDesign1 {
  PK: `USER#${string}`;  // USER#<userId>, unique per user
  SK: `ORDER#${string}#${string}`;  // ORDER#<timestamp>#<orderId>
}
// Result: Millions of unique partition keys
// Traffic naturally distributed

// PATTERN 2: Composite keys for multi-tenant
interface GoodDesign2 {
  PK: `TENANT#${string}#USER#${string}`;  // TENANT#<tenantId>#USER#<userId>
  SK: string;
}
// Result: Even distribution within and across tenants
// Large tenant's users still spread across partitions

// PATTERN 3: Hierarchical with high cardinality at PK level
interface GoodDesign3 {
  PK: `REGION#${string}#STORE#${string}`;  // REGION#<region>#STORE#<storeId>
  SK: `PRODUCT#${string}#${string}`;  // PRODUCT#<category>#<productId>
}
// Result: Queries scoped to store level
// Each store has its own partition space

// PATTERN 4: GSI for low-cardinality queries
interface GoodDesign4 {
  PK: `USER#${string}`;  // USER#<userId>
  SK: 'METADATA';
  status: 'active' | 'inactive';
  GSI1PK: `STATUS#${'active' | 'inactive'}#SHARD#${number}`;  // Sharded!
  GSI1SK: `USER#${string}`;  // USER#<userId>
}
// Base table: High cardinality (users)
// GSI: Handles status queries with sharding

Write Sharding: Distributing Hot Keys

When business requirements force low-cardinality access patterns, write sharding distributes load across multiple partitions.

Random Suffix Sharding

Best for write-heavy patterns where read aggregation is acceptable:

import { DynamoDBDocumentClient, PutCommand, QueryCommand } from '@aws-sdk/lib-dynamodb';

const SHARD_COUNT = 10;

const getRandomShard = (): number => {
  return Math.floor(Math.random() * SHARD_COUNT);
};

// Writing with random shard - distributes writes evenly
const writeToShardedPartition = async (
  client: DynamoDBDocumentClient,
  status: string,
  userId: string,
  userData: Record<string, unknown>
): Promise<void> => {
  const shardId = getRandomShard();

  await client.send(new PutCommand({
    TableName: 'MainTable',
    Item: {
      PK: `STATUS#${status}#SHARD#${shardId}`,
      SK: `USER#${userId}`,
      ...userData
    }
  }));
};

// Reading requires scatter-gather across all shards
const readFromAllShards = async (
  client: DynamoDBDocumentClient,
  status: string
): Promise<Record<string, unknown>[]> => {
  const promises = Array.from({ length: SHARD_COUNT }, (_, i) =>
    client.send(new QueryCommand({
      TableName: 'MainTable',
      KeyConditionExpression: 'PK = :pk',
      ExpressionAttributeValues: {
        ':pk': `STATUS#${status}#SHARD#${i}`
      }
    }))
  );

  const results = await Promise.all(promises);
  return results.flatMap(r => r.Items ?? []);
};
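
A single Query call returns at most 1 MB of data and sets `LastEvaluatedKey` when more results remain, so a scatter-gather read usually has to page through each shard. The sketch below shows one way to do that. To keep it runnable without AWS, the Query call is hidden behind a hypothetical `QueryPage` function type; in real code that function would wrap `client.send(new QueryCommand({ ..., ExclusiveStartKey }))`. The names `queryShard` and `queryAllShards` are likewise illustrative.

```typescript
// Hypothetical abstraction over one Query call, so the sketch runs without AWS
type Item = Record<string, unknown>;
interface Page {
  Items?: Item[];
  LastEvaluatedKey?: Record<string, unknown>;
}
type QueryPage = (
  pk: string,
  startKey?: Record<string, unknown>
) => Promise<Page>;

// Drain every page of one shard by following LastEvaluatedKey
const queryShard = async (queryPage: QueryPage, pk: string): Promise<Item[]> => {
  const items: Item[] = [];
  let startKey: Record<string, unknown> | undefined;
  do {
    const page = await queryPage(pk, startKey);
    items.push(...(page.Items ?? []));
    startKey = page.LastEvaluatedKey;
  } while (startKey);
  return items;
};

// Scatter-gather: query all shards in parallel, then merge the results
const queryAllShards = async (
  queryPage: QueryPage,
  status: string,
  shardCount: number
): Promise<Item[]> => {
  const perShard = await Promise.all(
    Array.from({ length: shardCount }, (_, i) =>
      queryShard(queryPage, `STATUS#${status}#SHARD#${i}`)
    )
  );
  return ([] as Item[]).concat(...perShard);
};
```

Every read fans out to all shards, so the read cost grows with the shard count. That trade-off is the reason to keep the shard count as low as the write load allows.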

Deterministic Sharding

When you need to read specific items without scatter-gather:

import { createHash } from 'crypto';
import { DynamoDBDocumentClient, GetCommand, PutCommand } from '@aws-sdk/lib-dynamodb';

const getDeterministicShard = (entityId: string): number => {
  const hash = createHash('md5').update(entityId).digest('hex');
  return parseInt(hash.substring(0, 8), 16) % SHARD_COUNT;
};

// Write with consistent shard based on order ID
const writeOrderWithShard = async (
  client: DynamoDBDocumentClient,
  date: string,
  orderId: string,
  orderData: Record<string, unknown>
): Promise<void> => {
  const shardId = getDeterministicShard(orderId);

  await client.send(new PutCommand({
    TableName: 'MainTable',
    Item: {
      PK: `ORDERS#DATE#${date}#SHARD#${shardId}`,
      SK: `ORDER#${orderId}`,
      ...orderData
    }
  }));
};

// Read specific order - calculate shard, single query
const readOrder = async (
  client: DynamoDBDocumentClient,
  date: string,
  orderId: string
): Promise<Record<string, unknown> | undefined> => {
  const shardId = getDeterministicShard(orderId);

  const result = await client.send(new GetCommand({
    TableName: 'MainTable',
    Key: {
      PK: `ORDERS#DATE#${date}#SHARD#${shardId}`,
      SK: `ORDER#${orderId}`
    }
  }));

  return result.Item;
};
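
Before relying on a hash-based scheme, it can help to check how evenly it spreads real IDs. The following sketch repeats the hashing approach from above so that it runs standalone and counts how many IDs land on each shard. The helper names `shardFor` and `shardHistogram` and the synthetic `order-N` IDs are assumptions for illustration only.

```typescript
import { createHash } from 'crypto';

// Same hashing scheme as getDeterministicShard, repeated so this runs standalone
const shardFor = (entityId: string, shardCount: number): number => {
  const hash = createHash('md5').update(entityId).digest('hex');
  return parseInt(hash.substring(0, 8), 16) % shardCount;
};

// Count how many IDs land on each shard
const shardHistogram = (ids: string[], shardCount: number): number[] => {
  const counts = new Array<number>(shardCount).fill(0);
  for (const id of ids) {
    counts[shardFor(id, shardCount)]++;
  }
  return counts;
};

// Hypothetical sample: 10,000 synthetic order IDs across 10 shards
const sampleIds = Array.from({ length: 10000 }, (_, i) => `order-${i}`);
const histogram = shardHistogram(sampleIds, 10);
console.log(histogram);
```

Because the shard is derived with a modulo, changing the shard count later maps existing IDs to different shards. Reads by ID would then miss older items unless the data is migrated or both shard counts are checked.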

GSI Write Sharding

Apply the same pattern to Global Secondary Indexes to prevent GSI throttling from blocking base table writes:

import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';

// CDK definition with sharded GSI
const table = new dynamodb.Table(this, 'MainTable', {
  partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
  sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});

table.addGlobalSecondaryIndex({
  indexName: 'GSI1',
  partitionKey: { name: 'GSI1PK', type: dynamodb.AttributeType.STRING },
  sortKey: { name: 'GSI1SK', type: dynamodb.AttributeType.STRING },
  projectionType: dynamodb.ProjectionType.ALL,
});

// Writing with GSI sharding
const writeOrderWithGSIShard = async (
  client: DynamoDBDocumentClient,
  userId: string,
  orderId: string,
  orderDate: string
): Promise<void> => {
  const shardId = getRandomShard();

  await client.send(new PutCommand({
    TableName: 'MainTable',
    Item: {
      PK: `USER#${userId}`,
      SK: `ORDER#${orderDate}#${orderId}`,
      EntityType: 'Order',
      // Sharded GSI keys
      GSI1PK: `ORDERS#DATE#${orderDate}#SHARD#${shardId}`,
      GSI1SK: `USER#${userId}#ORDER#${orderId}`
    }
  }));
};

Warning: GSI throttling causes backpressure on base table writes. If a GSI cannot keep up with the base table's write rate, writes to the base table are throttled as well. Provision GSI write capacity to at least match the base table's write capacity.

Capacity Mode Selection

On-Demand Mode: Understanding the Limits

On-demand capacity has scaling constraints that catch teams off guard:

interface OnDemandBehavior {
  // Initial capacity for new tables
  initialCapacity: {
    rcu: 12000;  // 4 partitions * 3,000 RCU
    wcu: 4000;  // 4 partitions * 1,000 WCU
  };

  scaling: {
    // Instant scale to previous peak
    previousPeak: 'instant';

    // Beyond previous peak: limited growth
    beyondPeak: {
      rate: 'Double every 30 minutes';
      limit: 'Cannot exceed 2x within 30-min window';
    };
  };

  // Account-level limits
  accountLimits: {
    defaultPerTable: 40000;  // RCU and WCU
    requestIncrease: true;
  };
}

For traffic spikes, this 2x limit matters. A flash sale with 10x normal traffic cannot be handled immediately by on-demand. The table needs to “warm up” gradually or use pre-provisioned capacity.
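
The warm-up time implied by this rule can be estimated with a few lines of arithmetic. The function below is a simplified model of the behavior described above, not an AWS formula: it assumes twice the previous peak is available immediately and that the ceiling doubles once per 30 minutes of sustained traffic. The name `minutesToReach` is illustrative.

```typescript
// Simplified model of the on-demand rule described above:
// up to 2x the previous peak is available immediately, and the
// ceiling can double again roughly every 30 minutes.
const minutesToReach = (previousPeak: number, target: number): number => {
  if (previousPeak <= 0) throw new Error('previousPeak must be positive');
  let ceiling = previousPeak * 2; // available right away
  let minutes = 0;
  while (ceiling < target) {
    ceiling *= 2;
    minutes += 30;
  }
  return minutes;
};

console.log(minutesToReach(1000, 2000));    // 0: within 2x of the previous peak
console.log(minutesToReach(1000, 10000));   // 90: a 10x flash sale needs three doublings
console.log(minutesToReach(50000, 250000)); // 60: growing 5x takes about an hour
```

Under these assumptions, a 10x spike needs roughly 90 minutes of ramp-up, which is why planned events call for pre-warming rather than relying on on-demand scaling alone.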

Provisioned with Auto-Scaling

For predictable workloads with cost sensitivity:

import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as appautoscaling from 'aws-cdk-lib/aws-applicationautoscaling';
import { Duration } from 'aws-cdk-lib';

const table = new dynamodb.Table(this, 'MainTable', {
  tableName: 'ProductionTable',
  partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
  sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PROVISIONED,
  readCapacity: 100,
  writeCapacity: 50,
});

// Auto-scaling for reads
const readScaling = table.autoScaleReadCapacity({
  minCapacity: 10,
  maxCapacity: 1000,
});

readScaling.scaleOnUtilization({
  targetUtilizationPercent: 70,  // Scale up before hitting limits
});

// Auto-scaling for writes
const writeScaling = table.autoScaleWriteCapacity({
  minCapacity: 5,
  maxCapacity: 500,
});

writeScaling.scaleOnUtilization({
  targetUtilizationPercent: 70,
});

// Scheduled scaling for predictable patterns
writeScaling.scaleOnSchedule('ScaleUpMorning', {
  schedule: appautoscaling.Schedule.cron({ hour: '8', minute: '0' }),
  minCapacity: 100,
  maxCapacity: 500,
});

writeScaling.scaleOnSchedule('ScaleDownNight', {
  schedule: appautoscaling.Schedule.cron({ hour: '22', minute: '0' }),
  minCapacity: 5,
  maxCapacity: 100,
});

Decision Framework

Factor	On-Demand	Provisioned + Auto-Scaling
Traffic predictability	Unpredictable/spiky	Steady with gradual changes
Scaling speed needed	Instant (within 2x)	1-2 minute delay acceptable
Cost sensitivity	Lower priority	Higher priority
Peak-to-average ratio	> 4:1	< 4:1
Development/testing	Recommended	Not recommended
Utilization rate	< 30% average	> 30% average
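
Two rows of this table, peak-to-average ratio and utilization, can be expressed as a small heuristic. The sketch below is a rule of thumb built from the 4:1 and 30% thresholds in the table; combining them with a logical OR is an assumption of this example, and `TrafficProfile` and `suggestCapacityMode` are illustrative names.

```typescript
type CapacityMode = 'on-demand' | 'provisioned-with-auto-scaling';

interface TrafficProfile {
  peakRps: number;            // peak requests per second
  averageRps: number;         // average requests per second
  averageUtilization: number; // average consumed / provisioned, from 0 to 1
}

// Encodes the peak-to-average and utilization rows of the table.
// The thresholds are rules of thumb, not hard limits.
const suggestCapacityMode = (p: TrafficProfile): CapacityMode => {
  const peakToAverage = p.peakRps / p.averageRps;
  if (peakToAverage > 4 || p.averageUtilization < 0.3) {
    return 'on-demand';
  }
  return 'provisioned-with-auto-scaling';
};

// Spiky traffic: 10:1 peak-to-average ratio
console.log(suggestCapacityMode({ peakRps: 1000, averageRps: 100, averageUtilization: 0.5 }));
// Steady traffic: 2:1 ratio at 60% utilization
console.log(suggestCapacityMode({ peakRps: 200, averageRps: 100, averageUtilization: 0.6 }));
```

The remaining rows, such as cost sensitivity and acceptable scaling delay, are judgment calls that a function like this cannot capture.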

Burst and Adaptive Capacity

DynamoDB provides two automatic mechanisms that help with uneven traffic patterns.

Burst Capacity

Unused capacity accumulates for up to 5 minutes and can be consumed during traffic spikes:

interface BurstCapacity {
  accumulation: {
    source: 'Unused provisioned capacity';
    maxRetention: '5 minutes (300 seconds)';
    refillRate: '1 token per unused RCU/WCU per second';
  };

  consumption: {
    trigger: 'Traffic exceeds provisioned capacity';
    speed: 'Can consume faster than provisioned rate';
    limit: 'Until burst bucket depleted';
  };

  // Important limitations
  warnings: [
    'Temporary safeguard, not capacity planning substitute',
    'DynamoDB may use for background maintenance',
    'No guarantee of availability',
    'Cannot be monitored or relied upon'
  ];
}
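
To get a feel for how long burst capacity can absorb a spike, the following sketch models the 300-second accumulation window as simple arithmetic. It assumes the burst bucket is full when the spike starts and ignores per-partition limits and any background consumption by DynamoDB, so treat the result as an optimistic upper bound. The function names are illustrative.

```typescript
// Simplified model: unused capacity is banked for up to 300 seconds,
// then drained by whatever traffic exceeds the provisioned rate.
const BURST_WINDOW_SECONDS = 300;

const maxBurstCredits = (provisioned: number, steadyUsage: number): number => {
  const unusedPerSecond = Math.max(0, provisioned - steadyUsage);
  return unusedPerSecond * BURST_WINDOW_SECONDS;
};

// How long a spike can run before the banked credits are gone
const burstDurationSeconds = (
  provisioned: number,
  steadyUsage: number,
  spikeRate: number
): number => {
  const excessPerSecond = spikeRate - provisioned;
  if (excessPerSecond <= 0) return Infinity; // spike fits within provisioned capacity
  return maxBurstCredits(provisioned, steadyUsage) / excessPerSecond;
};

// 100 WCU provisioned, 40 WCU steady usage, spike to 400 WCU
console.log(maxBurstCredits(100, 40));           // 18000
console.log(burstDurationSeconds(100, 40, 400)); // 60
```

Even in this best case, a 4x spike is covered for about a minute, which supports treating burst capacity as a safeguard and not as part of capacity planning.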

Adaptive Capacity and Split-for-Heat

DynamoDB automatically rebalances capacity toward hot partitions and can split them when needed:

interface AdaptiveCapacity {
  behavior: {
    detection: 'Monitors traffic patterns per partition';
    action: 'Reallocates throughput from cold to hot partitions';
    limit: 'Cannot exceed partition maximum (3,000 RCU, 1,000 WCU)';
  };

  splitForHeat: {
    trigger: 'Sustained high throughput on single partition';
    action: 'Automatically splits partition into two';
    result: 'Doubles available capacity for that key range';
    timing: 'Takes several minutes';
  };

  // When it helps
  scenarios: [
    'Temporary traffic spikes',
    'Gradual hot partition development',
    'Uneven but distributed access patterns'
  ];

  // When it does NOT help
  limitations: [
    'Single hot key (celebrity problem)',
    'All writes to same partition key value',
    'Low-cardinality partition keys',
    'Item collections with LSI cannot split'
  ];
}

Note: Adaptive capacity rebalancing is instant (since May 2019), but split-for-heat (partition splitting) takes several minutes. For flash sale scenarios or viral content, a single hot partition key cannot be helped by either mechanism. Design partition keys properly rather than relying on adaptive capacity.

DAX for Read-Heavy Workloads

DynamoDB Accelerator (DAX) offloads read traffic from DynamoDB, reducing both latency and capacity consumption.

Note: The DAX SDK for JavaScript v3 (@amazon-dax-sdk/lib-dax) was released in March 2025. It uses aggregated methods (.get(), .query()) instead of the .send() pattern used by the standard DynamoDB SDK v3.

import { DaxDocument } from '@amazon-dax-sdk/lib-dax';
import { DynamoDBDocumentClient, UpdateCommand } from '@aws-sdk/lib-dynamodb';

// DAX client setup (AWS SDK v3 compatible)
const createDaxClient = (endpoints: string[]): DaxDocument => {
  return new DaxDocument({
    endpoints,
    region: process.env.AWS_REGION ?? 'us-east-1',
  });
};

// Client factory for choosing based on operation type
interface ClientFactory {
  daxClient: DaxDocument;  // For cacheable reads
  dynamoClient: DynamoDBDocumentClient;  // For writes, strong consistency
}

// Usage pattern: reads through DAX, writes directly
const productService = {
  // Read through DAX (microsecond latency, offloads DynamoDB)
  // Note: DaxDocument uses aggregated methods, not .send()
  getProduct: async (
    factory: ClientFactory,
    productId: string
  ): Promise<Record<string, unknown> | undefined> => {
    const result = await factory.daxClient.get({
      TableName: 'Products',
      Key: { PK: `PRODUCT#${productId}`, SK: 'METADATA' }
    });
    return result.Item;
  },

  // Query through DAX (cached result sets)
  getProductsByCategory: async (
    factory: ClientFactory,
    category: string
  ): Promise<Record<string, unknown>[]> => {
    const result = await factory.daxClient.query({
      TableName: 'Products',
      IndexName: 'GSI1',
      KeyConditionExpression: 'GSI1PK = :category',
      ExpressionAttributeValues: { ':category': `CATEGORY#${category}` }
    });
    return result.Items ?? [];
  },

  // Write directly to DynamoDB
  // IMPORTANT: DAX only auto-invalidates cache for writes made THROUGH DAX.
  // Writes directly to DynamoDB (bypassing DAX) are NOT reflected in DAX
  // cache until TTL expires. For write-through caching, use daxClient.put().
  updateProduct: async (
    factory: ClientFactory,
    productId: string,
    updates: { name: string; price: number }
  ): Promise<void> => {
    await factory.dynamoClient.send(new UpdateCommand({
      TableName: 'Products',
      Key: { PK: `PRODUCT#${productId}`, SK: 'METADATA' },
      UpdateExpression: 'SET #name = :name, #price = :price',
      ExpressionAttributeNames: { '#name': 'name', '#price': 'price' },
      // Placeholder keys must match the :name and :price tokens in the expression
      ExpressionAttributeValues: { ':name': updates.name, ':price': updates.price }
    }));
  }
};

When DAX Makes Sense

Use Case	DAX Value
Product catalogs (high read, low write)	High
User sessions (read-mostly)	High
Configuration data (rarely changes)	High
Flash sale product pages	Very High
Write-heavy workloads	Low
Strong consistency requirements	None
Low traffic (< 200 req/sec)	Negative (cost overhead)
Random access patterns (< 80% hit rate)	Low
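
The hit-rate rows in this table follow from simple arithmetic: only cache misses reach DynamoDB. The sketch below makes that explicit. It is a simplification that ignores writes and strongly consistent reads, which always go to DynamoDB, and the function name is illustrative.

```typescript
// Reads that miss the cache fall through to DynamoDB
const dynamoReadsPerSecond = (
  totalReadsPerSecond: number,
  hitRate: number // fraction of reads served from cache, from 0 to 1
): number => {
  if (hitRate < 0 || hitRate > 1) {
    throw new Error('hitRate must be between 0 and 1');
  }
  return totalReadsPerSecond * (1 - hitRate);
};

console.log(dynamoReadsPerSecond(10000, 0.95)); // about 500 reads/sec reach DynamoDB
console.log(dynamoReadsPerSecond(10000, 0.5));  // 5000
```

At a 95% hit rate the table sees a twentieth of the read traffic, while at 50% it still sees half, and the DAX cluster cost is the same in both cases.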

TTL Strategy by Data Type

const daxTTLStrategy = {
  staticData: {
    ttl: 3600000,  // 1 hour
    examples: ['Product catalog', 'Category list', 'Configuration']
  },
  semiStatic: {
    ttl: 300000,  // 5 minutes (default)
    examples: ['User profiles', 'Settings', 'Preferences']
  },
  dynamic: {
    ttl: 60000,  // 1 minute
    examples: ['Inventory counts', 'Availability', 'Pricing']
  }
};

Retry Strategies and Circuit Breakers

Handling throttling gracefully requires proper retry logic. The AWS SDK provides built-in retries, but batch operations need additional handling.

SDK Configuration

import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';

const createClientWithRetry = (): DynamoDBDocumentClient => {
  const client = new DynamoDBClient({
    maxAttempts: 10,
    retryMode: 'adaptive',
    // Adaptive mode adds client-side rate limiting on top of standard
    // retries when throttling is detected. It assumes the client talks to
    // a single resource, so prefer one client per table.
  });

  return DynamoDBDocumentClient.from(client);
};

Batch Operations: Handling Unprocessed Items

The SDK does NOT automatically retry unprocessed items from batch operations:

import { DynamoDBDocumentClient, BatchWriteCommand } from '@aws-sdk/lib-dynamodb';

const batchWriteWithRetry = async (
  client: DynamoDBDocumentClient,
  tableName: string,
  items: Record<string, unknown>[],
  maxRetries: number = 5
): Promise<void> => {
  const chunks = chunkArray(items, 25);  // BatchWrite limit

  for (const chunk of chunks) {
    let unprocessed: Record<string, unknown>[] | undefined = chunk;
    let attempts = 0;

    while (unprocessed && unprocessed.length > 0 && attempts < maxRetries) {
      const result = await client.send(new BatchWriteCommand({
        RequestItems: {
          [tableName]: unprocessed.map(item => ({
            PutRequest: { Item: item }
          }))
        }
      }));

      const unprocessedItems = result.UnprocessedItems?.[tableName];

      if (unprocessedItems && unprocessedItems.length > 0) {
        unprocessed = unprocessedItems
          .map(req => req.PutRequest?.Item as Record<string, unknown>)
          .filter(Boolean);

        // Exponential backoff with jitter
        const delay = Math.min(100 * Math.pow(2, attempts), 5000);
        const jitter = delay * 0.2 * Math.random();
        await sleep(delay + jitter);

        attempts++;
      } else {
        unprocessed = undefined;
      }
    }

    if (unprocessed && unprocessed.length > 0) {
      throw new Error(
        `Failed to write ${unprocessed.length} items after ${maxRetries} retries`
      );
    }
  }
};

const chunkArray = <T>(array: T[], size: number): T[][] => {
  const chunks: T[][] = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
};

const sleep = (ms: number): Promise<void> =>
  new Promise(resolve => setTimeout(resolve, ms));
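
When choosing `maxRetries`, it is useful to know how long a caller can end up waiting. The sketch below reproduces the backoff formula used above (100 ms base, doubling per attempt, capped at 5 seconds, up to 20% jitter) and sums the worst case. The helper names `backoffDelay` and `worstCaseWaitMs` are illustrative.

```typescript
// Base delay (before jitter) for a given attempt, as in batchWriteWithRetry
const backoffDelay = (attempt: number): number =>
  Math.min(100 * Math.pow(2, attempt), 5000);

// Upper bound on total sleep time: each delay can grow by up to 20% jitter
const worstCaseWaitMs = (maxRetries: number): number => {
  let total = 0;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    total += backoffDelay(attempt);
  }
  return Math.round(total * 1.2);
};

console.log([0, 1, 2, 3, 4, 5, 6].map(backoffDelay));
// [100, 200, 400, 800, 1600, 3200, 5000]
console.log(worstCaseWaitMs(5)); // 3720
```

With the default of five retries, a chunk that never fully succeeds fails after under four seconds of sleeping, plus the time spent in the requests themselves.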

Circuit Breaker for Sustained Throttling

When throttling persists, a circuit breaker prevents retry storms:

import {
  ProvisionedThroughputExceededException,
  ThrottlingException
} from '@aws-sdk/client-dynamodb';

interface CircuitBreakerConfig {
  failureThreshold: number;  // Failures before opening
  resetTimeout: number;  // Time before trying again (ms)
}

class DynamoDBCircuitBreaker {
  private failures = 0;
  private lastFailure: number = 0;
  private state: 'closed' | 'open' | 'half-open' = 'closed';

  constructor(private config: CircuitBreakerConfig) {}

  async execute<T>(operation: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.config.resetTimeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open - request rejected');
      }
    }

    try {
      const result = await operation();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure(error);
      throw error;
    }
  }

  private onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure(error: unknown): void {
    if (
      error instanceof ProvisionedThroughputExceededException ||
      error instanceof ThrottlingException
    ) {
      this.failures++;
      this.lastFailure = Date.now();

      if (this.failures >= this.config.failureThreshold) {
        this.state = 'open';
      }
    }
  }
}

// Usage
const circuitBreaker = new DynamoDBCircuitBreaker({
  failureThreshold: 5,
  resetTimeout: 30000,  // 30 seconds
});

const writeWithProtection = async (
  client: DynamoDBDocumentClient,
  item: Record<string, unknown>
): Promise<void> => {
  await circuitBreaker.execute(async () => {
    await client.send(new PutCommand({
      TableName: 'MainTable',
      Item: item
    }));
  });
};

Client-Side Rate Limiting

Proactively limiting request rates prevents throttling from occurring:

class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private maxTokens: number,
    private refillRate: number  // tokens per second
  ) {
    this.tokens = maxTokens;
    this.lastRefill = Date.now();
  }

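  async acquire(count: number = 1): Promise<boolean> {
    if (count > this.maxTokens) {
      throw new Error(`Cannot acquire ${count} tokens: bucket holds at most ${this.maxTokens}`);
    }

    // Loop because concurrent callers may take tokens while this one sleeps
    while (true) {
      this.refill();

      if (this.tokens >= count) {
        this.tokens -= count;
        return true;
      }

      // Wait until enough tokens should have been refilled
      const waitTime = ((count - this.tokens) / this.refillRate) * 1000;
      await sleep(waitTime);
    }
  }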

  private refill(): void {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.maxTokens,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
}

// Rate-limited DynamoDB wrapper
class RateLimitedDynamoDB {
  private readBucket: TokenBucket;
  private writeBucket: TokenBucket;

  constructor(
    private client: DynamoDBDocumentClient,
    readCapacity: number,
    writeCapacity: number
  ) {
    // Use 80% of capacity to leave headroom
    this.readBucket = new TokenBucket(readCapacity * 0.8, readCapacity * 0.8);
    this.writeBucket = new TokenBucket(writeCapacity * 0.8, writeCapacity * 0.8);
  }

  async get(
    tableName: string,
    key: Record<string, unknown>
  ): Promise<Record<string, unknown> | undefined> {
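    // Reserves 1 RCU: enough for a strongly consistent read of an item up to 4 KB
    // (an eventually consistent read of the same item costs 0.5 RCU)
    await this.readBucket.acquire(1);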

    const result = await this.client.send(new GetCommand({
      TableName: tableName,
      Key: key
    }));

    return result.Item;
  }

  async put(
    tableName: string,
    item: Record<string, unknown>
  ): Promise<void> {
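    // Rough estimate only: the JSON byte length approximates DynamoDB's item size
    const itemSize = Buffer.byteLength(JSON.stringify(item), 'utf8');
    const wcuNeeded = Math.max(1, Math.ceil(itemSize / 1024));  // 1 WCU per KB, rounded up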

    await this.writeBucket.acquire(wcuNeeded);

    await this.client.send(new PutCommand({
      TableName: tableName,
      Item: item
    }));
  }
}
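
The token counts a rate limiter should reserve depend on item size and read consistency. The sketch below computes capacity units from a size in bytes: writes are counted per 1 KB and reads per 4 KB, both rounded up, with eventually consistent reads costing half and transactional operations costing double. The function names are illustrative, and the caller is assumed to supply the item size.

```typescript
// Write capacity units: 1 per KB, rounded up; transactional writes cost double
const writeUnits = (sizeBytes: number, transactional = false): number => {
  const units = Math.max(1, Math.ceil(sizeBytes / 1024));
  return transactional ? units * 2 : units;
};

// Read capacity units: 1 per 4 KB, rounded up, for strongly consistent reads
const readUnits = (
  sizeBytes: number,
  consistency: 'eventual' | 'strong' | 'transactional' = 'eventual'
): number => {
  const units = Math.max(1, Math.ceil(sizeBytes / 4096));
  if (consistency === 'eventual') return units / 2;
  if (consistency === 'transactional') return units * 2;
  return units;
};

console.log(writeUnits(3500));          // 4
console.log(readUnits(3500));           // 0.5
console.log(readUnits(9000, 'strong')); // 3
```

A 3.5 KB item costs eight times as much to write as to read with eventual consistency, so read and write buckets generally need separate sizing.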

CloudWatch Monitoring and Alerting

Proper monitoring catches throttling before it impacts users.

Key Metrics

const throttlingMetrics = {
  primary: [
    {
      name: 'ThrottledRequests',
      description: 'Any request that was throttled',
      alarm: 'Sum > 0 for 1 minute',
      action: 'Investigate immediately'
    },
    {
      name: 'ReadThrottleEvents',
      description: 'Individual read throttle events',
      alarm: 'Sum > 10 per minute',
      action: 'Check partition key design or increase capacity'
    },
    {
      name: 'WriteThrottleEvents',
      description: 'Individual write throttle events',
      alarm: 'Sum > 10 per minute',
      action: 'Implement write sharding'
    }
  ],

  utilization: [
    {
      name: 'ConsumedReadCapacityUnits',
      alarm: 'Average > 80% of provisioned for 5 minutes',
      action: 'Scale up or enable auto-scaling'
    },
    {
      name: 'ConsumedWriteCapacityUnits',
      alarm: 'Average > 80% of provisioned for 5 minutes',
      action: 'Scale up or enable auto-scaling'
    }
  ],

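  gsi: [
    {
      name: 'WriteThrottleEvents (GlobalSecondaryIndexName dimension)',
      description: 'Write throttling on a GSI (causes backpressure on base table writes)',
      alarm: 'Any occurrence',
      action: 'Increase GSI capacity'
    },
    {
      name: 'OnlineIndexThrottleEvents',
      description: 'Write throttling while a new GSI is being built',
      alarm: 'Any occurrence',
      action: 'Increase GSI write capacity during index creation'
    }
  ],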

  // Granular throttle metrics (useful for diagnosing specific issues)
  advanced: [
    { name: 'ReadMaxOnDemandThroughputThrottleEvents', description: 'On-demand max throughput exceeded' },
    { name: 'WriteMaxOnDemandThroughputThrottleEvents', description: 'On-demand max throughput exceeded' },
    { name: 'ReadAccountLimitThrottleEvents', description: 'Account-level limit hit' },
    { name: 'WriteAccountLimitThrottleEvents', description: 'Account-level limit hit' },
    { name: 'ReadKeyRangeThroughputThrottleEvents', description: 'Partition-level limit hit' },
    { name: 'WriteKeyRangeThroughputThrottleEvents', description: 'Partition-level limit hit' }
  ]
};
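
The utilization alarms above compare consumed capacity with provisioned capacity, but the two metrics use different units: consumed capacity is typically read as a sum over the period, while provisioned capacity is a per-second rate. The sketch below shows the conversion. The function name and the sample numbers are illustrative.

```typescript
// Convert a Sum of consumed capacity units over a period into a
// percentage of the provisioned per-second rate
const utilizationPercent = (
  consumedSum: number,      // Sum of consumed capacity units over the period
  periodSeconds: number,    // e.g. 300 for a 5-minute period
  provisionedUnits: number  // provisioned capacity units (per second)
): number => {
  const averagePerSecond = consumedSum / periodSeconds;
  return (averagePerSecond * 100) / provisionedUnits;
};

// 24,000 RCU consumed over 5 minutes against 100 provisioned RCU
console.log(utilizationPercent(24000, 300, 100)); // 80
```

Skipping the division by the period length inflates the result by a factor of 300 for a 5-minute period, which makes a percentage-based alarm fire constantly.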

CDK Alarm Configuration

import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as cloudwatch_actions from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Duration } from 'aws-cdk-lib';

const createThrottlingAlarms = (
  table: dynamodb.Table,
  alertTopic: sns.Topic
): cloudwatch.Alarm[] => {
  const alarms: cloudwatch.Alarm[] = [];

  // Throttled requests alarm - immediate attention
  alarms.push(new cloudwatch.Alarm(table, 'ThrottlingAlarm', {
    alarmName: `${table.tableName}-Throttling`,
    metric: table.metricThrottledRequestsForOperations({
      operations: [
        dynamodb.Operation.GET_ITEM,
        dynamodb.Operation.PUT_ITEM,
        dynamodb.Operation.QUERY,
        dynamodb.Operation.SCAN
      ],
      period: Duration.minutes(1)
    }),
    threshold: 1,
    evaluationPeriods: 1,
    comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
    treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
  }));

  // High read utilization - early warning
  alarms.push(new cloudwatch.Alarm(table, 'HighReadUtilization', {
    alarmName: `${table.tableName}-HighReadUtilization`,
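    metric: new cloudwatch.MathExpression({
      // Consumed capacity is a Sum per period; divide by the period length
      // to get units per second before comparing with provisioned capacity
      expression: '(m1 / PERIOD(m1)) / m2 * 100',
      usingMetrics: {
        m1: table.metricConsumedReadCapacityUnits({ period: Duration.minutes(5), statistic: 'Sum' }),
        m2: table.metric('ProvisionedReadCapacityUnits', { period: Duration.minutes(5), statistic: 'Average' })
      }
    }),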
    threshold: 80,
    evaluationPeriods: 3,
    comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
  }));

  // Add SNS actions
  alarms.forEach(alarm => {
    alarm.addAlarmAction(new cloudwatch_actions.SnsAction(alertTopic));
  });

  return alarms;
};

Contributor Insights for Hot Key Detection

Enable Contributor Insights to identify which partition keys are causing throttling:

import { DynamoDBClient, UpdateContributorInsightsCommand } from '@aws-sdk/client-dynamodb';

// Mode options:
// - ACCESSED_AND_THROTTLED_KEYS: All accessed keys + throttled keys (default, higher cost)
// - THROTTLED_KEYS: Only throttled keys (cost-effective for throttle debugging)

const enableContributorInsights = async (
  client: DynamoDBClient,
  tableName: string
): Promise<void> => {
  await client.send(new UpdateContributorInsightsCommand({
    TableName: tableName,
    ContributorInsightsAction: 'ENABLE',
  }));
};

// Contributor Insights reveals:
// - Top partition keys by consumed capacity
// - Throttled partition keys
// - Access patterns over time
// Essential for debugging Single Table Design throttling
// Tip: Use THROTTLED_KEYS mode if you only need to debug throttling (lower cost)

Common Pitfalls and Solutions

Pitfall 1: Relying on Adaptive Capacity

// WRONG: Assuming DynamoDB handles hot partitions automatically
// Reality: Adaptive rebalancing is instant, but split-for-heat takes minutes
// Neither helps with single hot partition key (celebrity problem)
// Flash sales or viral content on one key = throttling regardless

// RIGHT: Design for even distribution from the start
// Use write sharding for known low-cardinality patterns

Pitfall 2: Ignoring GSI Capacity

// WRONG: Setting GSI capacity lower than base table
// Assumption: "GSI has less traffic"
// Result: GSI throttling blocks ALL base table writes

// RIGHT: GSI capacity >= base table write capacity
// Or use on-demand for automatic scaling

Pitfall 3: On-Demand Scaling Assumptions

// WRONG: "On-demand scales instantly to any level"
// Reality: 2x scaling limit within 30-minute windows
// 50k req/sec to 250k req/sec takes ~1 hour

// RIGHT: Pre-warm before expected spikes
// Or use provisioned with high capacity for planned events
// Tip: Consider AWS's "warm throughput" feature for configuring
// higher initial throughput values on new or restored tables

Pitfall 4: Missing Batch Retry Logic

// WRONG: Assume BatchWriteItem processes all items
const result = await client.send(new BatchWriteCommand({ ... }));
// Some items may have failed!

// RIGHT: Always check and retry unprocessed items
if (result.UnprocessedItems &&
    Object.keys(result.UnprocessedItems).length > 0) {
  // Implement exponential backoff retry
}

Pitfall 5: Not Monitoring Per-Partition Metrics

// WRONG: Only monitor table-level capacity
// "Table has 500 WCU available, why throttling?"

// RIGHT: Enable Contributor Insights
// Reveals: One partition key consuming its 1,000 WCU limit
// Table-level headroom doesn't help partition-level throttling

Key Takeaways

Design Partition Keys First: Hot partitions cause most throttling issues
Understand Per-Partition Limits: 3,000 RCU / 1,000 WCU per partition is the real constraint
Write Sharding Works: 10 shards = up to 10x write throughput for the same access pattern
Adaptive Capacity Has Limits: Rebalancing is instant, but split-for-heat takes minutes; neither helps single hot keys
On-Demand Has Limits: 2x scaling within 30 minutes, not unlimited
GSI Throttling Blocks Writes: Capacity matching is essential
DAX Needs High Hit Rate: Below an 80% cache hit rate, the benefit is low
Monitor Contributor Insights: The most direct way to identify hot keys in Single Table Design
Retry Unprocessed Items: SDK does not auto-retry batch operation failures
Pre-warm for Events: Both provisioned and on-demand need preparation for traffic spikes

Building throttle-resistant DynamoDB applications requires understanding these mechanics and implementing appropriate patterns at each layer. Start with partition key design, add sharding where needed, implement proper retries, and monitor aggressively. The result is a system that scales predictably without unexpected throttling incidents.
