2026-01-28
DynamoDB Rate Limiting: Strategies for Single Table Design at Scale
Practical strategies to prevent and handle DynamoDB throttling in Single Table Design applications. Covers partition key design, write sharding, capacity modes, DAX caching, retry patterns, and CloudWatch monitoring for high-throughput systems.
When working with DynamoDB at scale, throttling becomes an inevitable challenge. The ProvisionedThroughputExceededException error often appears despite having adequate table-level capacity, and understanding why requires diving into DynamoDB’s internal mechanics.
This guide covers proven patterns for preventing and handling throttling in Single Table Design applications, from partition key strategies to monitoring configurations that catch issues before they impact users.
Understanding DynamoDB’s Throttling Mechanism
DynamoDB uses a token bucket algorithm for rate limiting. Each partition maintains its own bucket of read and write tokens that refill at a rate matching provisioned capacity. When tokens are depleted, requests get throttled.
The critical limits to remember:
| Resource | Limit |
|---|---|
| Read Capacity per Partition | 3,000 RCU |
| Write Capacity per Partition | 1,000 WCU |
| Storage per Partition | 10 GB |
| Item Size | 400 KB (hard limit) |
Here’s what makes this tricky: provisioned capacity is distributed across partitions. A table with 100 RCU and 3 partitions means each partition gets roughly 33 RCU. If one partition receives 80% of traffic, it will throttle even though the table has headroom.
// Conceptual model: How capacity gets distributed
interface PartitionCapacity {
// Table-level settings
tableRCU: 100;
tableWCU: 50;
partitionCount: 3;
// Per-partition reality
perPartitionRCU: 33; // ~100/3
perPartitionWCU: 17; // ~50/3
// The problem: uneven traffic
actualTraffic: {
partition1: { rcu: 80 }; // 80 > 33 = THROTTLED
partition2: { rcu: 10 }; // Underutilized
partition3: { rcu: 10 }; // Underutilized
};
}
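The arithmetic in the model above can be checked with a small helper. This is a sketch that assumes capacity is split evenly across partitions and ignores burst and adaptive capacity; `findThrottledPartitions` and its field names are illustrative, not part of any AWS API.

```typescript
// Hypothetical helper: splits table capacity evenly across partitions and
// flags every partition whose demand exceeds its share.
interface PartitionLoad {
  partition: number;
  demandRCU: number;
  allocatedRCU: number;
  throttled: boolean;
}

function findThrottledPartitions(
  tableRCU: number,
  demandPerPartition: number[]
): PartitionLoad[] {
  const allocatedRCU = tableRCU / demandPerPartition.length;
  return demandPerPartition.map((demandRCU, i) => ({
    partition: i + 1,
    demandRCU,
    allocatedRCU,
    throttled: demandRCU > allocatedRCU,
  }));
}

// The example from the text: 100 RCU, 3 partitions, 80/10/10 traffic split.
const partitionLoads = findThrottledPartitions(100, [80, 10, 10]);
const hotPartitions = partitionLoads
  .filter((load) => load.throttled)
  .map((load) => load.partition);
console.log(hotPartitions); // [ 1 ]
```

Total demand is exactly 100 RCU, equal to the table's provisioned capacity, yet partition 1 still throttles.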
Partition Key Design: The Foundation
Hot partitions cause most throttling issues. Getting partition key design right prevents problems that no amount of capacity can solve.
Anti-Patterns to Avoid
// ANTI-PATTERN 1: Low cardinality partition key
interface BadDesign1 {
PK: 'STATUS#active' | 'STATUS#inactive'; // Only 2 values
SK: `USER#${string}`;
}
// Result: All active users in one partition
// With 100,000 active users: immediate throttling
// ANTI-PATTERN 2: Time-based partition key
interface BadDesign2 {
PK: `DATE#${string}`; // e.g., "DATE#2024-01-15"
SK: `EVENT#${string}`;
}
// Result: All today's events hit one partition
// Peak hours create hot partition
// ANTI-PATTERN 3: Celebrity/Viral content problem
interface BadDesign3 {
PK: `POST#${string}`; // Viral post ID
SK: `LIKE#${string}`;
}
// Result: Viral post with millions of likes
// Single partition cannot handle the load
// ANTI-PATTERN 4: Large tenant dominance
interface BadDesign4 {
PK: `TENANT#${string}`;
SK: `ORDER#${string}`;
}
// Result: Enterprise tenant with 80% of orders
// Their partition is always hot
High-Cardinality Patterns That Work
// PATTERN 1: User-scoped partition keys
interface GoodDesign1 {
PK: `USER#${string}`; // USER#<userId>, unique per user
SK: `ORDER#${string}#${string}`; // ORDER#<timestamp>#<orderId>
}
// Result: Millions of unique partition keys
// Traffic naturally distributed
// PATTERN 2: Composite keys for multi-tenant
interface GoodDesign2 {
PK: `TENANT#${string}#USER#${string}`; // TENANT#<tenantId>#USER#<userId>
SK: string;
}
// Result: Even distribution within and across tenants
// Large tenant's users still spread across partitions
// PATTERN 3: Hierarchical with high cardinality at PK level
interface GoodDesign3 {
PK: `REGION#${string}#STORE#${string}`; // REGION#<region>#STORE#<storeId>
SK: `PRODUCT#${string}#${string}`; // PRODUCT#<category>#<productId>
}
// Result: Queries scoped to store level
// Each store has its own partition space
// PATTERN 4: GSI for low-cardinality queries
interface GoodDesign4 {
PK: `USER#${string}`; // USER#<userId>
SK: 'METADATA';
status: 'active' | 'inactive';
GSI1PK: `STATUS#${'active' | 'inactive'}#SHARD#${number}`; // STATUS#<status>#SHARD#<shardId>, sharded!
GSI1SK: `USER#${string}`; // USER#<userId>
}
// Base table: High cardinality (users)
// GSI: Handles status queries with sharding
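One way to check whether a key design spreads load in practice is to measure each partition key's share of sampled traffic. The sketch below assumes you already have a list of partition key values taken from request logs; `keyTrafficShares` and the sample data are hypothetical.

```typescript
// Hypothetical helper: given partition key values sampled from request logs,
// returns each key's share of the sample, hottest key first.
function keyTrafficShares(sampledKeys: string[]): Array<[string, number]> {
  const counts = new Map<string, number>();
  for (const key of sampledKeys) {
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .map(([key, count]): [string, number] => [key, count / sampledKeys.length])
    .sort((a, b) => b[1] - a[1]);
}

// Made-up sample resembling the "large tenant dominance" anti-pattern.
const sampledTenantKeys = [
  'TENANT#acme', 'TENANT#acme', 'TENANT#acme', 'TENANT#acme',
  'TENANT#beta',
];
const tenantShares = keyTrafficShares(sampledTenantKeys);
console.log(tenantShares[0]); // [ 'TENANT#acme', 0.8 ]
```

A single key carrying most of the sample is a signal to move to a composite or sharded key before the table is under load.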
Write Sharding: Distributing Hot Keys
When business requirements force low-cardinality access patterns, write sharding distributes load across multiple partitions.
Random Suffix Sharding
Best for write-heavy patterns where read aggregation is acceptable:
import { DynamoDBDocumentClient, PutCommand, QueryCommand } from '@aws-sdk/lib-dynamodb';
const SHARD_COUNT = 10;
const getRandomShard = (): number => {
return Math.floor(Math.random() * SHARD_COUNT);
};
// Writing with random shard - distributes writes evenly
const writeToShardedPartition = async (
client: DynamoDBDocumentClient,
status: string,
userId: string,
userData: Record<string, unknown>
): Promise<void> => {
const shardId = getRandomShard();
await client.send(new PutCommand({
TableName: 'MainTable',
Item: {
PK: `STATUS#${status}#SHARD#${shardId}`,
SK: `USER#${userId}`,
...userData
}
}));
};
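// Reading requires scatter-gather across all shards
// Note: each Query returns at most 1 MB of data; for larger shards, keep
// querying with ExclusiveStartKey while LastEvaluatedKey is present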
const readFromAllShards = async (
client: DynamoDBDocumentClient,
status: string
): Promise<Record<string, unknown>[]> => {
const promises = Array.from({ length: SHARD_COUNT }, (_, i) =>
client.send(new QueryCommand({
TableName: 'MainTable',
KeyConditionExpression: 'PK = :pk',
ExpressionAttributeValues: {
':pk': `STATUS#${status}#SHARD#${i}`
}
}))
);
const results = await Promise.all(promises);
return results.flatMap(r => r.Items ?? []);
};
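Scatter-gather returns items grouped by shard, and each shard's results are ordered by sort key only within that shard. If callers need one global order, the results have to be merged. The following is a minimal sketch; `KeyedItem` and `mergeShardResults` are illustrative names, and the sample items are made up.

```typescript
// Minimal item shape for this sketch; real items carry more attributes.
interface KeyedItem {
  PK: string;
  SK: string;
}

// Flattens per-shard Query results and restores a single sort-key order.
// A plain sort is used for brevity; a k-way merge would avoid re-sorting
// shards that are already ordered.
// Caveat: JavaScript compares strings by UTF-16 code unit, DynamoDB by
// UTF-8 bytes; the two agree for ASCII keys like the ones used here.
function mergeShardResults<T extends KeyedItem>(perShard: T[][]): T[] {
  const all: T[] = [];
  for (const shardItems of perShard) {
    all.push(...shardItems);
  }
  return all.sort((a, b) => (a.SK < b.SK ? -1 : a.SK > b.SK ? 1 : 0));
}

const mergedUsers = mergeShardResults([
  [
    { PK: 'STATUS#active#SHARD#0', SK: 'USER#001' },
    { PK: 'STATUS#active#SHARD#0', SK: 'USER#004' },
  ],
  [
    { PK: 'STATUS#active#SHARD#1', SK: 'USER#002' },
    { PK: 'STATUS#active#SHARD#1', SK: 'USER#003' },
  ],
]);
console.log(mergedUsers.map((item) => item.SK));
// [ 'USER#001', 'USER#002', 'USER#003', 'USER#004' ]
```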
Deterministic Sharding
When you need to read specific items without scatter-gather:
import { createHash } from 'crypto';
import { GetCommand } from '@aws-sdk/lib-dynamodb';
const getDeterministicShard = (entityId: string): number => {
const hash = createHash('md5').update(entityId).digest('hex');
return parseInt(hash.substring(0, 8), 16) % SHARD_COUNT;
};
// Write with consistent shard based on order ID
const writeOrderWithShard = async (
client: DynamoDBDocumentClient,
date: string,
orderId: string,
orderData: Record<string, unknown>
): Promise<void> => {
const shardId = getDeterministicShard(orderId);
await client.send(new PutCommand({
TableName: 'MainTable',
Item: {
PK: `ORDERS#DATE#${date}#SHARD#${shardId}`,
SK: `ORDER#${orderId}`,
...orderData
}
}));
};
// Read specific order - calculate shard, single query
const readOrder = async (
client: DynamoDBDocumentClient,
date: string,
orderId: string
): Promise<Record<string, unknown> | undefined> => {
const shardId = getDeterministicShard(orderId);
const result = await client.send(new GetCommand({
TableName: 'MainTable',
Key: {
PK: `ORDERS#DATE#${date}#SHARD#${shardId}`,
SK: `ORDER#${orderId}`
}
}));
return result.Item;
};
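The property that matters for deterministic sharding is that the same ID always maps to the same shard while different IDs spread out. The sketch below demonstrates that with a dependency-free hash. Note that it swaps the MD5 digest used above for 32-bit FNV-1a so it runs without imports; `fnv1a32`, `shardFor`, and the `ORDER-<n>` IDs are illustrative.

```typescript
const ORDER_SHARD_COUNT = 10; // mirrors SHARD_COUNT above

// 32-bit FNV-1a over UTF-16 code units. This stands in for the MD5 digest
// used in the article so that the sketch needs no imports.
function fnv1a32(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime (mod 2^32)
  }
  return hash >>> 0; // reinterpret as unsigned
}

function shardFor(entityId: string): number {
  return fnv1a32(entityId) % ORDER_SHARD_COUNT;
}

// Count how 1,000 hypothetical order IDs spread over the shards.
const shardCounts = new Array<number>(ORDER_SHARD_COUNT).fill(0);
for (let i = 0; i < 1000; i++) {
  shardCounts[shardFor(`ORDER-${i}`)]++;
}
console.log(shardCounts);
```

Changing the shard count remaps most existing IDs, so with this scheme the shard count should be treated as fixed once data has been written.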
GSI Write Sharding
Apply the same pattern to Global Secondary Indexes to prevent GSI throttling from blocking base table writes:
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
// CDK definition with sharded GSI
const table = new dynamodb.Table(this, 'MainTable', {
partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});
table.addGlobalSecondaryIndex({
indexName: 'GSI1',
partitionKey: { name: 'GSI1PK', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'GSI1SK', type: dynamodb.AttributeType.STRING },
projectionType: dynamodb.ProjectionType.ALL,
});
// Writing with GSI sharding
const writeOrderWithGSIShard = async (
client: DynamoDBDocumentClient,
userId: string,
orderId: string,
orderDate: string
): Promise<void> => {
const shardId = getRandomShard();
await client.send(new PutCommand({
TableName: 'MainTable',
Item: {
PK: `USER#${userId}`,
SK: `ORDER#${orderDate}#${orderId}`,
EntityType: 'Order',
// Sharded GSI keys
GSI1PK: `ORDERS#DATE#${orderDate}#SHARD#${shardId}`,
GSI1SK: `USER#${userId}#ORDER#${orderId}`
}
}));
};
Warning: GSI throttling causes backpressure on base table writes. If a GSI cannot keep up with the base table's write rate, base table writes that need to update that index are throttled as well. Always provision GSI write capacity to match the base table's write load.
Capacity Mode Selection
On-Demand Mode: Understanding the Limits
On-demand capacity has scaling constraints that catch teams off guard:
interface OnDemandBehavior {
// Initial capacity for new tables
initialCapacity: {
rcu: 12000; // 4 partitions * 3,000 RCU
wcu: 4000; // 4 partitions * 1,000 WCU
};
scaling: {
// Instant scale to previous peak
previousPeak: 'instant';
// Beyond previous peak: limited growth
beyondPeak: {
rate: 'Double every 30 minutes';
limit: 'Cannot exceed 2x within 30-min window';
};
};
// Account-level limits
accountLimits: {
defaultPerTable: 40000; // RCU and WCU
requestIncrease: true;
};
}
For traffic spikes, this 2x limit matters. A flash sale with 10x normal traffic cannot be handled immediately by on-demand. The table needs to “warm up” gradually or use pre-provisioned capacity.
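The warm-up time implied by this doubling model can be estimated with a few lines of arithmetic. This is a simplified sketch of the behavior described above (up to 2x the previous peak is available immediately, and the ceiling doubles again after each 30-minute window); actual on-demand scaling is not guaranteed to follow it exactly, and `minutesToReach` is a hypothetical helper.

```typescript
// Sketch of the doubling model described above. Assumes the table can
// immediately serve up to 2x its previous peak and that this ceiling doubles
// again after each 30-minute window of traffic at the new peak.
function minutesToReach(previousPeak: number, target: number): number {
  if (previousPeak <= 0) {
    throw new Error('previousPeak must be positive');
  }
  let ceiling = previousPeak * 2; // available immediately
  let minutes = 0;
  while (ceiling < target) {
    ceiling *= 2;
    minutes += 30;
  }
  return minutes;
}

console.log(minutesToReach(50000, 250000)); // 60
console.log(minutesToReach(1000, 10000)); // 90
```

Under these assumptions, a 10x spike over a previous peak of 1,000 requests per second needs about 90 minutes of ramp-up, which is why planned events call for pre-warming.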
Provisioned with Auto-Scaling
For predictable workloads with cost sensitivity:
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as appautoscaling from 'aws-cdk-lib/aws-applicationautoscaling';
import { Duration } from 'aws-cdk-lib';
const table = new dynamodb.Table(this, 'MainTable', {
tableName: 'ProductionTable',
partitionKey: { name: 'PK', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'SK', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PROVISIONED,
readCapacity: 100,
writeCapacity: 50,
});
// Auto-scaling for reads
const readScaling = table.autoScaleReadCapacity({
minCapacity: 10,
maxCapacity: 1000,
});
readScaling.scaleOnUtilization({
targetUtilizationPercent: 70, // Scale up before hitting limits
});
// Auto-scaling for writes
const writeScaling = table.autoScaleWriteCapacity({
minCapacity: 5,
maxCapacity: 500,
});
writeScaling.scaleOnUtilization({
targetUtilizationPercent: 70,
});
// Scheduled scaling for predictable patterns
writeScaling.scaleOnSchedule('ScaleUpMorning', {
schedule: appautoscaling.Schedule.cron({ hour: '8', minute: '0' }),
minCapacity: 100,
maxCapacity: 500,
});
writeScaling.scaleOnSchedule('ScaleDownNight', {
schedule: appautoscaling.Schedule.cron({ hour: '22', minute: '0' }),
minCapacity: 5,
maxCapacity: 100,
});
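To make `targetUtilizationPercent` concrete: target tracking tries to keep consumed capacity at roughly that percentage of provisioned capacity, within the configured minimum and maximum. The sketch below approximates the steady-state result only. It is not how Application Auto Scaling is implemented, it ignores cooldowns and alarm delays, and `desiredCapacity` is a hypothetical name.

```typescript
// Approximates the capacity that target tracking converges on:
// consumed / provisioned should sit near the target utilization.
function desiredCapacity(
  consumedPerSecond: number,
  targetUtilizationPercent: number,
  minCapacity: number,
  maxCapacity: number
): number {
  const unclamped = Math.ceil(
    (consumedPerSecond * 100) / targetUtilizationPercent
  );
  return Math.min(maxCapacity, Math.max(minCapacity, unclamped));
}

// Using the read settings above: 70% target, min 10, max 1000.
console.log(desiredCapacity(140, 70, 10, 1000)); // 200
console.log(desiredCapacity(2, 70, 10, 1000)); // 10 (held at the minimum)
console.log(desiredCapacity(900, 70, 10, 1000)); // 1000 (capped at the maximum)
```

The last case shows why `maxCapacity` deserves attention: once the cap is reached, additional demand is throttled rather than scaled.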
Decision Framework
| Factor | On-Demand | Provisioned + Auto-Scaling |
|---|---|---|
| Traffic predictability | Unpredictable/spiky | Steady with gradual changes |
| Scaling speed needed | Instant (within 2x) | 1-2 minute delay acceptable |
| Cost sensitivity | Lower priority | Higher priority |
| Peak-to-average ratio | > 4:1 | < 4:1 |
| Development/testing | Recommended | Not recommended |
| Utilization rate | < 30% average | > 30% average |
Burst and Adaptive Capacity
DynamoDB provides two automatic mechanisms that help with uneven traffic patterns.
Burst Capacity
Unused capacity accumulates for up to 5 minutes and can be consumed during traffic spikes:
interface BurstCapacity {
accumulation: {
source: 'Unused provisioned capacity';
maxRetention: '5 minutes (300 seconds)';
refillRate: '1 token per unused RCU/WCU per second';
};
consumption: {
trigger: 'Traffic exceeds provisioned capacity';
speed: 'Can consume faster than provisioned rate';
limit: 'Until burst bucket depleted';
};
// Important limitations
warnings: [
'Temporary safeguard, not capacity planning substitute',
'DynamoDB may use for background maintenance',
'No guarantee of availability',
'Cannot be monitored or relied upon'
];
}
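As a rough illustration of how far burst capacity stretches, the sketch below estimates how long a spike can run before the banked capacity is exhausted. It assumes the table was idle long enough to bank a full 300 seconds of unused capacity and that all of it is available, which the warnings above make clear is not guaranteed. `burstSecondsAvailable` is a hypothetical helper.

```typescript
const BURST_WINDOW_SECONDS = 300; // up to 5 minutes of unused capacity

// Estimates how many seconds a spike can exceed provisioned capacity before
// the burst bucket is empty. All rates are capacity units per second.
function burstSecondsAvailable(
  provisioned: number,
  idleConsumption: number,
  spikeConsumption: number
): number {
  if (spikeConsumption <= provisioned) {
    return Infinity; // the spike fits within provisioned capacity
  }
  const banked = (provisioned - idleConsumption) * BURST_WINDOW_SECONDS;
  const drainPerSecond = spikeConsumption - provisioned;
  return banked / drainPerSecond;
}

// 100 WCU provisioned, 20 WCU used while idle, spike to 400 WCU:
// banked = 80 * 300 = 24,000 units, drained at 300 units per second.
console.log(burstSecondsAvailable(100, 20, 400)); // 80
```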
Adaptive Capacity and Split-for-Heat
DynamoDB automatically rebalances capacity toward hot partitions and can split them when needed:
interface AdaptiveCapacity {
behavior: {
detection: 'Monitors traffic patterns per partition';
action: 'Reallocates throughput from cold to hot partitions';
limit: 'Cannot exceed partition maximum (3,000 RCU, 1,000 WCU)';
};
splitForHeat: {
trigger: 'Sustained high throughput on single partition';
action: 'Automatically splits partition into two';
result: 'Doubles available capacity for that key range';
timing: 'Takes several minutes';
};
// When it helps
scenarios: [
'Temporary traffic spikes',
'Gradual hot partition development',
'Uneven but distributed access patterns'
];
// When it does NOT help
limitations: [
'Single hot key (celebrity problem)',
'All writes to same partition key value',
'Low-cardinality partition keys',
'Item collections with LSI cannot split'
];
}
Note: Adaptive capacity rebalancing is instant (since May 2019), but split-for-heat (partition splitting) takes several minutes. For flash sale scenarios or viral content, a single hot partition key cannot be helped by either mechanism. Design partition keys properly rather than relying on adaptive capacity.
DAX for Read-Heavy Workloads
DynamoDB Accelerator (DAX) offloads read traffic from DynamoDB, reducing both latency and capacity consumption.
Note: The DAX SDK for JavaScript v3 (@amazon-dax-sdk/lib-dax) was released in March 2025. It uses aggregated methods (.get(), .query()) instead of the .send() pattern used by the standard DynamoDB SDK v3.
import { DaxDocument } from '@amazon-dax-sdk/lib-dax';
import { DynamoDBDocumentClient, UpdateCommand } from '@aws-sdk/lib-dynamodb';
// DAX client setup (AWS SDK v3 compatible)
const createDaxClient = (endpoints: string[]): DaxDocument => {
return new DaxDocument({
endpoints,
region: process.env.AWS_REGION ?? 'us-east-1',
});
};
// Client factory for choosing based on operation type
interface ClientFactory {
daxClient: DaxDocument; // For cacheable reads
dynamoClient: DynamoDBDocumentClient; // For writes, strong consistency
}
// Usage pattern: reads through DAX, writes directly
const productService = {
// Read through DAX (microsecond latency, offloads DynamoDB)
// Note: DaxDocument uses aggregated methods, not .send()
getProduct: async (
factory: ClientFactory,
productId: string
): Promise<Record<string, unknown> | undefined> => {
const result = await factory.daxClient.get({
TableName: 'Products',
Key: { PK: `PRODUCT#${productId}`, SK: 'METADATA' }
});
return result.Item;
},
// Query through DAX (cached result sets)
getProductsByCategory: async (
factory: ClientFactory,
category: string
): Promise<Record<string, unknown>[]> => {
const result = await factory.daxClient.query({
TableName: 'Products',
IndexName: 'GSI1',
KeyConditionExpression: 'GSI1PK = :category',
ExpressionAttributeValues: { ':category': `CATEGORY#${category}` }
});
return result.Items ?? [];
},
// Write directly to DynamoDB
// IMPORTANT: DAX only auto-invalidates cache for writes made THROUGH DAX.
// Writes directly to DynamoDB (bypassing DAX) are NOT reflected in DAX
// cache until TTL expires. For write-through caching, use daxClient.put().
updateProduct: async (
factory: ClientFactory,
productId: string,
updates: { name: string; price: number }
): Promise<void> => {
await factory.dynamoClient.send(new UpdateCommand({
TableName: 'Products',
Key: { PK: `PRODUCT#${productId}`, SK: 'METADATA' },
UpdateExpression: 'SET #name = :name, #price = :price',
ExpressionAttributeNames: { '#name': 'name', '#price': 'price' },
// Placeholder keys must match the :name and :price tokens in the expression
ExpressionAttributeValues: { ':name': updates.name, ':price': updates.price }
}));
}
};
When DAX Makes Sense
| Use Case | DAX Value |
|---|---|
| Product catalogs (high read, low write) | High |
| User sessions (read-mostly) | High |
| Configuration data (rarely changes) | High |
| Flash sale product pages | Very High |
| Write-heavy workloads | Low |
| Strong consistency requirements | None |
| Low traffic (< 200 req/sec) | Negative (cost overhead) |
| Random access patterns (< 80% hit rate) | Low |
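The hit-rate row matters because only cache misses reach DynamoDB. A quick calculation shows the effect; this is plain arithmetic with a hypothetical helper name and made-up traffic figures, not a DAX API.

```typescript
// Reads that still reach DynamoDB after the cache absorbs its hits.
// hitRatePercent is the cache hit rate as a whole-number percentage.
function dynamoReadsAfterCache(
  totalReadsPerSecond: number,
  hitRatePercent: number
): number {
  return (totalReadsPerSecond * (100 - hitRatePercent)) / 100;
}

console.log(dynamoReadsAfterCache(5000, 95)); // 250
console.log(dynamoReadsAfterCache(5000, 80)); // 1000
console.log(dynamoReadsAfterCache(5000, 50)); // 2500
```

Dropping from a 95% to an 80% hit rate quadruples the read load on the table, while the cluster cost stays the same.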
TTL Strategy by Data Type
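// Note: DAX TTLs are set per cluster (through its parameter group), not per item,
// so applying different TTLs per data type means running separate clusters
const daxTTLStrategy = {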
staticData: {
ttl: 3600000, // 1 hour
examples: ['Product catalog', 'Category list', 'Configuration']
},
semiStatic: {
ttl: 300000, // 5 minutes (default)
examples: ['User profiles', 'Settings', 'Preferences']
},
dynamic: {
ttl: 60000, // 1 minute
examples: ['Inventory counts', 'Availability', 'Pricing']
}
};
Retry Strategies and Circuit Breakers
Handling throttling gracefully requires proper retry logic. The AWS SDK provides built-in retries, but batch operations need additional handling.
SDK Configuration
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient } from '@aws-sdk/lib-dynamodb';
const createClientWithRetry = (): DynamoDBDocumentClient => {
const client = new DynamoDBClient({
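maxAttempts: 10,
retryMode: 'adaptive', // Standard retries plus client-side rate limiting
// Adaptive mode slows the client down when it sees throttling errors.
// It assumes the client targets a single resource, so consider
// one client per table if only some tables are throttled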
});
return DynamoDBDocumentClient.from(client);
};
Batch Operations: Handling Unprocessed Items
The SDK does NOT automatically retry unprocessed items from batch operations:
import { DynamoDBDocumentClient, BatchWriteCommand } from '@aws-sdk/lib-dynamodb';
const batchWriteWithRetry = async (
client: DynamoDBDocumentClient,
tableName: string,
items: Record<string, unknown>[],
maxRetries: number = 5
): Promise<void> => {
const chunks = chunkArray(items, 25); // BatchWrite limit
for (const chunk of chunks) {
let unprocessed: Record<string, unknown>[] | undefined = chunk;
let attempts = 0;
while (unprocessed && unprocessed.length > 0 && attempts < maxRetries) {
const result = await client.send(new BatchWriteCommand({
RequestItems: {
[tableName]: unprocessed.map(item => ({
PutRequest: { Item: item }
}))
}
}));
const unprocessedItems = result.UnprocessedItems?.[tableName];
if (unprocessedItems && unprocessedItems.length > 0) {
unprocessed = unprocessedItems
.map(req => req.PutRequest?.Item as Record<string, unknown>)
.filter(Boolean);
// Exponential backoff with jitter
const delay = Math.min(100 * Math.pow(2, attempts), 5000);
const jitter = delay * 0.2 * Math.random();
await sleep(delay + jitter);
attempts++;
} else {
unprocessed = undefined;
}
}
if (unprocessed && unprocessed.length > 0) {
throw new Error(
`Failed to write ${unprocessed.length} items after ${maxRetries} retries`
);
}
}
};
const chunkArray = <T>(array: T[], size: number): T[][] => {
const chunks: T[][] = [];
for (let i = 0; i < array.length; i += size) {
chunks.push(array.slice(i, i + size));
}
return chunks;
};
const sleep = (ms: number): Promise<void> =>
new Promise(resolve => setTimeout(resolve, ms));
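To see what the backoff formula in the retry loop produces, the delay calculation can be pulled out into a pure function. The sketch below uses the same constants as the code above (100 ms base, 5,000 ms cap, up to 20% jitter); the injectable `random` parameter is an addition for illustration so the schedule can be printed deterministically.

```typescript
// Same formula as the retry loop above, with the random source injectable.
function backoffDelayMs(
  attempt: number,
  random: () => number = Math.random
): number {
  const base = Math.min(100 * Math.pow(2, attempt), 5000);
  const jitter = base * 0.2 * random();
  return base + jitter;
}

// Base schedule with jitter switched off (random always returns 0).
const baseSchedule = [0, 1, 2, 3, 4, 5, 6].map((attempt) =>
  backoffDelayMs(attempt, () => 0)
);
console.log(baseSchedule); // [ 100, 200, 400, 800, 1600, 3200, 5000 ]
```

With the default `maxRetries` of 5, the base delays for attempts 0 through 4 add up to 3,100 ms, so a chunk that keeps returning unprocessed items fails after roughly 3 to 4 seconds of waiting.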
Circuit Breaker for Sustained Throttling
When throttling persists, a circuit breaker prevents retry storms:
import {
ProvisionedThroughputExceededException,
ThrottlingException
} from '@aws-sdk/client-dynamodb';
interface CircuitBreakerConfig {
failureThreshold: number; // Failures before opening
resetTimeout: number; // Time before trying again (ms)
}
class DynamoDBCircuitBreaker {
private failures = 0;
private lastFailure: number = 0;
private state: 'closed' | 'open' | 'half-open' = 'closed';
constructor(private config: CircuitBreakerConfig) {}
async execute<T>(operation: () => Promise<T>): Promise<T> {
if (this.state === 'open') {
if (Date.now() - this.lastFailure > this.config.resetTimeout) {
this.state = 'half-open';
} else {
throw new Error('Circuit breaker is open - request rejected');
}
}
try {
const result = await operation();
this.onSuccess();
return result;
} catch (error) {
this.onFailure(error);
throw error;
}
}
private onSuccess(): void {
this.failures = 0;
this.state = 'closed';
}
private onFailure(error: unknown): void {
if (
error instanceof ProvisionedThroughputExceededException ||
error instanceof ThrottlingException
) {
this.failures++;
this.lastFailure = Date.now();
if (this.failures >= this.config.failureThreshold) {
this.state = 'open';
}
}
}
}
// Usage
const circuitBreaker = new DynamoDBCircuitBreaker({
failureThreshold: 5,
resetTimeout: 30000, // 30 seconds
});
const writeWithProtection = async (
client: DynamoDBDocumentClient,
item: Record<string, unknown>
): Promise<void> => {
await circuitBreaker.execute(async () => {
await client.send(new PutCommand({
TableName: 'MainTable',
Item: item
}));
});
};
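The state transitions are easier to follow, and to unit test, when time is passed in explicitly instead of read from `Date.now()`. The sketch below restates only the state machine from the class above in that style. `BreakerStateMachine` is an illustrative name, and one detail differs: a throttle during half-open reopens the breaker immediately.

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

class BreakerStateMachine {
  private failures = 0;
  private lastFailureAt = 0;
  private current: BreakerState = 'closed';
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;

  constructor(failureThreshold: number, resetTimeoutMs: number) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
  }

  // Returns true if a request may proceed at time `now` (milliseconds).
  allowRequest(now: number): boolean {
    if (this.current !== 'open') return true;
    if (now - this.lastFailureAt > this.resetTimeoutMs) {
      this.current = 'half-open'; // timeout elapsed: allow probe requests
      return true;
    }
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.current = 'closed';
  }

  recordThrottle(now: number): void {
    this.failures++;
    this.lastFailureAt = now;
    if (this.current === 'half-open' || this.failures >= this.failureThreshold) {
      this.current = 'open';
    }
  }

  get state(): BreakerState {
    return this.current;
  }
}

const breaker = new BreakerStateMachine(3, 30000);
breaker.recordThrottle(0);
breaker.recordThrottle(1);
breaker.recordThrottle(2);
console.log(breaker.state); // open
console.log(breaker.allowRequest(1000)); // false
console.log(breaker.allowRequest(30003)); // true
console.log(breaker.state); // half-open
```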
Client-Side Rate Limiting
Proactively limiting request rates prevents throttling from occurring:
class TokenBucket {
private tokens: number;
private lastRefill: number;
constructor(
private maxTokens: number,
private refillRate: number // tokens per second
) {
this.tokens = maxTokens;
this.lastRefill = Date.now();
}
async acquire(count: number = 1): Promise<boolean> {
this.refill();
if (this.tokens >= count) {
this.tokens -= count;
return true;
}
// Wait for tokens to be available
const waitTime = ((count - this.tokens) / this.refillRate) * 1000;
await sleep(waitTime);
this.refill();
this.tokens -= count;
return true;
}
private refill(): void {
const now = Date.now();
const elapsed = (now - this.lastRefill) / 1000;
this.tokens = Math.min(
this.maxTokens,
this.tokens + elapsed * this.refillRate
);
this.lastRefill = now;
}
}
// Rate-limited DynamoDB wrapper
class RateLimitedDynamoDB {
private readBucket: TokenBucket;
private writeBucket: TokenBucket;
constructor(
private client: DynamoDBDocumentClient,
readCapacity: number,
writeCapacity: number
) {
// Use 80% of capacity to leave headroom
this.readBucket = new TokenBucket(readCapacity * 0.8, readCapacity * 0.8);
this.writeBucket = new TokenBucket(writeCapacity * 0.8, writeCapacity * 0.8);
}
async get(
tableName: string,
key: Record<string, unknown>
): Promise<Record<string, unknown> | undefined> {
await this.readBucket.acquire(1); // Conservative: 1 RCU covers a strongly consistent read up to 4 KB (eventually consistent reads cost half)
const result = await this.client.send(new GetCommand({
TableName: tableName,
Key: key
}));
return result.Item;
}
async put(
tableName: string,
item: Record<string, unknown>
): Promise<void> {
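// Rough estimate: the JSON byte length only approximates DynamoDB's item size
const itemSize = Buffer.byteLength(JSON.stringify(item), 'utf8');
const wcuNeeded = Math.ceil(itemSize / 1024); // 1 WCU per KB, rounded up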
await this.writeBucket.acquire(wcuNeeded);
await this.client.send(new PutCommand({
TableName: tableName,
Item: item
}));
}
}
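The token counts passed to `acquire` depend on how DynamoDB rounds item sizes into capacity units. The helpers below sketch those rules: writes are billed per 1 KB, reads per 4 KB, eventually consistent reads cost half, and transactional operations cost double. The function names are illustrative, and the item size in bytes is taken as an input rather than computed.

```typescript
type ReadMode = 'eventual' | 'strong' | 'transactional';

// Write capacity units for one item: 1 WCU per 1 KB, rounded up.
function writeCapacityUnits(itemBytes: number, transactional = false): number {
  const units = Math.max(1, Math.ceil(itemBytes / 1024));
  return transactional ? units * 2 : units;
}

// Read capacity units for one item: 1 RCU per 4 KB (strongly consistent),
// half that for eventually consistent, double for transactional reads.
function readCapacityUnits(itemBytes: number, mode: ReadMode = 'strong'): number {
  const blocks = Math.max(1, Math.ceil(itemBytes / 4096));
  if (mode === 'eventual') return blocks / 2;
  if (mode === 'transactional') return blocks * 2;
  return blocks;
}

console.log(writeCapacityUnits(3500)); // 4
console.log(readCapacityUnits(3500)); // 1
console.log(readCapacityUnits(3500, 'eventual')); // 0.5
console.log(readCapacityUnits(9000)); // 3
console.log(writeCapacityUnits(400 * 1024)); // 400
```

The last line shows the cost of a maximum-size item: a single 400 KB write consumes 400 WCU, a large share of one partition's 1,000 WCU limit.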
CloudWatch Monitoring and Alerting
Proper monitoring catches throttling before it impacts users.
Key Metrics
const throttlingMetrics = {
primary: [
{
name: 'ThrottledRequests',
description: 'Any request that was throttled',
alarm: 'Sum > 0 for 1 minute',
action: 'Investigate immediately'
},
{
name: 'ReadThrottleEvents',
description: 'Individual read throttle events',
alarm: 'Sum > 10 per minute',
action: 'Check partition key design or increase capacity'
},
{
name: 'WriteThrottleEvents',
description: 'Individual write throttle events',
alarm: 'Sum > 10 per minute',
action: 'Implement write sharding'
}
],
utilization: [
{
name: 'ConsumedReadCapacityUnits',
alarm: 'Average > 80% of provisioned for 5 minutes',
action: 'Scale up or enable auto-scaling'
},
{
name: 'ConsumedWriteCapacityUnits',
alarm: 'Average > 80% of provisioned for 5 minutes',
action: 'Scale up or enable auto-scaling'
}
],
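gsi: [
{
name: 'WriteThrottleEvents (GlobalSecondaryIndexName dimension)',
description: 'Write throttling on a GSI (causes backpressure on base table writes)',
alarm: 'Any occurrence',
action: 'Increase GSI capacity or shard the GSI partition key'
},
{
name: 'OnlineIndexThrottleEvents',
description: 'Write throttle events while a new GSI is being created (backfill)',
alarm: 'Any occurrence during index creation',
action: 'Raise write capacity on the index while it builds'
}
],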
// Granular throttle metrics (useful for diagnosing specific issues)
advanced: [
{ name: 'ReadMaxOnDemandThroughputThrottleEvents', description: 'On-demand max throughput exceeded' },
{ name: 'WriteMaxOnDemandThroughputThrottleEvents', description: 'On-demand max throughput exceeded' },
{ name: 'ReadAccountLimitThrottleEvents', description: 'Account-level limit hit' },
{ name: 'WriteAccountLimitThrottleEvents', description: 'Account-level limit hit' },
{ name: 'ReadKeyRangeThroughputThrottleEvents', description: 'Partition-level limit hit' },
{ name: 'WriteKeyRangeThroughputThrottleEvents', description: 'Partition-level limit hit' }
]
};
CDK Alarm Configuration
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import * as cloudwatch_actions from 'aws-cdk-lib/aws-cloudwatch-actions';
import * as sns from 'aws-cdk-lib/aws-sns';
import { Duration } from 'aws-cdk-lib';
const createThrottlingAlarms = (
table: dynamodb.Table,
alertTopic: sns.Topic
): cloudwatch.Alarm[] => {
const alarms: cloudwatch.Alarm[] = [];
// Throttled requests alarm - immediate attention
alarms.push(new cloudwatch.Alarm(table, 'ThrottlingAlarm', {
alarmName: `${table.tableName}-Throttling`,
metric: table.metricThrottledRequestsForOperations({
operations: [
dynamodb.Operation.GET_ITEM,
dynamodb.Operation.PUT_ITEM,
dynamodb.Operation.QUERY,
dynamodb.Operation.SCAN
],
period: Duration.minutes(1)
}),
threshold: 1,
evaluationPeriods: 1,
comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_OR_EQUAL_TO_THRESHOLD,
treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
}));
// High read utilization - early warning
alarms.push(new cloudwatch.Alarm(table, 'HighReadUtilization', {
alarmName: `${table.tableName}-HighReadUtilization`,
metric: new cloudwatch.MathExpression({
// Consumed capacity is reported as a Sum per period, so divide by the
// period length in seconds before comparing with the provisioned rate
expression: 'm1 / PERIOD(m1) / m2 * 100',
usingMetrics: {
m1: table.metricConsumedReadCapacityUnits({ period: Duration.minutes(5), statistic: 'Sum' }),
m2: table.metric('ProvisionedReadCapacityUnits', { period: Duration.minutes(5), statistic: 'Average' })
},
period: Duration.minutes(5)
}),
threshold: 80,
evaluationPeriods: 3,
comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
}));
// Add SNS actions
alarms.forEach(alarm => {
alarm.addAlarmAction(new cloudwatch_actions.SnsAction(alertTopic));
});
return alarms;
};
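When building utilization alarms, keep in mind that CloudWatch reports consumed capacity as a total over the metric period, while provisioned capacity is a per-second rate. The conversion is sketched below as plain arithmetic; `utilizationPercent` and the figures are illustrative.

```typescript
// Converts a per-period Sum of consumed capacity units into a utilization
// percentage of the provisioned per-second rate.
function utilizationPercent(
  consumedSum: number,
  periodSeconds: number,
  provisionedPerSecond: number
): number {
  return (consumedSum * 100) / (periodSeconds * provisionedPerSecond);
}

// 24,000 RCU consumed in a 5-minute period against 100 provisioned RCU:
// 24,000 / 300 s = 80 RCU per second, which is 80% utilization.
console.log(utilizationPercent(24000, 300, 100)); // 80
```

Dividing the raw Sum by provisioned capacity without the period would report 24,000% here and leave the alarm permanently in breach.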
Contributor Insights for Hot Key Detection
Enable Contributor Insights to identify which partition keys are causing throttling:
import { DynamoDBClient, UpdateContributorInsightsCommand } from '@aws-sdk/client-dynamodb';
// Mode options:
// - ACCESSED_AND_THROTTLED_KEYS: All accessed keys + throttled keys (default, higher cost)
// - THROTTLED_KEYS: Only throttled keys (cost-effective for throttle debugging)
const enableContributorInsights = async (
client: DynamoDBClient,
tableName: string
): Promise<void> => {
await client.send(new UpdateContributorInsightsCommand({
TableName: tableName,
ContributorInsightsAction: 'ENABLE',
}));
};
// Contributor Insights reveals:
// - Top partition keys by consumed capacity
// - Throttled partition keys
// - Access patterns over time
// Essential for debugging Single Table Design throttling
// Tip: Use THROTTLED_KEYS mode if you only need to debug throttling (lower cost)
Common Pitfalls and Solutions
Pitfall 1: Relying on Adaptive Capacity
// WRONG: Assuming DynamoDB handles hot partitions automatically
// Reality: Adaptive rebalancing is instant, but split-for-heat takes minutes
// Neither helps with single hot partition key (celebrity problem)
// Flash sales or viral content on one key = throttling regardless
// RIGHT: Design for even distribution from the start
// Use write sharding for known low-cardinality patterns
Pitfall 2: Ignoring GSI Capacity
// WRONG: Setting GSI capacity lower than base table
// Assumption: "GSI has less traffic"
// Result: GSI throttling blocks ALL base table writes
// RIGHT: GSI capacity >= base table write capacity
// Or use on-demand for automatic scaling
Pitfall 3: On-Demand Scaling Assumptions
// WRONG: "On-demand scales instantly to any level"
// Reality: 2x scaling limit within 30-minute windows
// 50k req/sec to 250k req/sec takes ~1 hour
// RIGHT: Pre-warm before expected spikes
// Or use provisioned with high capacity for planned events
// Tip: Consider AWS's "warm throughput" feature for configuring
// higher initial throughput values on new or restored tables
Pitfall 4: Missing Batch Retry Logic
// WRONG: Assume BatchWriteItem processes all items
const result = await client.send(new BatchWriteCommand({ ... }));
// Some items may have failed!
// RIGHT: Always check and retry unprocessed items
if (result.UnprocessedItems &&
Object.keys(result.UnprocessedItems).length > 0) {
// Implement exponential backoff retry
}
Pitfall 5: Not Monitoring Per-Partition Metrics
// WRONG: Only monitor table-level capacity
// "Table has 500 WCU available, why throttling?"
// RIGHT: Enable Contributor Insights
// Reveals: One partition key consuming its 1,000 WCU limit
// Table-level headroom doesn't help partition-level throttling
Key Takeaways
- Design Partition Keys First: Hot partitions cause most throttling issues
- Understand Per-Partition Limits: 3,000 RCU / 1,000 WCU per partition is the real constraint
- Write Sharding Works: 10 shards give up to 10x write throughput for the same access pattern, provided writes spread evenly across shards
- Adaptive Capacity Has Limits: Rebalancing is instant, but split-for-heat takes minutes; neither helps single hot keys
- On-Demand Has Limits: 2x scaling within 30 minutes, not unlimited
- GSI Throttling Blocks Writes: Capacity matching is essential
- DAX Needs High Hit Rate: Below roughly 80% cache hit rate, DAX delivers little value for its cost
- Monitor Contributor Insights: The most direct way to identify hot keys in Single Table Design
- Retry Unprocessed Items: SDK does not auto-retry batch operation failures
- Pre-warm for Events: Both provisioned and on-demand need preparation for traffic spikes
Building throttle-resistant DynamoDB applications requires understanding these mechanics and implementing appropriate patterns at each layer. Start with partition key design, add sharding where needed, implement proper retries, and monitor aggressively. The result is a system that scales predictably without unexpected throttling incidents.