2025-09-04
API Versioning with AWS CDK: A Production Case Study
A technical case study on implementing multi-version APIs in production. Failed approaches, working solutions, and CDK patterns for managing API evolution.
Abstract
This case study examines the implementation of a production API versioning system using AWS CDK. Through analysis of three failed approaches and one working solution, we explore practical patterns for managing API evolution while maintaining client compatibility. The approach we ultimately adopted, path-based versioning with explicit lifecycle management, runs multiple API versions side by side with predictable, though not negligible, operational overhead.
Problem Statement
API evolution creates an inevitable conflict: the need to improve and change the API while maintaining backward compatibility for existing clients. The challenge intensifies in enterprise environments where clients have varying update capabilities and deployment windows.
The specific challenge addressed here involved:
- Multiple enterprise clients with different integration capabilities
- Varying deployment cycles (from weekly to 18-month government cycles)
- Need for API improvements without breaking existing integrations
- Limited development resources for maintaining multiple versions
Failed Approaches
Three approaches were attempted before arriving at the working solution, each failing for different technical and operational reasons.
Failed Approach #1: No Versioning Strategy
The initial approach assumed all clients could be updated simultaneously, eliminating the need for versioning.
Implementation: Single API endpoint with continuous updates
Timeline: 6 months from launch to failure
Client Growth: 5 initial clients → 50 clients
Failure Points:
- Government client with air-gapped networks required 18-month update cycles
- Manual backporting of security fixes became unsustainable
- Shadow API maintenance created significant infrastructure complexity
- Development velocity decreased as every change required compatibility analysis
Failed Approach #2: Over-Versioning
The second approach attempted to version every aspect of the API independently.
Implementation: Separate versioning for endpoints, headers, and response formats
GET /v2/users?response_version=1.3
X-API-Version: 2.1
Accept: application/vnd.company.user.v4+json
Failure Points:
- 25+ version combinations created exponential testing complexity
- Developer cognitive load became unsustainable
- Client integration difficulty increased significantly
- Documentation maintenance became impossible
Failed Approach #3: Intelligent Routing
The third approach used client fingerprinting to automatically route requests to appropriate API versions.
Implementation: Lambda@Edge function with client detection logic
Performance Impact: +150ms latency per request
Failure Points:
- Single point of failure affected all API versions
- Client detection logic proved unreliable
- Performance degradation unacceptable for production use
- High operational complexity for minimal benefit
Working Solution: Path-Based Versioning with Lifecycle Management
The successful approach combines path-based versioning with comprehensive lifecycle management and automated deprecation warnings.
// lib/config/api-versions.ts
export interface ApiVersion {
version: string;
status: 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';
launchedAt: Date;
deprecatedAt?: Date;
sunsetAt?: Date;
monthlyActiveClients?: number; // Track this!
breakingChanges: string[];
supportedFeatures: Set<string>;
}
export const API_VERSIONS: Record<string, ApiVersion> = {
v1: {
version: 'v1',
status: 'deprecated',
launchedAt: new Date('2022-01-15'),
deprecatedAt: new Date('2024-01-15'),
sunsetAt: new Date('2025-01-15'),
monthlyActiveClients: 28, // Legacy government clients
breakingChanges: [],
supportedFeatures: new Set(['basic-crud']),
},
v2: {
version: 'v2',
status: 'stable',
launchedAt: new Date('2023-06-01'),
monthlyActiveClients: 156,
breakingChanges: [
'Changed userId to user_id in all responses',
'Removed XML support',
'Made email field required',
],
supportedFeatures: new Set(['basic-crud', 'pagination', 'filtering']),
},
v3: {
version: 'v3',
status: 'beta',
launchedAt: new Date('2024-03-01'),
monthlyActiveClients: 42,
breakingChanges: [
'Moved to JSON:API spec',
'Changed all IDs to UUIDs',
'Nested resources under data property',
],
supportedFeatures: new Set([
'basic-crud',
'pagination',
'filtering',
'webhooks',
'graphql',
'batch-operations'
]),
},
};
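The manual `status` field and the date fields above can drift apart when one is updated and the other is not. A minimal sketch, assuming the dates are treated as the source of truth, derives an effective status and checks that the dates are in order. `LifecycleDates`, `effectiveStatus`, and `validateLifecycle` are hypothetical helpers, not part of the config file shown.

```typescript
// Hypothetical helpers: treat lifecycle dates as the source of truth.
export type Status = 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';

export interface LifecycleDates {
  status: Status; // manually maintained status
  launchedAt: Date;
  deprecatedAt?: Date;
  sunsetAt?: Date;
}

// Once a deprecation or sunset date has passed, it overrides the manual status.
export function effectiveStatus(v: LifecycleDates, now: Date = new Date()): Status {
  if (v.sunsetAt && now.getTime() >= v.sunsetAt.getTime()) return 'sunset';
  if (v.deprecatedAt && now.getTime() >= v.deprecatedAt.getTime()) return 'deprecated';
  return v.status;
}

// Returns one message per ordering problem; an empty array means the dates are consistent.
export function validateLifecycle(name: string, v: LifecycleDates): string[] {
  const errors: string[] = [];
  const launched = v.launchedAt.getTime();
  if (v.deprecatedAt && v.deprecatedAt.getTime() <= launched) {
    errors.push(`${name}: deprecatedAt must be after launchedAt`);
  }
  if (v.sunsetAt && !v.deprecatedAt) {
    errors.push(`${name}: sunsetAt is set without deprecatedAt`);
  }
  if (v.sunsetAt && v.deprecatedAt && v.sunsetAt.getTime() <= v.deprecatedAt.getTime()) {
    errors.push(`${name}: sunsetAt must be after deprecatedAt`);
  }
  return errors;
}
```

With the v1 dates above, `effectiveStatus` would report v1 as `sunset` from 2025-01-15 onward, so adopting a check like this means moving `sunsetAt` whenever a sunset is postponed instead of leaving the old date in place.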
The CDK Stack That Powers Our APIs
The production CDK stack deploys every non-sunset version under its own path prefix, with a dedicated Lambda function per version and endpoint, plus CloudWatch alarms:
// lib/stacks/versioned-api-stack.ts
import { RestApi, MethodLoggingLevel, LambdaIntegration, IResource } from 'aws-cdk-lib/aws-apigateway';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Alarm, Metric } from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';
import { API_VERSIONS, ApiVersion } from '../config/api-versions';
export class VersionedApiStack extends Stack {
constructor(scope: Construct, id: string, props: StackProps) {
super(scope, id, props);
const api = new RestApi(this, 'MultiVersionAPI', {
restApiName: 'production-api',
// Learned this the hard way: always enable CloudWatch
deployOptions: {
loggingLevel: MethodLoggingLevel.INFO,
dataTraceEnabled: true, // Essential for debugging version-specific issues, but it logs full request/response bodies, so review for PII
metricsEnabled: true,
tracingEnabled: true,
},
});
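// Add the version check Lambda - this is crucial (its wiring into the routes is not shown here)
const versionCheckFn = new NodejsFunction(this, 'VersionCheck', {
entry: 'src/middleware/version-check.ts',
memorySize: 256, // Don't need much
timeout: Duration.seconds(3),
environment: {
// Plain JSON.stringify turns each Set into {}, so convert Sets to arrays first
VERSIONS: JSON.stringify(API_VERSIONS, (_key, value) => (value instanceof Set ? [...value] : value)),
SLACK_WEBHOOK: process.env.SLACK_WEBHOOK!, // Alert on deprecated version usage
},
});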
// Set up each version
Object.entries(API_VERSIONS).forEach(([version, config]) => {
if (config.status === 'sunset') return; // Don't deploy sunset versions
const versionResource = api.root.addResource(version);
this.setupVersionEndpoints(versionResource, config);
});
// Critical: version discovery endpoint (addVersionDiscovery is not shown in this excerpt)
this.addVersionDiscovery(api);
// The alarm that saved us during the v1 sunset
new Alarm(this, 'DeprecatedVersionHighUsage', {
metric: new Metric({
namespace: 'API/Versions',
metricName: 'DeprecatedVersionCalls',
statistic: 'Sum',
}),
threshold: 1000,
evaluationPeriods: 1,
});
}
private setupVersionEndpoints(resource: IResource, config: ApiVersion) {
// Architecture: 24 Lambda functions across versions
// Separate functions per version ensure isolation
// User endpoints - the source of most breaking changes
const usersResource = resource.addResource('users');
const listUsersHandler = new NodejsFunction(this, `ListUsers-${config.version}`, {
entry: `src/handlers/${config.version}/users/list.ts`,
memorySize: config.version === 'v1' ? 512 : 1024, // V1 is inefficient
timeout: Duration.seconds(29), // Matches API Gateway's default 29-second REST integration timeout
environment: {
TABLE_NAME: process.env.USERS_TABLE!,
VERSION: config.version,
FEATURES: [...config.supportedFeatures].join(','),
// This saved debugging time countless times. Because the value changes on every
// synth, every deploy also pushes a configuration update to every function.
DEPLOYMENT_TIME: new Date().toISOString(),
},
bundling: {
// Provided by the Lambda Node.js 18+ runtime, so excluded from the bundle
externalModules: [
'@aws-sdk/client-dynamodb',
'@aws-sdk/client-cloudwatch',
],
// xmlbuilder (V1 XML support) is not in the runtime, so it must not be marked
// external; esbuild bundles it into the v1 function, the only one that imports it
},
},
});
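// true would mark a parameter as required, so feature parameters are declared optional (false).
// API Gateway only enforces these flags when a request validator is attached.
const queryParams: Record<string, boolean> = {};
if (config.supportedFeatures.has('pagination')) {
queryParams['method.request.querystring.page'] = false;
queryParams['method.request.querystring.limit'] = false;
}
if (config.supportedFeatures.has('filtering')) {
queryParams['method.request.querystring.filter'] = false;
}
// V3 specific parameters
if (config.version === 'v3') {
queryParams['method.request.querystring.include'] = false;
queryParams['method.request.querystring.fields'] = false;
}
usersResource.addMethod('GET', new LambdaIntegration(listUsersHandler), {
requestParameters: queryParams,
});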
// Track every version call - this metric is gold
listUsersHandler.metricInvocations().createAlarm(this, `HighTraffic-${config.version}`, {
threshold: 10000,
evaluationPeriods: 1,
alarmDescription: `High traffic on ${config.version} - check scaling`,
});
}
}
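The stack creates `versionCheckFn` but does not show its code. One possible shape for that check is sketched below, assuming the `VERSIONS` variable is serialized with Sets converted to arrays (a plain `JSON.stringify` turns each `Set` into `{}`). It reads the version prefix from the request path, answers 404 for unknown versions and 410 Gone for sunset ones, and flags deprecated versions. `VersionEntry`, `serializeVersions`, `parseVersions`, and `checkVersion` are hypothetical names.

```typescript
// One config entry after a JSON round trip: Dates become ISO strings, Sets become arrays.
export interface VersionEntry {
  version: string;
  status: 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';
  sunsetAt?: string;
  supportedFeatures: string[];
}

export interface VersionCheckResult {
  statusCode: 200 | 404 | 410;
  deprecated: boolean;
  version?: string;
}

// Serialize config for a Lambda environment variable, converting Sets to arrays.
export function serializeVersions(versions: Record<string, unknown>): string {
  return JSON.stringify(versions, (_key: string, value: unknown) =>
    value instanceof Set ? Array.from(value) : value,
  );
}

export function parseVersions(raw: string): Record<string, VersionEntry> {
  return JSON.parse(raw) as Record<string, VersionEntry>;
}

// Classify a request path such as "/v1/users" by its leading version segment.
export function checkVersion(path: string, versions: Record<string, VersionEntry>): VersionCheckResult {
  const prefix = path.split('/').filter(Boolean)[0] ?? '';
  // Own-property check so paths like "/__proto__/x" are not treated as versions
  const entry = Object.prototype.hasOwnProperty.call(versions, prefix) ? versions[prefix] : undefined;
  if (!entry) return { statusCode: 404, deprecated: false };
  if (entry.status === 'sunset') {
    return { statusCode: 410, deprecated: true, version: entry.version };
  }
  return { statusCode: 200, deprecated: entry.status === 'deprecated', version: entry.version };
}
```

A deprecated result could then trigger the deprecation headers and the `DeprecatedVersionCalls` metric, while 410 tells clients that a version is gone rather than mistyped.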
The Version Handlers That Actually Run
Here’s the real code with all its warts:
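// src/handlers/v1/users/list.ts
// Legacy v1 implementation with minimal changes
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
// getAllUsers is a data-access helper (not shown)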
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
console.log('V1 handler called', {
path: event.path,
clientIp: event.requestContext.identity.sourceIp,
userAgent: event.headers['User-Agent'],
});
try {
// V1 doesn't support pagination, returns everything
// V1 design limitation - maintained for compatibility
const users = await getAllUsers(); // Returns all users - pagination added in v2
// The field that caused the incident
const transformedUsers = users.map(u => ({
userId: u.user_id, // V1 uses camelCase
userName: u.name,
userEmail: u.email,
createdDate: u.created_at, // Different field name because reasons
}));
return {
statusCode: 200,
headers: {
'Content-Type': 'application/json',
'X-API-Version': 'v1',
'X-API-Deprecated': 'true',
'X-API-Sunset': '2025-01-15',
'Warning': '299 - "API v1 is deprecated. Please migrate to v2. Guides: https://docs.api.com/migration"',
// Required by financial industry clients
'X-Total-Count': transformedUsers.length.toString(),
},
body: JSON.stringify(transformedUsers),
};
} catch (error) {
// Comprehensive error logging for troubleshooting
console.error('V1 handler error', {
error,
stack: (error as Error).stack, // catch variables are `unknown` under strict TypeScript
event: JSON.stringify(event),
});
return {
statusCode: 500,
body: JSON.stringify({
error: 'Internal Server Error',
// V1 clients expect this exact format
errorCode: 'INTERNAL_ERROR',
timestamp: new Date().toISOString(),
}),
};
}
};
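The v1 handler signals deprecation through custom `X-API-*` headers and the `Warning` header, which RFC 9111 marks as obsolete. A sketch of the standardized alternatives, assuming the handler has the v1 `deprecatedAt` and `sunsetAt` dates at hand: `Sunset` (RFC 8594) carries an HTTP-date, and `Deprecation` (RFC 9745) carries a structured-field date written as `@` plus Unix seconds. The `deprecationHeaders` helper is hypothetical, and these headers could sit alongside the existing ones for clients that already parse `X-API-Sunset`.

```typescript
// Standard deprecation signalling headers, to send alongside the custom X-API-* ones.
export function deprecationHeaders(
  deprecatedAt: Date,
  sunsetAt: Date,
  migrationUrl: string,
): Record<string, string> {
  return {
    // RFC 9745: structured-field date, "@" followed by Unix seconds
    Deprecation: `@${Math.floor(deprecatedAt.getTime() / 1000)}`,
    // RFC 8594: HTTP-date (IMF-fixdate), e.g. "Wed, 15 Jan 2025 00:00:00 GMT"
    Sunset: sunsetAt.toUTCString(),
    // RFC 8594 also registers the "sunset" link relation for sunset information
    Link: `<${migrationUrl}>; rel="sunset"`,
  };
}
```

In the v1 handler this could be spread into `headers`, for example `...deprecationHeaders(new Date('2024-01-15'), new Date('2025-01-15'), 'https://docs.api.com/migration')`.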
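// src/handlers/v2/users/list.ts
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';
// AWS SDK v3 client, created once per execution environment and reused across invocations
const cloudwatch = new CloudWatchClient({});
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
// V2 added proper pagination after the 50K incident
// Invalid or missing values fall back to defaults; limit is clamped to 1..100
const page = Math.max(1, Number.parseInt(event.queryStringParameters?.page ?? '1', 10) || 1);
const limit = Math.min(
Math.max(1, Number.parseInt(event.queryStringParameters?.limit ?? '20', 10) || 20),
100 // Maximum page size for performance
);
const metrics = {
version: 'v2',
page,
limit,
clientIp: event.requestContext.identity.sourceIp,
};
// Track clients still running the outdated 1.x SDK
if (event.headers['User-Agent']?.includes('OldSDK/1.')) {
await cloudwatch.send(new PutMetricDataCommand({
Namespace: 'API/Clients',
MetricData: [{
MetricName: 'OutdatedSDKUsage',
Value: 1,
Dimensions: [{ Name: 'Version', Value: 'v2' }],
}],
}));
}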
try {
const { users, total } = await getUsersPaginated({ page, limit });
// V2 response format with pagination
const response = {
data: users.map(u => ({
id: u.user_id, // Changed from userId
name: u.name,
email: u.email,
status: u.status || 'active', // New required field
created_at: u.created_at, // Snake case everywhere
updated_at: u.updated_at,
})),
pagination: {
page,
limit,
total,
total_pages: Math.ceil(total / limit),
has_next: page < Math.ceil(total / limit),
has_prev: page > 1,
},
// HATEOAS links for client navigation
_links: {
self: `/v2/users?page=${page}&limit=${limit}`,
next: page < Math.ceil(total / limit) ? `/v2/users?page=${page + 1}&limit=${limit}` : null,
prev: page > 1 ? `/v2/users?page=${page - 1}&limit=${limit}` : null,
},
};
return {
statusCode: 200,
headers: {
'Content-Type': 'application/json',
'X-API-Version': 'v2',
'X-RateLimit-Limit': '500',
'X-RateLimit-Remaining': await getRateLimitRemaining(event),
'Cache-Control': 'private, max-age=60', // Clients may cache for 60s; shared caches and CDNs must not
},
body: JSON.stringify(response),
};
} catch (error) {
logger.error('V2 handler error', { error, metrics });
throw error; // With Lambda proxy integration, API Gateway returns a generic 502 to the client
}
};
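The v2 handler computes `Math.ceil(total / limit)` three times while building `pagination` and `_links`. A small helper, sketched here under the assumption that the link format stays `/v2/users?page=N&limit=M`, computes it once and derives both objects from the same values. `paginate`, `Pagination`, and `PageLinks` are hypothetical names.

```typescript
// Pagination metadata and HATEOAS links in the v2 response shape.
export interface Pagination {
  page: number;
  limit: number;
  total: number;
  total_pages: number;
  has_next: boolean;
  has_prev: boolean;
}

export interface PageLinks {
  self: string;
  next: string | null;
  prev: string | null;
}

export function paginate(basePath: string, page: number, limit: number, total: number) {
  const totalPages = Math.ceil(total / limit); // computed once, reused below
  const link = (p: number) => `${basePath}?page=${p}&limit=${limit}`;
  const pagination: Pagination = {
    page,
    limit,
    total,
    total_pages: totalPages,
    has_next: page < totalPages,
    has_prev: page > 1,
  };
  const _links: PageLinks = {
    self: link(page),
    next: pagination.has_next ? link(page + 1) : null,
    prev: pagination.has_prev ? link(page - 1) : null,
  };
  return { pagination, _links };
}
```

Inside the handler, `const { pagination, _links } = paginate('/v2/users', page, limit, total);` would replace the two inline objects.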
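// src/handlers/v3/users/list.ts
// V3: JSON:API specification implementation
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import middy from '@middy/core';
import jsonBodyParser from '@middy/http-json-body-parser';
import httpErrorHandler from '@middy/http-error-handler';
import warmup from '@middy/warmup';
// correlationIds and logTimeout come from a separate middleware library (not shown)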
export const handler = middy(async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
// JSON:API compliance for enterprise integration
const params = parseJsonApiParams(event.queryStringParameters);
// Feature flags for gradual rollout
const features = await getFeatureFlags('v3', event.headers['X-Client-Id']);
const { users, total, included } = await getUsersWithRelationships({
...params,
includeRelationships: params.include,
sparseFields: params.fields,
experimentalFeatures: features,
});
// JSON:API format - love it or hate it
const response = {
data: users.map(u => ({
type: 'users',
id: u.id, // UUID format for consistency
attributes: {
name: u.name,
email: u.email,
status: u.status,
created_at: u.created_at,
updated_at: u.updated_at,
},
relationships: {
organization: {
data: { type: 'organizations', id: u.organization_id },
},
roles: {
data: u.role_ids.map(id => ({ type: 'roles', id })),
},
},
links: {
self: `/v3/users/${u.id}`,
},
})),
included: included, // Related resources
meta: {
pagination: {
page: params.page.number,
pages: Math.ceil(total / params.page.size),
count: users.length,
total: total,
},
api_version: 'v3',
generated_at: new Date().toISOString(),
experimental_features: [...features],
},
links: generateJsonApiLinks(params, total),
};
return {
statusCode: 200,
headers: {
'Content-Type': 'application/vnd.api+json', // JSON:API requirement
'X-API-Version': 'v3',
'X-RateLimit-Limit': '1000',
'X-RateLimit-Remaining': await getRateLimitRemaining(event),
'Vary': 'Accept, X-Client-Id', // Important for caching
},
body: JSON.stringify(response),
};
})
.use(warmup()) // Registered first so warmup invocations return before the other middleware runs
.use(jsonBodyParser())
.use(httpErrorHandler())
.use(correlationIds())
.use(logTimeout());
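`generateJsonApiLinks` is called but not shown. A hypothetical version is sketched below, assuming the `page[number]`/`page[size]` query parameters implied by `params.page.number` and `params.page.size`. JSON:API reserves the `page` parameter family but leaves the pagination strategy to the server, and it expects unavailable `prev`/`next` links to be `null` or omitted. The brackets are percent-encoded here, as the JSON:API specification advises for real requests.

```typescript
// Hypothetical stand-in for generateJsonApiLinks, using page[number]/page[size].
export interface JsonApiLinks {
  self: string;
  first: string;
  last: string;
  prev: string | null; // null signals "no previous page"
  next: string | null;
}

export function jsonApiPageLinks(
  basePath: string,
  pageNumber: number,
  pageSize: number,
  total: number,
): JsonApiLinks {
  const lastPage = Math.max(1, Math.ceil(total / pageSize)); // an empty collection still has page 1
  const param = (name: string) => encodeURIComponent(`page[${name}]`); // "[" -> %5B, "]" -> %5D
  const link = (n: number) => `${basePath}?${param('number')}=${n}&${param('size')}=${pageSize}`;
  return {
    self: link(pageNumber),
    first: link(1),
    last: link(lastPage),
    prev: pageNumber > 1 ? link(pageNumber - 1) : null,
    next: pageNumber < lastPage ? link(pageNumber + 1) : null,
  };
}
```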
Migration Pain Points and Solutions
The Database Migration That Almost Killed Us
When moving from V1 to V2, we needed to change userId (string) to user_id (UUID). Here’s how we did it without downtime:
// migrations/v1-to-v2-user-ids.ts
export const migrateUserIds = async () => {
const BATCH_SIZE = 100;
let lastEvaluatedKey: any = undefined;
let migrated = 0;
let failed = 0;
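// First pass: Add new field
// PutRequest replaces the whole item, so a live write landing between scan and put is lost;
// an UpdateItem that only sets user_id would avoid that race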
do {
const { Items, LastEvaluatedKey } = await dynamodb.scan({
TableName: process.env.USERS_TABLE!,
Limit: BATCH_SIZE,
ExclusiveStartKey: lastEvaluatedKey,
}).promise();
const batch = Items?.map(item => ({
PutRequest: {
Item: {
...item,
user_id: item.userId || generateUUID(), // New field
_migration: 'v1-to-v2-phase1',
_migrated_at: new Date().toISOString(),
},
},
})) || [];
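// BatchWriteItem accepts at most 25 requests per call, so split each scan page
for (let i = 0; i < batch.length; i += 25) {
const chunk = batch.slice(i, i + 25);
try {
const { UnprocessedItems } = await dynamodb.batchWrite({
RequestItems: { [process.env.USERS_TABLE!]: chunk },
}).promise();
// Throttled writes come back in UnprocessedItems instead of raising an error
const unprocessed = UnprocessedItems?.[process.env.USERS_TABLE!]?.length ?? 0;
migrated += chunk.length - unprocessed;
failed += unprocessed;
} catch (error) {
// Log the IDs and keep going; failed items need a follow-up pass
console.error('Batch failed', { error, ids: chunk.map(b => b.PutRequest.Item.userId) });
failed += chunk.length;
}
}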
lastEvaluatedKey = LastEvaluatedKey;
// Pause between scan pages to spread the write load on the live table
await new Promise(resolve => setTimeout(resolve, 100));
} while (lastEvaluatedKey);
console.log(`Migration complete: ${migrated} succeeded, ${failed} failed`);
// Second pass: Remove old field (after all clients updated)
// We waited 6 months for this
};
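DynamoDB's `BatchWriteItem` accepts at most 25 put or delete requests per call, and throttled writes come back in `UnprocessedItems` instead of raising an error, so a migration loop needs chunking and a retry step. The sketch below keeps that logic independent of any SDK version by taking the write call as a parameter; in the migration above, the hypothetical `BatchWriter` would wrap `batchWrite` and return `UnprocessedItems[tableName]`. `writeAllWithRetry` is a hypothetical helper, and errors thrown by the writer propagate to the caller.

```typescript
// Receives up to 25 requests and resolves with the ones that were NOT processed.
export type BatchWriter<T> = (requests: T[]) => Promise<T[]>;

const MAX_BATCH = 25; // BatchWriteItem limit per call

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Write all items in chunks of 25, retrying unprocessed ones with exponential backoff.
export async function writeAllWithRetry<T>(
  items: T[],
  write: BatchWriter<T>,
  maxAttempts = 5,
  baseDelayMs = 50,
): Promise<{ written: number; failed: T[] }> {
  let written = 0;
  const failed: T[] = [];
  for (let i = 0; i < items.length; i += MAX_BATCH) {
    let pending = items.slice(i, i + MAX_BATCH);
    for (let attempt = 0; pending.length > 0 && attempt < maxAttempts; attempt++) {
      if (attempt > 0) await sleep(baseDelayMs * 2 ** (attempt - 1)); // 50, 100, 200, ... ms
      const unprocessed = await write(pending);
      written += pending.length - unprocessed.length;
      pending = unprocessed;
    }
    failed.push(...pending); // still unprocessed after maxAttempts
  }
  return { written, failed };
}
```

Returning the failed items, rather than only a count, gives the follow-up pass an exact list to retry.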
Client SDK Backwards Compatibility
Our SDK had to work with all API versions. This is messy but necessary:
// sdk/src/client.ts
export class ApiClient {
private version: string;
private static warned = new Set<string>(); // Static: shared by all instances, so each warning prints once per process
constructor(options: ClientOptions = {}) {
this.version = options.version || 'v2'; // Default to stable
if (this.version === 'v1' && !ApiClient.warned.has('deprecation')) {
console.warn(
'\x1b[33m%s\x1b[0m', // Yellow text
'[DEPRECATION] API v1 will be sunset on 2025-01-15. ' +
'Migration guide: https://docs.api.com/migration'
);
ApiClient.warned.add('deprecation');
// Track SDK version usage
this.trackEvent('sdk_deprecation_warning', { version: 'v1' });
}
}
}
async getUsers(options?: GetUsersOptions) {
const url = this.buildUrl('users', options);
const response = await this.request(url);
// Normalize responses across versions
return this.normalizeUserResponse(response);
}
private normalizeUserResponse(response: any): User[] {
switch (this.version) {
case 'v1':
// V1 returns flat array
return response.map((u: any) => ({
id: u.userId,
name: u.userName,
email: u.userEmail,
createdAt: new Date(u.createdDate),
// V1 doesn't have these
status: 'active',
updatedAt: new Date(u.createdDate),
}));
case 'v2':
// V2 returns paginated response
return response.data.map((u: any) => ({
id: u.id,
name: u.name,
email: u.email,
status: u.status,
createdAt: new Date(u.created_at),
updatedAt: new Date(u.updated_at),
}));
case 'v3':
// V3 returns JSON:API format
return response.data.map((u: any) => ({
id: u.id,
name: u.attributes.name,
email: u.attributes.email,
status: u.attributes.status,
createdAt: new Date(u.attributes.created_at),
updatedAt: new Date(u.attributes.updated_at),
// V3 includes relationships
organizationId: u.relationships?.organization?.data?.id,
roleIds: u.relationships?.roles?.data?.map((r: any) => r.id) || [],
}));
default:
throw new Error(`Unknown API version: ${this.version}`);
}
}
}
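The `any` types in `normalizeUserResponse` mean a field read under the wrong name (for example `userId` versus `user_id`) only shows up at runtime as `undefined`. A sketch of the same v1/v2 mapping with typed wire formats, assuming the field names from the handlers above; `V1User`, `V2ListResponse`, and the overloaded `normalizeUsers` are hypothetical, and v3 would follow the same pattern with its JSON:API shape.

```typescript
// Wire formats, copied from the v1 and v2 handler responses shown earlier.
interface V1User { userId: string; userName: string; userEmail: string; createdDate: string }
interface V2User {
  id: string; name: string; email: string; status: string;
  created_at: string; updated_at: string;
}
interface V2ListResponse { data: V2User[] }

// The SDK's version-independent model.
export interface User {
  id: string; name: string; email: string; status: string;
  createdAt: Date; updatedAt: Date;
}

// Overloads tie each version literal to its wire type, so a mismatched body
// or a misspelled field fails at compile time instead of at runtime.
export function normalizeUsers(version: 'v1', body: V1User[]): User[];
export function normalizeUsers(version: 'v2', body: V2ListResponse): User[];
export function normalizeUsers(version: 'v1' | 'v2', body: V1User[] | V2ListResponse): User[] {
  if (version === 'v1') {
    return (body as V1User[]).map((u) => ({
      id: u.userId,
      name: u.userName,
      email: u.userEmail,
      status: 'active', // v1 has no status field
      createdAt: new Date(u.createdDate),
      updatedAt: new Date(u.createdDate), // v1 has no separate update timestamp
    }));
  }
  return (body as V2ListResponse).data.map((u) => ({
    id: u.id,
    name: u.name,
    email: u.email,
    status: u.status,
    createdAt: new Date(u.created_at),
    updatedAt: new Date(u.updated_at),
  }));
}
```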
Monitoring and Alerting That Actually Helps
The monitoring system provides visibility into version usage patterns and performance:
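// lib/constructs/api-monitoring.ts
import { Duration } from 'aws-cdk-lib';
import {
Alarm, Color, ComparisonOperator, Dashboard, GraphWidget, MathExpression, TreatMissingData,
} from 'aws-cdk-lib/aws-cloudwatch';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';
// v1Percentage, v2Percentage, v3Percentage and v1Errors are IMetric instances defined elsewhere (not shown)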
export class ApiMonitoring extends Construct {
constructor(scope: Construct, id: string) {
super(scope, id);
// Dashboard that actually gets looked at
const dashboard = new Dashboard(this, 'ApiDashboard', {
dashboardName: 'api-versions-prod',
defaultInterval: Duration.hours(3), // Recent enough to be useful
});
// Version distribution - watched this like a hawk during v2 rollout
dashboard.addWidgets(
new GraphWidget({
title: 'API Version Distribution (% of requests)',
left: [v1Percentage, v2Percentage, v3Percentage],
leftYAxis: { max: 100, min: 0 },
period: Duration.minutes(5),
statistic: 'Average',
// Minimum usage threshold for sunset decisions
leftAnnotations: [{
label: 'Min safe threshold',
value: 5,
color: Color.RED,
}],
})
);
// The metric that matters: client errors by version
dashboard.addWidgets(
new GraphWidget({
title: '4xx Errors by Version',
left: [
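new MathExpression({
// Errors as a percentage of requests; RATE(m1) would instead give the per-second change in the error count
expression: '100 * m1 / m2',
usingMetrics: {
m1: v1Errors,
m2: v1Requests, // v1 request count, defined elsewhere like v1Errors (not shown)
},
label: 'V1 4xx Error Rate (%)',
color: Color.RED,
}),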
// Similar for v2, v3
],
})
);
// Deprecation warning effectiveness
const deprecationAlarm = new Alarm(this, 'V1StillHighUsage', {
metric: v1Percentage,
threshold: 10,
evaluationPeriods: 3,
comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
alarmDescription: 'V1 still above 10% - delay sunset?',
treatMissingData: TreatMissingData.NOT_BREACHING,
});
deprecationAlarm.addAlarmAction(
new SnsAction(Topic.fromTopicArn(this, 'AlertTopic', process.env.ALERT_TOPIC_ARN!))
);
}
}
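The dashboard plots `v1Percentage`, `v2Percentage`, and `v3Percentage`, but their definitions are not shown. In CloudWatch they could be metric math expressions such as `100 * m1 / (m1 + m2 + m3)` over per-version request counts. The same arithmetic, plus a reading of the 5% dashboard line and the 10% alarm threshold, is sketched below in plain TypeScript; `versionShares` and `sunsetReadiness` are hypothetical names, and the `ready`/`watch`/`delay` labels are an assumed interpretation of those two lines.

```typescript
// Request counts per version over one period (hypothetical input).
export type VersionCounts = Record<string, number>;

// Share of traffic per version in percent, the quantity the dashboard plots.
export function versionShares(counts: VersionCounts): Record<string, number> {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const shares: Record<string, number> = {};
  for (const [version, n] of Object.entries(counts)) {
    shares[version] = total === 0 ? 0 : (100 * n) / total;
  }
  return shares;
}

// Read a deprecated version's share against the 5% dashboard line and the 10% alarm.
export function sunsetReadiness(
  sharePct: number,
  safePct = 5,
  alarmPct = 10,
): 'ready' | 'watch' | 'delay' {
  if (sharePct < safePct) return 'ready';
  if (sharePct > alarmPct) return 'delay'; // same condition as the V1StillHighUsage alarm
  return 'watch';
}
```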
Lessons Learned
1. Version Sunset Complexity
28 clients remain on V1 more than 18 months after its January 2024 deprecation due to:
- Government deployment cycles requiring 18-month lead times
- IoT devices with firmware-embedded URLs
- Legacy systems with hard-coded integrations
Maintaining V1 takes ongoing engineering time, but it keeps clients with critical integration dependencies running.
2. Combinatorial Testing Complexity
Breaking changes multiply testing requirements:
- 3 API versions
- 3 SDK versions
- 4 response formats
Together: 3 × 3 × 4 = 36 test combinations
Integration test suite: 25 minutes execution time
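Generating the matrix from its dimensions keeps the test list in step with the number of versions. A minimal sketch, with placeholder SDK and format names since the article does not name them; `cartesian` is a hypothetical helper.

```typescript
// Every combination of the given dimensions, e.g. { api, sdk, format }.
export function cartesian<T extends Record<string, readonly unknown[]>>(
  dims: T,
): Array<{ [K in keyof T]: T[K][number] }> {
  let combos: Array<Record<string, unknown>> = [{}];
  for (const [key, values] of Object.entries(dims) as Array<[string, readonly unknown[]]>) {
    const next: Array<Record<string, unknown>> = [];
    for (const combo of combos) {
      for (const value of values) {
        next.push({ ...combo, [key]: value }); // extend each partial combination
      }
    }
    combos = next;
  }
  return combos as unknown as Array<{ [K in keyof T]: T[K][number] }>;
}

// Placeholder names: the article gives only the counts (3 SDKs, 4 formats).
const matrix = cartesian({
  api: ['v1', 'v2', 'v3'],
  sdk: ['sdk1', 'sdk2', 'sdk3'],
  format: ['fmt1', 'fmt2', 'fmt3', 'fmt4'],
});
```

Combinations that cannot occur, such as XML responses on v2 and later (v2 removed XML support), could be filtered out of the generated list before the suite runs.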
3. Documentation Maintenance
Documentation drift creates hidden dependencies. V1 documentation lag led to:
- Client reliance on undocumented behavior
- Need for feature flags to maintain compatibility
- Additional development overhead for legacy behavior
4. Version Discovery Is Critical
// This endpoint saves more support tickets than any other
app.get('/api', (req, res) => {
res.json({
versions: {
v1: {
status: 'deprecated',
sunset_date: '2025-01-15',
docs: 'https://docs.api.com/v1',
migration_guide: 'https://docs.api.com/v1-to-v2',
},
v2: {
status: 'stable',
docs: 'https://docs.api.com/v2',
},
v3: {
status: 'beta',
docs: 'https://docs.api.com/v3',
breaking_changes: 'https://docs.api.com/v3-breaking-changes',
},
},
current_stable: 'v2',
recommended: 'v2',
your_version: detectVersion(req), // What the client is using
});
});
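The discovery payload above is hard-coded, so its statuses and sunset dates can drift from `API_VERSIONS`. A sketch that builds the same fields from the config, assuming docs live at `<docsBase>/<version>` as in the URLs above and that the last stable version in config order is the recommended one; `DiscoverableVersion` and `buildDiscovery` are hypothetical, and per-version extras such as `migration_guide` are left out.

```typescript
// Subset of ApiVersion needed for discovery (mirrors lib/config/api-versions.ts).
export interface DiscoverableVersion {
  version: string;
  status: 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';
  sunsetAt?: Date;
}

export interface DiscoveryEntry {
  status: string;
  docs: string;
  sunset_date?: string;
}

// Build the discovery body from the same config the stack deploys.
export function buildDiscovery(versions: Record<string, DiscoverableVersion>, docsBase: string) {
  const live = Object.values(versions).filter((v) => v.status !== 'sunset'); // sunset versions are not deployed
  const entries: Record<string, DiscoveryEntry> = {};
  for (const v of live) {
    const entry: DiscoveryEntry = { status: v.status, docs: `${docsBase}/${v.version}` };
    if (v.sunsetAt) entry.sunset_date = v.sunsetAt.toISOString().slice(0, 10); // "YYYY-MM-DD"
    entries[v.version] = entry;
  }
  // Assumption: the last stable version in config order is the one to recommend
  const stable = live.filter((v) => v.status === 'stable');
  const current = stable[stable.length - 1]?.version ?? null;
  return { versions: entries, current_stable: current, recommended: current };
}
```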
Operational Considerations
Running multiple API versions carries ongoing costs:
- Infrastructure: three times as many Lambda functions, plus per-version API Gateway configuration, increase operational complexity
- Development: 35% longer implementation time for cross-version features
- Testing: CI/CD pipeline extended from 8 minutes to 25 minutes due to comprehensive version coverage
- Documentation: Dedicated resources needed for version-specific documentation
- Support: 25% of tickets relate to version confusion, which makes clear migration guides essential
Implementation Recommendations
- Design for versioning from initial release - Retrofitting versioning increases complexity 8-10x
- Bundle breaking changes - Batch related changes to reduce version proliferation
- Automate migration tooling - Build client migration tools before they’re needed
- Plan realistic sunset timelines - Enterprise clients require 12-18 month migration windows
- Implement usage tracking early - Version analytics inform sunset decisions
The CDK Pattern That Actually Works
If you’re starting fresh, use this structure:
/api
  /v1
    /users
    /orders
    /internal/health
  /v2
    /users
    /orders
    /internal/health
  /versions (discovery endpoint)
  /health (version-agnostic)
Keep your Lambda code organized by version:
/src
  /handlers
    /v1
      /users
      /orders
    /v2
      /users
      /orders
  /shared
    /database
    /auth
    /utils
Conclusion
Successful API versioning balances technical elegance with business reality. The path-based versioning approach with lifecycle management provides:
- Client Compatibility: Maintains service for diverse client update cycles
- Development Efficiency: Clear separation of version-specific logic
- Operational Visibility: Comprehensive monitoring and deprecation warnings
- Business Continuity: Revenue protection during API evolution
Implementing production-ready API versioning requires 4-6 months initial investment and ongoing operational complexity, but provides essential client compatibility during API evolution and protects critical business relationships.