2025-09-04

API Versioning with AWS CDK: A Production Case Study

A technical case study on implementing multi-version APIs in production. Failed approaches, working solutions, and CDK patterns for managing API evolution.

Abstract

This case study examines the implementation of a production API versioning system using AWS CDK. Through analysis of three failed approaches and one working solution, we explore practical patterns for managing API evolution while maintaining client compatibility. The approach we ultimately developed provides solid patterns for managing multiple API versions with minimal operational overhead.

Problem Statement

API evolution creates an inevitable conflict: the need to improve and change the API while maintaining backward compatibility for existing clients. The challenge intensifies in enterprise environments where clients have varying update capabilities and deployment windows.

The specific challenge addressed here involved:

  • Multiple enterprise clients with different integration capabilities
  • Varying deployment cycles (from weekly to 18-month government cycles)
  • Need for API improvements without breaking existing integrations
  • Limited development resources for maintaining multiple versions

Failed Approaches

Three approaches were attempted before arriving at the working solution, each failing for different technical and operational reasons.

Failed Approach #1: No Versioning Strategy

The initial approach assumed all clients could be updated simultaneously, eliminating the need for versioning.

Implementation: Single API endpoint with continuous updates
Timeline: 6 months from launch to failure
Client Growth: 5 initial clients → 50 clients

Failure Points:

  • Government client with air-gapped networks required 18-month update cycles
  • Manual backporting of security fixes became unsustainable
  • Shadow API maintenance created significant infrastructure complexity
  • Development velocity decreased as every change required compatibility analysis

Failed Approach #2: Over-Versioning

The second approach attempted to version every aspect of the API independently.

Implementation: Separate versioning for endpoints, headers, and response formats

GET /v2/users?response_version=1.3
X-API-Version: 2.1
Accept: application/vnd.company.user.v4+json

Failure Points:

  • 25+ version combinations created exponential testing complexity
  • Developer cognitive load became unsustainable
  • Client integration difficulty increased significantly
  • Documentation maintenance became impossible

Failed Approach #3: Intelligent Routing

The third approach used client fingerprinting to automatically route requests to appropriate API versions.

Implementation: Lambda@Edge function with client detection logic
Performance Impact: +150ms latency per request

Failure Points:

  • Single point of failure affected all API versions
  • Client detection logic proved unreliable
  • Performance degradation unacceptable for production use
  • High operational complexity for minimal benefit

Working Solution: Path-Based Versioning with Lifecycle Management

The successful approach combines path-based versioning with comprehensive lifecycle management and automated deprecation warnings.

// lib/config/api-versions.ts
export interface ApiVersion {
  version: string;
  status: 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';
  launchedAt: Date;
  deprecatedAt?: Date;
  sunsetAt?: Date;
  monthlyActiveClients?: number;  // Track this!
  breakingChanges: string[];
  supportedFeatures: Set<string>;
}

export const API_VERSIONS: Record<string, ApiVersion> = {
  v1: {
    version: 'v1',
    status: 'deprecated',
    launchedAt: new Date('2022-01-15'),
    deprecatedAt: new Date('2024-01-15'),
    sunsetAt: new Date('2025-01-15'),
    monthlyActiveClients: 28,  // Legacy government clients
    breakingChanges: [],
    supportedFeatures: new Set(['basic-crud']),
  },
  v2: {
    version: 'v2',
    status: 'stable',
    launchedAt: new Date('2023-06-01'),
    monthlyActiveClients: 156,
    breakingChanges: [
      'Changed userId to user_id in all responses',
      'Removed XML support',
      'Made email field required',
    ],
    supportedFeatures: new Set(['basic-crud', 'pagination', 'filtering']),
  },
  v3: {
    version: 'v3',
    status: 'beta',
    launchedAt: new Date('2024-03-01'),
    monthlyActiveClients: 42,
    breakingChanges: [
      'Moved to JSON:API spec',
      'Changed all IDs to UUIDs',
      'Nested resources under data property',
    ],
    supportedFeatures: new Set([
      'basic-crud',
      'pagination',
      'filtering',
      'webhooks',
      'graphql',
      'batch-operations'
    ]),
  },
};
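One way to keep deprecation warnings in step with this configuration is to derive the response headers from it rather than hard-coding them in each handler. The following is a minimal sketch, not the production code: `VersionLifecycle` and `lifecycleHeaders` are hypothetical names, and the header names are the ones the v1 handler in this article emits.

```typescript
// Assumption: a trimmed-down view of ApiVersion with only the lifecycle fields.
type Status = 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';

interface VersionLifecycle {
  version: string;
  status: Status;
  sunsetAt?: Date;
}

// Hypothetical helper: build version headers from the lifecycle config.
function lifecycleHeaders(v: VersionLifecycle): Record<string, string> {
  const headers: Record<string, string> = { 'X-API-Version': v.version };
  if (v.status === 'deprecated') {
    headers['X-API-Deprecated'] = 'true';
    if (v.sunsetAt) {
      // YYYY-MM-DD, the same format the v1 handler uses for X-API-Sunset
      headers['X-API-Sunset'] = v.sunsetAt.toISOString().slice(0, 10);
    }
  }
  return headers;
}

const v1Headers = lifecycleHeaders({
  version: 'v1',
  status: 'deprecated',
  sunsetAt: new Date('2025-01-15'),
});
console.log(v1Headers);
```

With this shape, changing a version's status or sunset date in the config changes the headers everywhere that version is served.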

The CDK Stack That Powers Our APIs

The production CDK implementation handles substantial traffic across multiple API versions:

// lib/stacks/versioned-api-stack.ts
import { RestApi, MethodLoggingLevel, LambdaIntegration, IResource } from 'aws-cdk-lib/aws-apigateway';
import { NodejsFunction } from 'aws-cdk-lib/aws-lambda-nodejs';
import { Duration, Stack, StackProps } from 'aws-cdk-lib';
import { Alarm, Metric } from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';
import { API_VERSIONS, ApiVersion } from '../config/api-versions';

export class VersionedApiStack extends Stack {
  constructor(scope: Construct, id: string, props: StackProps) {
    super(scope, id, props);

    const api = new RestApi(this, 'MultiVersionAPI', {
      restApiName: 'production-api',
      // Learned this the hard way: always enable CloudWatch
      deployOptions: {
        loggingLevel: MethodLoggingLevel.INFO,
        dataTraceEnabled: true,  // Essential for debugging version-specific issues
        metricsEnabled: true,
        tracingEnabled: true,
      },
    });

    // Add the version check Lambda - this is crucial
    const versionCheckFn = new NodejsFunction(this, 'VersionCheck', {
      entry: 'src/middleware/version-check.ts',
      memorySize: 256,  // Don't need much
      timeout: Duration.seconds(3),
      environment: {
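        // A Set serializes to {} by default, so convert supportedFeatures to arrays
        VERSIONS: JSON.stringify(API_VERSIONS, (_key, value) =>
          value instanceof Set ? [...value] : value),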
        SLACK_WEBHOOK: process.env.SLACK_WEBHOOK!,  // Alert on deprecated version usage
      },
    });

    // Set up each version
    Object.entries(API_VERSIONS).forEach(([version, config]) => {
      if (config.status === 'sunset') return;  // Don't deploy sunset versions

      const versionResource = api.root.addResource(version);
      this.setupVersionEndpoints(versionResource, config);
    });

    // Critical: version discovery endpoint
    this.addVersionDiscovery(api);

    // The alarm that saved us during the v1 sunset
    new Alarm(this, 'DeprecatedVersionHighUsage', {
      metric: new Metric({
        namespace: 'API/Versions',
        metricName: 'DeprecatedVersionCalls',
        statistic: 'Sum',
      }),
      threshold: 1000,
      evaluationPeriods: 1,
    });
  }

  private setupVersionEndpoints(resource: IResource, config: ApiVersion) {
    // Architecture: 24 Lambda functions across versions
    // Separate functions per version ensure isolation

    const handlers = new Map<string, Function>();

    // User endpoints - the source of most breaking changes
    const usersResource = resource.addResource('users');

    const listUsersHandler = new NodejsFunction(this, `ListUsers-${config.version}`, {
      entry: `src/handlers/${config.version}/users/list.ts`,
      memorySize: config.version === 'v1' ? 512 : 1024,  // V1 is inefficient
      timeout: Duration.seconds(29),  // Matches API Gateway's default 29-second integration timeout for REST APIs
      environment: {
        TABLE_NAME: process.env.USERS_TABLE!,
        VERSION: config.version,
        FEATURES: [...config.supportedFeatures].join(','),
        // This saved debugging time countless times
        DEPLOYMENT_TIME: new Date().toISOString(),
      },
      bundling: {
        // Version-specific dependencies
        externalModules: [
          '@aws-sdk/client-dynamodb',  // AWS SDK v3 for Node.js 18+ runtime
          '@aws-sdk/client-cloudwatch',
          // V1 XML support. External modules are not bundled, so xmlbuilder must be supplied at runtime (e.g. a layer)
          ...(config.version === 'v1' ? ['xmlbuilder'] : []),
        ],
      },
    });

    usersResource.addMethod('GET', new LambdaIntegration(listUsersHandler), {
      requestParameters: {
        'method.request.querystring.page': config.supportedFeatures.has('pagination'),
        'method.request.querystring.limit': config.supportedFeatures.has('pagination'),
        'method.request.querystring.filter': config.supportedFeatures.has('filtering'),
        // V3 specific parameters
        'method.request.querystring.include': config.version === 'v3',
        'method.request.querystring.fields': config.version === 'v3',
      },
    });

    // Track every version call - this metric is gold
    listUsersHandler.metricInvocations().createAlarm(this, `HighTraffic-${config.version}`, {
      threshold: 10000,
      evaluationPeriods: 1,
      alarmDescription: `High traffic on ${config.version} - check scaling`,
    });
  }
}

The Version Handlers That Actually Run

Here’s the real code with all its warts:

// src/handlers/v1/users/list.ts
// Legacy v1 implementation with minimal changes
import type { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  console.log('V1 handler called', {
    path: event.path,
    clientIp: event.requestContext.identity.sourceIp,
    userAgent: event.headers['User-Agent'],
  });

  try {
    // V1 doesn't support pagination, returns everything
    // V1 design limitation - maintained for compatibility
    const users = await getAllUsers();  // Returns all users - pagination added in v2

    // The field that caused the incident
    const transformedUsers = users.map(u => ({
      userId: u.user_id,  // V1 uses camelCase
      userName: u.name,
      userEmail: u.email,
      createdDate: u.created_at,  // Different field name because reasons
    }));

    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/json',
        'X-API-Version': 'v1',
        'X-API-Deprecated': 'true',
        'X-API-Sunset': '2025-01-15',
        'Warning': '299 - "API v1 is deprecated. Please migrate to v2. Guides: https://docs.api.com/migration"',
        // Required by financial industry clients
        'X-Total-Count': transformedUsers.length.toString(),
      },
      body: JSON.stringify(transformedUsers),
    };
  } catch (error) {
    // Comprehensive error logging for troubleshooting
    console.error('V1 handler error', {
      error,
      stack: error instanceof Error ? error.stack : undefined,  // caught values are not guaranteed to be Errors
      event: JSON.stringify(event),
    });

    return {
      statusCode: 500,
      body: JSON.stringify({
        error: 'Internal Server Error',
        // V1 clients expect this exact format
        errorCode: 'INTERNAL_ERROR',
        timestamp: new Date().toISOString(),
      }),
    };
  }
};

// src/handlers/v2/users/list.ts
export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  // V2 added proper pagination after the 50K incident
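  // Non-numeric or out-of-range values fall back to the defaults instead of producing NaN
  const page = Math.max(1, parseInt(event.queryStringParameters?.page || '1', 10) || 1);
  const limit = Math.min(
    Math.max(1, parseInt(event.queryStringParameters?.limit || '20', 10) || 20),
    100  // Maximum page size for performance
  );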

  const metrics = {
    version: 'v2',
    page,
    limit,
    clientIp: event.requestContext.identity.sourceIp,
  };

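  // Track clients that still call v2 through an outdated SDK.
  // Assumes `cloudwatch` is the CloudWatch client from @aws-sdk/client-cloudwatch (SDK v3),
  // whose methods return promises directly, so there is no .promise() call.
  if (event.headers['User-Agent']?.includes('OldSDK/1.')) {
    await cloudwatch.putMetricData({
      Namespace: 'API/Clients',
      MetricData: [{
        MetricName: 'OutdatedSDKUsage',
        Value: 1,
        Dimensions: [{ Name: 'Version', Value: 'v2' }],
      }],
    });
  }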

  try {
    const { users, total } = await getUsersPaginated({ page, limit });

    // V2 response format with pagination
    const response = {
      data: users.map(u => ({
        id: u.user_id,  // Changed from userId
        name: u.name,
        email: u.email,
        status: u.status || 'active',  // New required field
        created_at: u.created_at,  // Snake case everywhere
        updated_at: u.updated_at,
      })),
      pagination: {
        page,
        limit,
        total,
        total_pages: Math.ceil(total / limit),
        has_next: page < Math.ceil(total / limit),
        has_prev: page > 1,
      },
      // HATEOAS links for client navigation
      _links: {
        self: `/v2/users?page=${page}&limit=${limit}`,
        next: page < Math.ceil(total / limit) ? `/v2/users?page=${page + 1}&limit=${limit}` : null,
        prev: page > 1 ? `/v2/users?page=${page - 1}&limit=${limit}` : null,
      },
    };

    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/json',
        'X-API-Version': 'v2',
        'X-RateLimit-Limit': '500',
        'X-RateLimit-Remaining': await getRateLimitRemaining(event),
        'Cache-Control': 'private, max-age=60',  // Client-side caching only, for 60s; shared caches must not store it
      },
      body: JSON.stringify(response),
    };
  } catch (error) {
    logger.error('V2 handler error', { error, metrics });
    throw error;  // Let API Gateway handle it
  }
};

// src/handlers/v3/users/list.ts
// V3: JSON:API specification implementation
export const handler = middy(async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  // JSON:API compliance for enterprise integration
  const params = parseJsonApiParams(event.queryStringParameters);

  // Feature flags for gradual rollout
  const features = await getFeatureFlags('v3', event.headers['X-Client-Id']);

  const { users, total, included } = await getUsersWithRelationships({
    ...params,
    includeRelationships: params.include,
    sparseFields: params.fields,
    experimentalFeatures: features,
  });

  // JSON:API format - love it or hate it
  const response = {
    data: users.map(u => ({
      type: 'users',
      id: u.id,  // UUID format for consistency
      attributes: {
        name: u.name,
        email: u.email,
        status: u.status,
        created_at: u.created_at,
        updated_at: u.updated_at,
      },
      relationships: {
        organization: {
          data: { type: 'organizations', id: u.organization_id },
        },
        roles: {
          data: u.role_ids.map(id => ({ type: 'roles', id })),
        },
      },
      links: {
        self: `/v3/users/${u.id}`,
      },
    })),
    included: included,  // Related resources
    meta: {
      pagination: {
        page: params.page.number,
        pages: Math.ceil(total / params.page.size),
        count: users.length,
        total: total,
      },
      api_version: 'v3',
      generated_at: new Date().toISOString(),
      experimental_features: [...features],
    },
    links: generateJsonApiLinks(params, total),
  };

  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/vnd.api+json',  // JSON:API requirement
      'X-API-Version': 'v3',
      'X-RateLimit-Limit': '1000',
      'X-RateLimit-Remaining': await getRateLimitRemaining(event),
      'Vary': 'Accept, X-Client-Id',  // Important for caching
    },
    body: JSON.stringify(response),
  };
})
  .use(jsonBodyParser())
  .use(httpErrorHandler())
  .use(correlationIds())
  .use(logTimeout())
  .use(warmup());
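The pagination arithmetic in the v2 handler is repeated in several places (`total_pages`, `has_next`, the `_links` block), and the query parameters need guarding against missing or non-numeric values. A small sketch of how that logic could be pulled into pure functions follows. The names `parsePositiveInt`, `parsePageRequest`, and `paginationMeta` are illustrative and do not come from the original handlers. The defaults (page 1, limit 20, maximum 100) are the ones the v2 handler uses.

```typescript
interface PageRequest {
  page: number;
  limit: number;
}

// Parse a query-string integer; fall back when it is missing, non-numeric, or below 1.
function parsePositiveInt(raw: string | undefined, fallback: number): number {
  const n = Number.parseInt(raw ?? '', 10);
  return Number.isInteger(n) && n >= 1 ? n : fallback;
}

function parsePageRequest(
  query: Record<string, string | undefined> | null,
  maxLimit = 100,
): PageRequest {
  return {
    page: parsePositiveInt(query?.page, 1),
    limit: Math.min(parsePositiveInt(query?.limit, 20), maxLimit),
  };
}

// Compute the pagination block once so the body and links cannot disagree.
function paginationMeta({ page, limit }: PageRequest, total: number) {
  const totalPages = Math.ceil(total / limit);
  return {
    page,
    limit,
    total,
    total_pages: totalPages,
    has_next: page < totalPages,
    has_prev: page > 1,
  };
}

const request = parsePageRequest({ page: '2', limit: '500' });
console.log(paginationMeta(request, 450));
```

Because these functions take plain values rather than an API Gateway event, they can be unit-tested without mocking the Lambda runtime.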

Migration Pain Points and Solutions

The Database Migration That Almost Killed Us

When moving from V1 to V2, we needed to change userId (string) to user_id (UUID). Here’s how we did it without downtime:

// migrations/v1-to-v2-user-ids.ts
export const migrateUserIds = async () => {
  const BATCH_SIZE = 25;  // BatchWriteItem accepts at most 25 put/delete requests per call
  let lastEvaluatedKey: any = undefined;
  let migrated = 0;
  let failed = 0;

  // First pass: Add new field
  do {
    const { Items, LastEvaluatedKey } = await dynamodb.scan({
      TableName: process.env.USERS_TABLE!,
      Limit: BATCH_SIZE,
      ExclusiveStartKey: lastEvaluatedKey,
    }).promise();

    const batch = Items?.map(item => ({
      PutRequest: {
        Item: {
          ...item,
          user_id: item.userId || generateUUID(),  // New field
          _migration: 'v1-to-v2-phase1',
          _migrated_at: new Date().toISOString(),
        },
      },
    })) || [];

    if (batch.length > 0) {
      try {
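        const { UnprocessedItems } = await dynamodb.batchWrite({
          RequestItems: { [process.env.USERS_TABLE!]: batch },
        }).promise();
        // A batch write can succeed while leaving throttled items unprocessed; count those as failed
        const unprocessed = UnprocessedItems?.[process.env.USERS_TABLE!]?.length ?? 0;
        migrated += batch.length - unprocessed;
        failed += unprocessed;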
      } catch (error) {
        // Log but don't stop - we'll retry failed items
        console.error('Batch failed', { error, batch: batch.map(b => b.PutRequest.Item.userId) });
        failed += batch.length;
      }
    }

    lastEvaluatedKey = LastEvaluatedKey;

    // Throttle to avoid hot partitions
    await new Promise(resolve => setTimeout(resolve, 100));

  } while (lastEvaluatedKey);

  console.log(`Migration complete: ${migrated} succeeded, ${failed} failed`);

  // Second pass: Remove old field (after all clients updated)
  // We waited 6 months for this
};
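The comment in the migration mentions retrying failed items, but the retry itself is not shown. The sketch below is one possible shape for it, under stated assumptions: `writeBatch` is a hypothetical callback that wraps the real batch write and resolves with the items the service left unprocessed, and `chunk` and `writeAll` are illustrative names. The chunk size of 25 reflects the DynamoDB BatchWriteItem limit.

```typescript
// Split an array into chunks of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Assumed contract: resolves with the subset of `batch` that was NOT written.
type WriteBatch<T> = (batch: T[]) => Promise<T[]>;

// Write all items in chunks of 25, retrying unprocessed items with exponential backoff.
// Returns the items that still failed after `maxAttempts` tries.
async function writeAll<T>(
  items: T[],
  writeBatch: WriteBatch<T>,
  maxAttempts = 3,
  backoffMs = 50,
): Promise<T[]> {
  const failed: T[] = [];
  for (const batch of chunk(items, 25)) {
    let pending = batch;
    for (let attempt = 1; attempt <= maxAttempts && pending.length > 0; attempt++) {
      pending = await writeBatch(pending);
      if (pending.length > 0 && attempt < maxAttempts) {
        await new Promise(resolve => setTimeout(resolve, backoffMs * 2 ** attempt));
      }
    }
    failed.push(...pending);
  }
  return failed;
}
```

Returning the leftover items, rather than only a count, lets a later run target exactly the records that were missed.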

Client SDK Backwards Compatibility

Our SDK had to work with all API versions. This is messy but necessary:

// sdk/src/client.ts
export class ApiClient {
  private version: string;
  // Static so the warning is emitted once per process rather than once per client instance
  private static warned = new Set<string>();

  constructor(options: ClientOptions = {}) {
    this.version = options.version || 'v2';  // Default to stable

    if (this.version === 'v1' && !ApiClient.warned.has('deprecation')) {
      console.warn(
        '\x1b[33m%s\x1b[0m',  // Yellow text
        '[DEPRECATION] API v1 will be sunset on 2025-01-15. ' +
        'Migration guide: https://docs.api.com/migration'
      );
      ApiClient.warned.add('deprecation');

      // Track SDK version usage
      this.trackEvent('sdk_deprecation_warning', { version: 'v1' });
    }
  }

  async getUsers(options?: GetUsersOptions) {
    const url = this.buildUrl('users', options);
    const response = await this.request(url);

    // Normalize responses across versions
    return this.normalizeUserResponse(response);
  }

  private normalizeUserResponse(response: any): User[] {
    switch (this.version) {
      case 'v1':
        // V1 returns flat array
        return response.map((u: any) => ({
          id: u.userId,
          name: u.userName,
          email: u.userEmail,
          createdAt: new Date(u.createdDate),
          // V1 doesn't have these
          status: 'active',
          updatedAt: new Date(u.createdDate),
        }));

      case 'v2':
        // V2 returns paginated response
        return response.data.map((u: any) => ({
          id: u.id,
          name: u.name,
          email: u.email,
          status: u.status,
          createdAt: new Date(u.created_at),
          updatedAt: new Date(u.updated_at),
        }));

      case 'v3':
        // V3 returns JSON:API format
        return response.data.map((u: any) => ({
          id: u.id,
          name: u.attributes.name,
          email: u.attributes.email,
          status: u.attributes.status,
          createdAt: new Date(u.attributes.created_at),
          updatedAt: new Date(u.attributes.updated_at),
          // V3 includes relationships
          organizationId: u.relationships?.organization?.data?.id,
          roleIds: u.relationships?.roles?.data?.map((r: any) => r.id) || [],
        }));

      default:
        throw new Error(`Unknown API version: ${this.version}`);
    }
  }
}
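The client calls `buildUrl`, which is not shown in the excerpt. A plausible minimal version is sketched here as a free function. The signature, the `GetUsersOptions` fields, and the base URL in the usage line are assumptions made for illustration. The only property taken from the article is that the version appears as the first path segment.

```typescript
// Assumed option shape; the real GetUsersOptions is not shown in the article.
interface GetUsersOptions {
  page?: number;
  limit?: number;
  filter?: string;
}

// Build "<base>/<version>/<resource>?<query>" with path-based versioning.
function buildUrl(
  baseUrl: string,
  version: string,
  resource: string,
  options: GetUsersOptions = {},
): string {
  const query = Object.entries(options)
    .filter(([, value]) => value !== undefined)
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(String(value))}`)
    .join('&');
  const path = `${baseUrl.replace(/\/+$/, '')}/${version}/${resource}`;
  return query ? `${path}?${query}` : path;
}

// Placeholder host for illustration only.
console.log(buildUrl('https://api.example.com/', 'v2', 'users', { page: 2, limit: 50 }));
```

Keeping the version out of the caller's hands (it comes from the client's configuration, not from each call) is what lets `normalizeUserResponse` rely on `this.version` matching the response shape.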

Monitoring and Alerting That Actually Helps

The monitoring system provides visibility into version usage patterns and performance:

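// lib/constructs/api-monitoring.ts
import { Duration } from 'aws-cdk-lib';
import {
  Alarm, Color, ComparisonOperator, Dashboard, GraphWidget, MathExpression, TreatMissingData,
} from 'aws-cdk-lib/aws-cloudwatch';
import { SnsAction } from 'aws-cdk-lib/aws-cloudwatch-actions';
import { Topic } from 'aws-cdk-lib/aws-sns';
import { Construct } from 'constructs';
// v1Percentage, v2Percentage, v3Percentage, and v1Errors are metrics defined elsewhere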
export class ApiMonitoring extends Construct {
  constructor(scope: Construct, id: string) {
    super(scope, id);

    // Dashboard that actually gets looked at
    const dashboard = new Dashboard(this, 'ApiDashboard', {
      dashboardName: 'api-versions-prod',
      defaultInterval: Duration.hours(3),  // Recent enough to be useful
    });

    // Version distribution - watched this like a hawk during v2 rollout
    dashboard.addWidgets(
      new GraphWidget({
        title: 'API Version Distribution (% of requests)',
        left: [v1Percentage, v2Percentage, v3Percentage],
        leftYAxis: { max: 100, min: 0 },
        period: Duration.minutes(5),
        statistic: 'Average',
        // Minimum usage threshold for sunset decisions
        leftAnnotations: [{
          label: 'Min safe threshold',
          value: 5,
          color: Color.RED,
        }],
      })
    );

    // The metric that matters: client errors by version
    dashboard.addWidgets(
      new GraphWidget({
        title: '4xx Errors by Version',
        left: [
          new MathExpression({
            expression: 'RATE(m1)',
            usingMetrics: {
              m1: v1Errors,
            },
            label: 'V1 Error Rate',
            color: Color.RED,
          }),
          // Similar for v2, v3
        ],
      })
    );

    // Deprecation warning effectiveness
    const deprecationAlarm = new Alarm(this, 'V1StillHighUsage', {
      metric: v1Percentage,
      threshold: 10,
      evaluationPeriods: 3,
      comparisonOperator: ComparisonOperator.GREATER_THAN_THRESHOLD,
      alarmDescription: 'V1 still above 10% - delay sunset?',
      treatMissingData: TreatMissingData.NOT_BREACHING,
    });

    deprecationAlarm.addAlarmAction(
      new SnsAction(Topic.fromTopicArn(this, 'AlertTopic', process.env.ALERT_TOPIC_ARN!))
    );
  }
}
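The dashboard plots each version's share of requests against two reference points: the 5% annotation and the 10% alarm threshold. The calculation behind those percentages can be sketched outside CloudWatch as plain functions. This is an illustration only: the request counts are made up, the function names are hypothetical, and reading the 5% line as "below this, sunset is safe" is an interpretation of the annotation label rather than something the article states.

```typescript
type VersionCounts = Record<string, number>;

// Percentage of total requests served by each version.
function versionShare(counts: VersionCounts): Record<string, number> {
  const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
  const share: Record<string, number> = {};
  for (const [version, n] of Object.entries(counts)) {
    share[version] = total === 0 ? 0 : (n * 100) / total;
  }
  return share;
}

type Readiness = 'safe-to-sunset' | 'keep-migrating' | 'delay-sunset';

// Assumed policy: under 5% is safe, over 10% mirrors the "delay sunset?" alarm.
function sunsetReadiness(percent: number): Readiness {
  if (percent < 5) return 'safe-to-sunset';
  if (percent > 10) return 'delay-sunset';
  return 'keep-migrating';
}

// Hypothetical request counts for one period.
const share = versionShare({ v1: 120, v2: 780, v3: 100 });
console.log(share, sunsetReadiness(share.v1));
```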

Lessons Learned

1. Version Sunset Complexity

28 clients remain on V1 after two years of deprecation due to:

  • Government deployment cycles requiring 18-month lead times
  • IoT devices with firmware-embedded URLs
  • Legacy systems with hard-coded integrations

V1 maintenance still requires ongoing engineering effort, because these clients depend on critical integrations that cannot migrate on our schedule.

2. Multiplicative Testing Complexity

Each dimension a breaking change adds multiplies the number of cases the test suite must cover:

  • 3 API versions
  • 3 SDK versions
  • 4 response formats

3 × 3 × 4 = 36 test combinations

Integration test suite: 25 minutes execution time
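The count of 36 is the Cartesian product of the three dimensions. A short sketch of generating that matrix for a parameterized test run is shown below. The SDK and response-format labels are placeholders, since the article does not name them, and `cartesian` is an illustrative helper.

```typescript
// Cartesian product of any number of axes.
function cartesian<T>(...axes: T[][]): T[][] {
  let combos: T[][] = [[]];
  for (const axis of axes) {
    const next: T[][] = [];
    for (const combo of combos) {
      for (const value of axis) {
        next.push([...combo, value]);
      }
    }
    combos = next;
  }
  return combos;
}

const apiVersions = ['v1', 'v2', 'v3'];
const sdkVersions = ['sdk-a', 'sdk-b', 'sdk-c'];                     // placeholder labels
const responseFormats = ['fmt-1', 'fmt-2', 'fmt-3', 'fmt-4'];        // placeholder labels

const matrix = cartesian(apiVersions, sdkVersions, responseFormats);
console.log(matrix.length);  // 3 * 3 * 4 = 36
```

Adding a fourth API version raises the total to 48, which is why bundling breaking changes into fewer versions pays off directly in test time.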

3. Documentation Maintenance

Documentation drift creates hidden dependencies. V1 documentation lag led to:

  • Client reliance on undocumented behavior
  • Need for feature flags to maintain compatibility
  • Additional development overhead for legacy behavior

4. Version Discovery Is Critical

// This endpoint saves more support tickets than any other
app.get('/api', (req, res) => {
  res.json({
    versions: {
      v1: {
        status: 'deprecated',
        sunset_date: '2025-01-15',
        docs: 'https://docs.api.com/v1',
        migration_guide: 'https://docs.api.com/v1-to-v2',
      },
      v2: {
        status: 'stable',
        docs: 'https://docs.api.com/v2',
      },
      v3: {
        status: 'beta',
        docs: 'https://docs.api.com/v3',
        breaking_changes: 'https://docs.api.com/v3-breaking-changes',
      },
    },
    current_stable: 'v2',
    recommended: 'v2',
    your_version: detectVersion(req),  // What the client is using
  });
});
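The discovery payload above is written by hand, which means it can drift from the version configuration. One option is to generate it from the same lifecycle data. The sketch below assumes a trimmed-down version record and a docs URL pattern of `https://docs.api.com/<version>`, inferred from the URLs in the example. `buildDiscovery` is a hypothetical name, and the sketch leaves out `your_version` because the article does not show how `detectVersion` works.

```typescript
type Status = 'alpha' | 'beta' | 'stable' | 'deprecated' | 'sunset';

// Assumption: only the fields of ApiVersion that discovery needs.
interface VersionInfo {
  status: Status;
  sunsetAt?: Date;
}

interface DiscoveryEntry {
  status: Status;
  docs: string;
  sunset_date?: string;
}

function buildDiscovery(versions: Record<string, VersionInfo>) {
  const listed: Record<string, DiscoveryEntry> = {};
  let currentStable: string | null = null;

  for (const [name, info] of Object.entries(versions)) {
    if (info.status === 'sunset') continue;  // retired versions are not advertised
    const entry: DiscoveryEntry = {
      status: info.status,
      docs: `https://docs.api.com/${name}`,
    };
    if (info.sunsetAt) {
      entry.sunset_date = info.sunsetAt.toISOString().slice(0, 10);
    }
    listed[name] = entry;
    // If several versions are stable, the last one listed wins.
    if (info.status === 'stable') currentStable = name;
  }

  return { versions: listed, current_stable: currentStable };
}

const payload = buildDiscovery({
  v1: { status: 'deprecated', sunsetAt: new Date('2025-01-15') },
  v2: { status: 'stable' },
  v3: { status: 'beta' },
});
console.log(JSON.stringify(payload, null, 2));
```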

Operational Considerations

Maintaining multiple API versions carries concrete ongoing costs:

  • Infrastructure: 3x the Lambda functions and API Gateway configuration to deploy and operate
  • Development: 35% longer implementation time for cross-version features
  • Testing: CI/CD pipeline extended from 8 minutes to 25 minutes due to comprehensive version coverage
  • Documentation: Dedicated resources needed for version-specific documentation
  • Support: 25% of tickets related to version confusion requiring clear migration guides

Implementation Recommendations

  1. Design for versioning from initial release - Retrofitting versioning increases complexity 8-10x
  2. Bundle breaking changes - Batch related changes to reduce version proliferation
  3. Automate migration tooling - Build client migration tools before they’re needed
  4. Plan realistic sunset timelines - Enterprise clients require 12-18 month migration windows
  5. Implement usage tracking early - Version analytics inform sunset decisions
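Recommendation 4 can be enforced mechanically: when a version is marked deprecated, check that its sunset date leaves the minimum migration window. The sketch below is one way to express that check. The function names are illustrative, and the 12-month default is taken from the lower end of the range above. The example dates are the v1 `deprecatedAt` and `sunsetAt` values from the version configuration.

```typescript
// Add calendar months in UTC. Note: a day that does not exist in the target
// month (e.g. the 31st) rolls over into the following month.
function addMonthsUtc(date: Date, months: number): Date {
  const result = new Date(date.getTime());
  result.setUTCMonth(result.getUTCMonth() + months);
  return result;
}

// True when the sunset date falls before the end of the minimum migration window.
function violatesMigrationWindow(
  deprecatedAt: Date,
  sunsetAt: Date,
  minMonths = 12,
): boolean {
  return sunsetAt.getTime() < addMonthsUtc(deprecatedAt, minMonths).getTime();
}

const deprecatedAt = new Date('2024-01-15');
const sunsetAt = new Date('2025-01-15');

console.log(violatesMigrationWindow(deprecatedAt, sunsetAt, 12));  // 12-month window
console.log(violatesMigrationWindow(deprecatedAt, sunsetAt, 18));  // 18-month window
```

Running a check like this at synth time would turn an unrealistic sunset date into a failed deployment instead of a surprised client.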

The CDK Pattern That Actually Works

If you’re starting fresh, use this structure:

/api
  /v1
    /users
    /orders
    /internal/health
  /v2
    /users
    /orders
    /internal/health
  /versions (discovery endpoint)
  /health (version-agnostic)

Keep your Lambda code organized by version:

/src
  /handlers
    /v1
      /users
      /orders
    /v2
      /users
      /orders
  /shared
    /database
    /auth
    /utils

Conclusion

Successful API versioning balances technical elegance with business reality. The path-based versioning approach with lifecycle management provides:

  • Client Compatibility: Maintains service for diverse client update cycles
  • Development Efficiency: Clear separation of version-specific logic
  • Operational Visibility: Comprehensive monitoring and deprecation warnings
  • Business Continuity: Revenue protection during API evolution

Implementing production-ready API versioning requires 4-6 months initial investment and ongoing operational complexity, but provides essential client compatibility during API evolution and protects critical business relationships.
