2025-09-21
Building Ephemeral Preview Environments with AWS CDK and Serverless
Learn to build automated preview environments using AWS CDK, Lambda, and GitHub Actions for seamless PR testing and review workflows
The Problem with Shared Staging Environments
A single shared staging environment becomes a bottleneck once a team opens more than one pull request per day against it. Concurrent PRs overwrite each other’s database state, flap on the same feature flags, and compete for the same DNS record; test signal drops because a failure is as likely to mean “someone else’s PR” as a real regression.
Ephemeral preview environments (one per PR) remove the resource contention but introduce their own constraints: cost per environment, cold-start friction for reviewers, and cleanup reliability. This post covers the design of per-PR preview environments on AWS using the CDK: stack-per-PR composition, preview URL routing, automatic teardown on PR close, database seeding strategies, and the cost controls that keep the pattern sustainable at team scale.
Architecture Overview
The solution combines AWS serverless services (Lambda, API Gateway, CloudFront, Route 53) with GitHub Actions to create fully automated preview environments. Each PR gets its own subdomain and its own infrastructure stack, which mirrors production at reduced capacity.
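As a minimal sketch of the one-stack-and-one-subdomain-per-PR convention, the helper below derives the names from a PR number. The function name `previewNames` and its return shape are hypothetical; the `preview-pr-<n>` and `pr-<n>.<domain>` patterns are the ones used throughout this post.

```typescript
// Hypothetical helper: derives the per-PR stack name and hostname.
export interface PreviewNames {
  stackName: string;
  hostname: string;
  url: string;
}

export function previewNames(prNumber: number, baseDomain: string): PreviewNames {
  // Reject anything that is not a positive integer so a bad input
  // can never produce a stack name that collides with another PR.
  if (!Number.isInteger(prNumber) || prNumber <= 0) {
    throw new Error(`Invalid PR number: ${prNumber}`);
  }
  const hostname = `pr-${prNumber}.${baseDomain}`;
  return {
    stackName: `preview-pr-${prNumber}`,
    hostname,
    url: `https://${hostname}`,
  };
}
```

Keeping the naming in one place means the deploy job, the cleanup job, and the CDK app cannot drift apart on how a PR maps to its resources.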
Core Implementation
CDK Stack Architecture
The foundation is a parameterized CDK stack that creates identical infrastructure for each PR:
// lib/preview-environment-stack.ts
import {
  Aspects, CfnOutput, CfnResource, Duration, IAspect,
  RemovalPolicy, Stack, StackProps, Tags,
} from 'aws-cdk-lib';
import { Construct, IConstruct } from 'constructs';
import * as acm from 'aws-cdk-lib/aws-certificatemanager';
import * as apigateway from 'aws-cdk-lib/aws-apigateway';
import * as cloudfront from 'aws-cdk-lib/aws-cloudfront';
import * as lambda from 'aws-cdk-lib/aws-lambda';
import * as route53 from 'aws-cdk-lib/aws-route53';
import * as route53Targets from 'aws-cdk-lib/aws-route53-targets';

export interface PreviewStackProps extends StackProps {
  prNumber: string;
  commitSha: string;
  domain: string;         // parent zone, e.g. preview.company.com
  certificateArn: string; // ACM certificate in us-east-1 that covers the preview hostnames
}

// Cleanup aspect: make sure every resource is deleted together with the stack
class AutoCleanupAspect implements IAspect {
  visit(node: IConstruct): void {
    if (CfnResource.isCfnResource(node)) {
      node.applyRemovalPolicy(RemovalPolicy.DESTROY);
    }
  }
}

export class PreviewEnvironmentStack extends Stack {
  constructor(scope: Construct, id: string, props: PreviewStackProps) {
    super(scope, id, props);
    const { prNumber, commitSha, domain, certificateArn } = props;

    // Common tags for resource management and cost tracking
    const commonTags: Record<string, string> = {
      Environment: 'preview',
      PRNumber: prNumber,
      CommitSha: commitSha.substring(0, 8),
      CreatedBy: 'github-actions',
      TTL: this.calculateTTL(72), // 72 hours, recomputed on every deploy
    };

    // Lambda function for the application
    const appFunction = new lambda.Function(this, 'AppFunction', {
      runtime: lambda.Runtime.NODEJS_22_X,
      handler: 'index.handler',
      code: lambda.Code.fromAsset('dist'),
      timeout: Duration.seconds(30),
      memorySize: 256,
      environment: {
        STAGE: 'preview',
        PR_NUMBER: prNumber,
        COMMIT_SHA: commitSha,
      },
    });

    // API Gateway in front of the function (default stage name is "prod")
    const api = new apigateway.RestApi(this, 'PreviewApi', {
      restApiName: `preview-api-pr-${prNumber}`,
      description: `Preview environment for PR ${prNumber}`,
      binaryMediaTypes: ['*/*'],
      defaultCorsPreflightOptions: {
        allowOrigins: apigateway.Cors.ALL_ORIGINS,
        allowMethods: apigateway.Cors.ALL_METHODS,
      },
    });

    // Lambda proxy integration for the root path and everything below it
    const lambdaIntegration = new apigateway.LambdaIntegration(appFunction);
    api.root.addMethod('ANY', lambdaIntegration);
    api.root.addProxy({ defaultIntegration: lambdaIntegration });

    // Custom domain; fromLookup requires an explicit env (account and region) on the stack
    const previewDomain = `pr-${prNumber}.${domain}`;
    const hostedZone = route53.HostedZone.fromLookup(this, 'HostedZone', {
      domainName: domain,
    });
    const certificate = acm.Certificate.fromCertificateArn(this, 'Certificate', certificateArn);

    // CloudFront distribution for caching.
    // CloudFrontWebDistribution is the legacy construct; newer CDK code uses cloudfront.Distribution.
    const distribution = new cloudfront.CloudFrontWebDistribution(this, 'Distribution', {
      originConfigs: [{
        customOriginSource: {
          domainName: api.restApiId + '.execute-api.' + this.region + '.amazonaws.com',
          originPath: '/prod',
        },
        behaviors: [{ isDefaultBehavior: true }],
      }],
      viewerCertificate: cloudfront.ViewerCertificate.fromAcmCertificate(certificate, {
        aliases: [previewDomain],
      }),
      comment: `Preview distribution for PR ${prNumber}`,
    });

    // Route53 alias record pointing the preview hostname at the distribution
    new route53.ARecord(this, 'PreviewAlias', {
      zone: hostedZone,
      recordName: previewDomain,
      target: route53.RecordTarget.fromAlias(new route53Targets.CloudFrontTarget(distribution)),
    });

    // Stack output read by the GitHub Actions workflow
    new CfnOutput(this, 'PreviewURL', { value: `https://${previewDomain}` });

    // Apply common tags to the stack and every taggable resource in it
    Object.entries(commonTags).forEach(([key, value]) => {
      Tags.of(this).add(key, value);
    });

    // Apply cleanup aspect
    Aspects.of(this).add(new AutoCleanupAspect());
  }

  private calculateTTL(hours: number): string {
    const expiryDate = new Date(Date.now() + hours * 60 * 60 * 1000);
    return expiryDate.toISOString();
  }
}
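The stack takes `prNumber` and `commitSha` as plain constructor props, so the app entry point has to obtain them at synth time. One option is CDK context (`cdk deploy --context prNumber=123 --context commitSha=<sha>`), read with `app.node.tryGetContext()`. The sketch below shows only the validation step as a standalone function; `parsePreviewContext` is a hypothetical name, and the wiring into `bin/app.ts` is shown as comments because it depends on `aws-cdk-lib` and on values (domain, certificate ARN, env) this post does not specify.

```typescript
// Hypothetical helper for bin/app.ts: validates values read from CDK context.
export interface PreviewContext {
  prNumber: string;
  commitSha: string;
}

export function parsePreviewContext(ctx: Record<string, unknown>): PreviewContext {
  // Context values passed on the command line arrive as strings.
  const prNumber = String(ctx.prNumber ?? '');
  const commitSha = String(ctx.commitSha ?? '');
  if (!/^[1-9]\d*$/.test(prNumber)) {
    throw new Error(`prNumber must be a positive integer, got "${prNumber}"`);
  }
  if (!/^[0-9a-f]{7,40}$/i.test(commitSha)) {
    throw new Error(`commitSha must be a 7-40 character hex string, got "${commitSha}"`);
  }
  return { prNumber, commitSha };
}

// Wiring sketch (assumed, not from the original post):
//   const app = new App();
//   const ctx = parsePreviewContext({
//     prNumber: app.node.tryGetContext('prNumber'),
//     commitSha: app.node.tryGetContext('commitSha'),
//   });
//   new PreviewEnvironmentStack(app, `preview-pr-${ctx.prNumber}`, {
//     ...ctx, domain, certificateArn, env,
//   });
```

Failing fast here gives a readable error in the workflow log instead of a stack named `preview-pr-undefined`.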
GitHub Actions Workflow
The automation starts with a GitHub Actions workflow that responds to PR events:
# .github/workflows/preview-environment.yml
name: Preview Environment
on:
  pull_request:
    types: [opened, synchronize, reopened, closed]
permissions:
  id-token: write
  contents: read
  pull-requests: write
jobs:
  deploy-preview:
    if: github.event.action != 'closed'
    runs-on: ubuntu-latest
    env:
      PR_NUMBER: ${{ github.event.number }}
      COMMIT_SHA: ${{ github.event.pull_request.head.sha }}
      PREVIEW_DOMAIN: pr-${{ github.event.number }}.preview.company.com
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Install dependencies
        run: |
          npm ci
          npm run build
      - name: Deploy CDK stack
        # Stack props are passed as CDK context; --parameters only sets CloudFormation parameters
        run: |
          npx cdk deploy "preview-pr-${PR_NUMBER}" \
            --context prNumber="${PR_NUMBER}" \
            --context commitSha="${COMMIT_SHA}" \
            --require-approval never \
            --outputs-file cdk-outputs.json
      - name: Extract deployment URL
        id: extract-url
        run: |
          PREVIEW_URL=$(jq -r --arg stack "preview-pr-${PR_NUMBER}" '.[$stack].PreviewURL' cdk-outputs.json)
          echo "url=$PREVIEW_URL" >> "$GITHUB_OUTPUT"
      - name: Wait for deployment readiness
        run: |
          for i in {1..30}; do
            if curl -f -s "${{ steps.extract-url.outputs.url }}/health" > /dev/null; then
              echo "Deployment is ready!"
              exit 0
            fi
            echo "Waiting for deployment... ($i/30)"
            sleep 10
          done
          # Fail the job instead of running tests against an environment that never came up
          echo "Deployment did not become ready in time"
          exit 1
      - name: Run E2E tests
        env:
          CYPRESS_BASE_URL: ${{ steps.extract-url.outputs.url }}
          # Cypress exposes CYPRESS_-prefixed variables through Cypress.env()
          CYPRESS_PR_NUMBER: ${{ env.PR_NUMBER }}
          CYPRESS_COMMIT_SHA: ${{ env.COMMIT_SHA }}
        run: |
          npm run test:e2e
      - name: Update PR comment
        uses: actions/github-script@v7
        with:
          script: |
            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            const botComment = comments.find(comment =>
              comment.user.type === 'Bot' &&
              comment.body.includes('Preview Environment')
            );
            // This step only runs if the E2E step succeeded
            const body = [
              '## Preview Environment',
              '**URL:** ${{ steps.extract-url.outputs.url }}',
              '**Status:** Ready',
              '**Commit:** `${{ env.COMMIT_SHA }}`',
              'E2E tests: Passed',
            ].join('\n');
            if (botComment) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: botComment.id,
                body: body
              });
            } else {
              await github.rest.issues.createComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                issue_number: context.issue.number,
                body: body
              });
            }
  cleanup-preview:
    if: github.event.action == 'closed'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'
          cache: 'npm'
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_ARN }}
          aws-region: us-east-1
      - name: Install dependencies
        # cdk destroy has to synthesize the app, so its dependencies must be installed
        run: npm ci
      - name: Destroy CDK stack
        run: |
          npx cdk destroy "preview-pr-${{ github.event.number }}" \
            --context prNumber="${{ github.event.number }}" \
            --context commitSha="${{ github.event.pull_request.head.sha }}" \
            --force
      - name: Update PR comment
        uses: actions/github-script@v7
        with:
          script: |
            const { data: comments } = await github.rest.issues.listComments({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: context.issue.number,
            });
            const botComment = comments.find(comment =>
              comment.user.type === 'Bot' &&
              comment.body.includes('Preview Environment')
            );
            if (botComment) {
              await github.rest.issues.updateComment({
                owner: context.repo.owner,
                repo: context.repo.repo,
                comment_id: botComment.id,
                body: botComment.body + '\n\n**Status:** Cleaned up'
              });
            }
OIDC Authentication Setup
Instead of storing long-lived AWS credentials, use GitHub’s OIDC provider for secure, temporary access:
// iam/github-oidc-role.ts
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as iam from 'aws-cdk-lib/aws-iam';
export class GitHubOIDCStack extends Stack {
constructor(scope: Construct, id: string, props?: StackProps) {
super(scope, id, props);
// Create OIDC provider for GitHub Actions
const provider = new iam.OpenIdConnectProvider(this, 'GitHubProvider', {
url: 'https://token.actions.githubusercontent.com',
clientIds: ['sts.amazonaws.com'],
thumbprints: ['6938fd4d98bab03faadb97b34396831e3780aea1'],
});
// IAM role for GitHub Actions
const role = new iam.Role(this, 'GitHubActionsRole', {
assumedBy: new iam.WebIdentityPrincipal(
provider.openIdConnectProviderArn,
{
StringEquals: {
'token.actions.githubusercontent.com:aud': 'sts.amazonaws.com',
},
StringLike: {
'token.actions.githubusercontent.com:sub': 'repo:your-org/your-repo:*',
},
}
),
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('PowerUserAccess'),
],
});
// Additional policies for CDK operations
role.addToPolicy(new iam.PolicyStatement({
effect: iam.Effect.Allow,
actions: [
'iam:CreateRole',
'iam:DeleteRole',
'iam:AttachRolePolicy',
'iam:DetachRolePolicy',
'iam:PassRole',
],
resources: ['*'],
}));
}
}
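PowerUserAccess plus IAM role management is a broad grant for a CI role. With the CDK v2 bootstrap stack in place, the CLI does its work by assuming the bootstrap roles, so one commonly used alternative is to allow the GitHub Actions role only `sts:AssumeRole` on those roles. The sketch below builds that policy as a plain JSON-style object; the function name is hypothetical, and it assumes the default bootstrap qualifier (`hnb659fds`) and the default bootstrap role naming.

```typescript
// Hypothetical helper: IAM statements that let a CI role assume the CDK bootstrap roles.
export interface IamStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string[];
}

export function cdkDeployStatements(
  accountId: string,
  region: string,
  qualifier = 'hnb659fds', // default CDK bootstrap qualifier (assumption: bootstrap was not customized)
): IamStatement[] {
  const bootstrapRole = (kind: string): string =>
    `arn:aws:iam::${accountId}:role/cdk-${qualifier}-${kind}-role-${accountId}-${region}`;
  return [
    {
      Effect: 'Allow',
      Action: ['sts:AssumeRole'],
      // deploy: CloudFormation operations; file-publishing: asset upload;
      // lookup: context lookups such as HostedZone.fromLookup
      Resource: ['deploy', 'file-publishing', 'lookup'].map(bootstrapRole),
    },
  ];
}
```

The statements can be attached to the role in place of the managed policy; whether the bootstrap roles themselves are scoped tightly enough is a separate decision made at `cdk bootstrap` time.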
Cost Optimization and Monitoring
Resource Right-Sizing
I’ve learned that preview environments need to balance cost with functionality. Here’s a practical cost breakdown for a 72-hour preview environment:
// Cost optimization configuration
const previewConfig = {
  lambda: {
    memorySize: 256, // MB - sufficient for most workloads
    timeout: Duration.seconds(30),
    reservedConcurrentExecutions: 5, // Limit concurrent executions (matches the lambda.Function prop name)
  },
  apiGateway: {
    throttling: {
      rateLimit: 100,
      burstLimit: 200,
    },
  },
  cloudfront: {
    priceClass: cloudfront.PriceClass.PRICE_CLASS_100, // Cheapest class: North America and Europe edge locations
    defaultTtl: Duration.hours(1), // Short TTL for development
  },
};
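// Estimated costs per 72-hour environment (rough budget figures, not exact products:
// at the unit prices shown, small example volumes come to well under a cent,
// e.g. 1,000 requests × $3.50/million ≈ $0.0035 and 50 GB-seconds × $0.0000166667 ≈ $0.0008):
// - API Gateway: ~$0.12 (REST API requests at $3.50/million)
// - Lambda: ~$0.08 (compute at $0.0000166667 per GB-second)
// - CloudFront: ~$0.04 (1GB data transfer)
// - Route53: ~$0.02 (hosted zone queries)
// Total: ~$0.26 per preview environment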
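To sanity-check estimates like these, the arithmetic can be written down as a small function. This is a sketch using only the two unit prices quoted in the comments above ($3.50 per million API Gateway requests, $0.0000166667 per Lambda GB-second); the function and type names are hypothetical, and it deliberately leaves out CloudFront, Route 53, and Lambda's per-request charge.

```typescript
// Hypothetical estimator: usage-based cost from the unit prices quoted in this post.
export interface PreviewUsage {
  apiRequests: number;      // API Gateway REST requests over the environment's lifetime
  lambdaGbSeconds: number;  // memory (GB) × billed duration (s), summed over invocations
}

const API_GATEWAY_USD_PER_MILLION = 3.5;
const LAMBDA_USD_PER_GB_SECOND = 0.0000166667;

export function estimateUsd(usage: PreviewUsage): { apiGateway: number; lambda: number; total: number } {
  const apiGateway = (usage.apiRequests / 1_000_000) * API_GATEWAY_USD_PER_MILLION;
  const lambda = usage.lambdaGbSeconds * LAMBDA_USD_PER_GB_SECOND;
  return { apiGateway, lambda, total: apiGateway + lambda };
}

// Example: 1,000 requests and 50 GB-seconds
const example = estimateUsd({ apiRequests: 1000, lambdaGbSeconds: 50 });
console.log(example.apiGateway.toFixed(4)); // 0.0035
console.log(example.lambda.toFixed(4));     // 0.0008
```

At these volumes the usage-based charges are a fraction of a cent, which is why fixed-price components (anything billed per hour or per month) deserve the most scrutiny in a preview stack.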
Automated Cleanup
Cleanup automation prevents cost runaway and ensures environments don’t accumulate:
// lib/cleanup-function.ts
import {
  CloudFormationClient,
  DeleteStackCommand,
  paginateDescribeStacks,
} from '@aws-sdk/client-cloudformation';

const DELETABLE_STATUSES = ['CREATE_COMPLETE', 'UPDATE_COMPLETE'];

export const handler = async () => {
  const cfn = new CloudFormationClient({});
  const now = new Date();
  const expiredStacks: string[] = [];
  try {
    // ListStacks summaries do not include tags, so page through DescribeStacks instead
    for await (const page of paginateDescribeStacks({ client: cfn }, {})) {
      for (const stack of page.Stacks ?? []) {
        if (!stack.StackName?.startsWith('preview-pr-')) continue;
        if (!DELETABLE_STATUSES.includes(stack.StackStatus ?? '')) continue;
        // Find stacks with expired TTL tags
        const ttl = stack.Tags?.find(tag => tag.Key === 'TTL')?.Value;
        if (ttl && new Date(ttl) < now) {
          expiredStacks.push(stack.StackName);
        }
      }
    }
    // Delete expired stacks
    for (const stackName of expiredStacks) {
      console.log(`Deleting expired stack: ${stackName}`);
      await cfn.send(new DeleteStackCommand({ StackName: stackName }));
    }
    return {
      statusCode: 200,
      body: JSON.stringify({
        message: `Cleaned up ${expiredStacks.length} expired stacks`,
        deletedStacks: expiredStacks,
      }),
    };
  } catch (error) {
    console.error('Cleanup failed:', error);
    throw error;
  }
};
// Schedule cleanup Lambda with EventBridge.
// This fragment belongs inside a Stack constructor; cleanupFunction is assumed to be
// the lambda.Function that runs the cleanup handler, with permission to describe and delete stacks.
import { Duration } from 'aws-cdk-lib';
import * as events from 'aws-cdk-lib/aws-events';
import * as targets from 'aws-cdk-lib/aws-events-targets';

const cleanupRule = new events.Rule(this, 'CleanupRule', {
  schedule: events.Schedule.rate(Duration.hours(6)),
  description: 'Clean up expired preview environments',
});
cleanupRule.addTarget(new targets.LambdaFunction(cleanupFunction));
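The expiry decision is the part of the cleanup most worth unit-testing, because a wrong comparison either leaks environments or deletes live ones. A minimal sketch, assuming the `TTL` tag holds an ISO 8601 timestamp as written by the stack; `isExpired` is a hypothetical name, and the choice to keep stacks with a missing or unparseable tag is an assumption, not something the original handler specifies.

```typescript
// Hypothetical pure predicate for the TTL check used by a cleanup job.
export interface StackTag {
  Key?: string;
  Value?: string;
}

export function isExpired(tags: StackTag[] | undefined, now: Date): boolean {
  const ttl = tags?.find(tag => tag.Key === 'TTL')?.Value;
  if (!ttl) return false; // no TTL tag: leave the stack alone
  const expiry = new Date(ttl);
  if (Number.isNaN(expiry.getTime())) return false; // unparseable TTL: leave the stack alone
  return expiry.getTime() < now.getTime();
}
```

Because the function takes `now` as a parameter, tests can pin the clock instead of depending on when they run.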
E2E Testing Integration
Cypress Configuration
Here’s how I’ve integrated E2E testing with preview environments:
// cypress.config.ts
import { defineConfig } from 'cypress';
export default defineConfig({
e2e: {
baseUrl: process.env.CYPRESS_BASE_URL || 'http://localhost:3000',
video: true,
screenshotOnRunFailure: true,
defaultCommandTimeout: 10000,
requestTimeout: 15000,
responseTimeout: 15000,
setupNodeEvents(on, config) {
// Take screenshots on failure
on('after:screenshot', (details) => {
console.log('Screenshot taken:', details.path);
});
// Custom commands for preview environment testing
on('task', {
waitForDeployment() {
// Custom logic to wait for deployment readiness
return null;
},
});
},
},
});
// cypress/e2e/preview-environment.cy.js
describe('Preview Environment Tests', () => {
beforeEach(() => {
// Ensure we're testing the right environment
cy.visit('/');
cy.get('[data-testid="environment-indicator"]')
.should('contain', 'Preview');
});
it('should load the application correctly', () => {
cy.get('[data-testid="app-header"]').should('be.visible');
cy.get('[data-testid="main-content"]').should('be.visible');
});
it('should handle API requests', () => {
cy.intercept('GET', '/api/health').as('healthCheck');
cy.visit('/dashboard');
cy.wait('@healthCheck').then((interception) => {
expect(interception.response?.statusCode).to.equal(200);
});
});
it('should display correct environment information', () => {
cy.visit('/debug');
cy.get('[data-testid="pr-number"]')
.should('contain', Cypress.env('PR_NUMBER'));
cy.get('[data-testid="commit-sha"]')
.should('contain', Cypress.env('COMMIT_SHA'));
});
});
Security Best Practices
Network Security
Working with preview environments exposed to the internet, I’ve learned these security patterns work well:
// VPC and security groups for Lambda
import * as ec2 from 'aws-cdk-lib/aws-ec2';

const vpc = new ec2.Vpc(this, 'PreviewVPC', {
  maxAzs: 2,
  // One NAT gateway instead of one per AZ. NAT gateways bill per hour, so a per-PR VPC
  // can cost more than the rest of the stack; sharing one VPC across previews is cheaper.
  natGateways: 1,
  subnetConfiguration: [
    {
      cidrMask: 24,
      name: 'Private',
      subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
    },
    {
      cidrMask: 24,
      name: 'Public',
      subnetType: ec2.SubnetType.PUBLIC,
    },
  ],
});

const securityGroup = new ec2.SecurityGroup(this, 'LambdaSecurityGroup', {
  vpc,
  description: 'Security group for preview Lambda functions',
  allowAllOutbound: true,
});
// No ingress rules: Lambda functions only open outbound connections, and inbound
// HTTPS is terminated by CloudFront and API Gateway, not by the function's network interface.
Secrets Management
// Secrets Manager generates the password at deploy time. A literal stringValue in an
// ssm.StringParameter would be stored as plain text in the synthesized template, so this
// example uses a generated secret instead of Parameter Store.
import { RemovalPolicy } from 'aws-cdk-lib';
import * as secretsmanager from 'aws-cdk-lib/aws-secretsmanager';
import * as lambda from 'aws-cdk-lib/aws-lambda';

const dbPassword = new secretsmanager.Secret(this, 'DatabasePassword', {
  secretName: `/preview/${prNumber}/database/password`,
  generateSecretString: { passwordLength: 32, excludePunctuation: true },
  removalPolicy: RemovalPolicy.DESTROY,
});

// Lambda environment variables reference the secret, never its value
const appFunction = new lambda.Function(this, 'AppFunction', {
  // ... other configuration
  environment: {
    DATABASE_PASSWORD_SECRET_ARN: dbPassword.secretArn,
  },
});

// Grant Lambda permission to read the secret
dbPassword.grantRead(appFunction);
Real-World Lessons Learned
Common Pitfalls I’ve Encountered
DNS Propagation Delays: Route53 changes can take 30-60 seconds to propagate. I learned to add health checks before marking deployments as ready:
// Health check implementation
const healthCheck = new route53.HealthCheck(this, 'HealthCheck', {
type: route53.HealthCheckType.HTTPS,
resourcePath: '/health',
fqdn: previewDomain,
requestInterval: Duration.seconds(30),
failureThreshold: 3,
});
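A Route 53 health check reports on the endpoint over time, but the deploy job still needs its own bounded wait before it marks the environment ready. The following is a sketch of such a polling loop with an injected probe, so it can be tested without a network; `waitUntilReady` is a hypothetical name, and the `/health` path in the usage comment is the one this post already uses.

```typescript
// Hypothetical helper: poll a probe until it succeeds or the attempts run out.
export async function waitUntilReady(
  probe: () => Promise<boolean>,
  attempts: number,
  delayMs: number,
): Promise<number> {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      if (await probe()) return attempt; // resolves with the attempt that succeeded
    } catch {
      // DNS or connection errors count as "not ready yet"
    }
    if (attempt < attempts) {
      await new Promise<void>(resolve => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`Not ready after ${attempts} attempts`);
}

// Usage sketch (Node 18+ global fetch; previewUrl is assumed to come from the stack output):
//   await waitUntilReady(async () => (await fetch(`${previewUrl}/health`)).ok, 30, 10_000);
```

Rejecting after the last attempt matters: a loop that simply falls through lets the pipeline run tests against an environment that never came up.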
Resource Cleanup Failures: Sometimes CDK destroy operations fail due to resource dependencies. Here’s a retry mechanism that works:
#!/bin/bash
# Enhanced cleanup script
# Usage: ./cleanup.sh <pr-number>
set -u
STACK_NAME="preview-pr-$1"
MAX_RETRIES=3
for i in $(seq 1 "$MAX_RETRIES"); do
  echo "Attempt $i to destroy stack $STACK_NAME"
  if npx cdk destroy "$STACK_NAME" --force; then
    echo "Stack destroyed successfully"
    exit 0
  fi
  if [ "$i" -lt "$MAX_RETRIES" ]; then
    echo "Retry in 30 seconds..."
    sleep 30
  fi
done
echo "Failed to destroy stack after $MAX_RETRIES attempts"
exit 1
Cold Start Performance: Lambda cold starts can make initial tests fail. Pre-warming helps:
// Lambda warmer function
const warmerFunction = new lambda.Function(this, 'Warmer', {
  runtime: lambda.Runtime.NODEJS_22_X,
  // Code.fromInline writes the source to index.js, so the handler must be index.handler
  handler: 'index.handler',
  // Node.js 18+ runtimes bundle AWS SDK v3 only; require('aws-sdk') would fail
  code: lambda.Code.fromInline(`
    const { LambdaClient, InvokeCommand } = require('@aws-sdk/client-lambda');
    const client = new LambdaClient({});
    exports.handler = async () => {
      await client.send(new InvokeCommand({
        FunctionName: process.env.TARGET_FUNCTION,
        InvocationType: 'Event',
        Payload: Buffer.from(JSON.stringify({ warmer: true })),
      }));
    };
  `),
  environment: {
    TARGET_FUNCTION: appFunction.functionName,
  },
});
// The warmer needs permission to invoke the application function
appFunction.grantInvoke(warmerFunction);

// Schedule pre-warming
const warmupRule = new events.Rule(this, 'WarmupRule', {
  schedule: events.Schedule.rate(Duration.minutes(5)),
});
warmupRule.addTarget(new targets.LambdaFunction(warmerFunction));
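The warmer sends `{ warmer: true }` as the payload, so the application handler has to recognize that event and return early instead of treating it as a request. A minimal sketch of that short-circuit as a wrapper; `withWarmup` and the response shape are hypothetical.

```typescript
// Hypothetical wrapper: answer warm-up pings without running the real handler.
type LambdaEvent = Record<string, unknown>;

export function withWarmup<R>(
  inner: (event: LambdaEvent) => Promise<R>,
  warmResponse: R,
): (event: LambdaEvent) => Promise<R> {
  return async (event: LambdaEvent): Promise<R> => {
    // Matches the payload sent by the warmer function
    if (event.warmer === true) return warmResponse;
    return inner(event);
  };
}

// Usage sketch:
//   export const handler = withWarmup(realHandler, { statusCode: 200, body: 'warm' });
```

Returning early keeps warm-up invocations cheap and keeps them out of application logs and metrics.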
Performance Optimizations
Parallel CDK Deployments: For teams with many concurrent PRs, deploy multiple stacks in parallel:
# Matrix strategy for parallel deployments
strategy:
matrix:
include:
- stack: frontend
directory: packages/frontend
- stack: backend
directory: packages/backend
- stack: infrastructure
directory: infrastructure
steps:
- name: Deploy ${{ matrix.stack }}
working-directory: ${{ matrix.directory }}
run: |
npx cdk deploy preview-pr-${{ env.PR_NUMBER }}-${{ matrix.stack }} \
--require-approval never
CloudFront Caching Strategy: Balance freshness with performance:
const distribution = new cloudfront.CloudFrontWebDistribution(this, 'Distribution', {
  originConfigs: [{
    customOriginSource: {
      domainName: api.restApiId + '.execute-api.' + this.region + '.amazonaws.com',
      originPath: '/prod',
    },
    behaviors: [
      {
        isDefaultBehavior: true,
        allowedMethods: cloudfront.CloudFrontAllowedMethods.ALL,
        cachedMethods: cloudfront.CloudFrontAllowedCachedMethods.GET_HEAD_OPTIONS,
        // The legacy Behavior type takes flat TTL props; cache policies belong to
        // the newer cloudfront.Distribution construct.
        defaultTtl: Duration.minutes(5), // Short TTL for development
        maxTtl: Duration.hours(1),
        minTtl: Duration.seconds(0),
      },
    ],
  }],
});
Monitoring and Alerting
Cost Monitoring
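Track overall spend to prevent budget surprises. The AWS/Billing EstimatedCharges metric is account-wide and published only in us-east-1 (billing alerts must be enabled), so per-PR attribution requires activating the PRNumber tag as a cost allocation tag: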
// CloudWatch dashboard for preview environments
import { Construct } from 'constructs';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
const dashboard = new cloudwatch.Dashboard(this, 'PreviewDashboard', {
dashboardName: 'Preview-Environments',
widgets: [
[
new cloudwatch.GraphWidget({
title: 'Preview Environment Costs',
left: [
new cloudwatch.Metric({
namespace: 'AWS/Billing',
metricName: 'EstimatedCharges',
dimensionsMap: {
Currency: 'USD',
},
}),
],
}),
],
[
new cloudwatch.SingleValueWidget({
title: 'Active Preview Environments',
metrics: [
new cloudwatch.Metric({
namespace: 'Custom/Preview',
metricName: 'ActiveEnvironments',
}),
],
}),
],
],
});
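// Cost alert (deploy in us-east-1, where AWS publishes billing metrics)
import { Duration } from 'aws-cdk-lib';

const costAlarm = new cloudwatch.Alarm(this, 'PreviewCostAlarm', {
  metric: new cloudwatch.Metric({
    namespace: 'AWS/Billing',
    metricName: 'EstimatedCharges',
    dimensionsMap: {
      Currency: 'USD',
    },
    statistic: 'Maximum',
    period: Duration.hours(6), // Billing data points arrive only a few times a day
  }),
  threshold: 50, // Alert if month-to-date account charges exceed $50
  evaluationPeriods: 1,
  comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
});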
Deployment Success Tracking
// Custom metrics for deployment tracking
const deploymentMetric = new cloudwatch.Metric({
namespace: 'Custom/Preview',
metricName: 'DeploymentSuccess',
dimensionsMap: {
Environment: 'preview',
PRNumber: prNumber,
},
});
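// Send success metric after deployment.
// This is runtime code using AWS SDK v3 (for example in a post-deploy script), not a CDK construct.
import { CloudWatchClient, PutMetricDataCommand } from '@aws-sdk/client-cloudwatch';

const cloudWatchClient = new CloudWatchClient({});
await cloudWatchClient.send(new PutMetricDataCommand({
  Namespace: 'Custom/Preview',
  MetricData: [{
    MetricName: 'DeploymentSuccess',
    Value: 1,
    Unit: 'Count',
    Dimensions: [
      { Name: 'Environment', Value: 'preview' },
      { Name: 'PRNumber', Value: prNumber },
    ],
  }],
}));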
Key Takeaways
After implementing this pattern across multiple projects, here’s what I’ve learned:
- Start Simple: Begin with basic Lambda + API Gateway. Add complexity as your team grows comfortable with the automation.
- Cost Control is Critical: Without proper tagging and cleanup, preview environments can quickly become expensive. The automated cleanup is non-negotiable.
- Security from Day One: Use OIDC instead of long-lived credentials. It’s more secure and eliminates credential rotation headaches.
- Monitor Everything: Failed deployments and runaway costs are much easier to catch with proper monitoring from the start.
- Test the Cleanup: Your cleanup automation will eventually fail. Test it regularly and have manual fallbacks ready.
The investment in automation pays off quickly. Teams report faster review cycles, fewer staging environment conflicts, and more confidence in their deployments. Most importantly, it eliminates the friction that often slows down development workflows.
Working with this pattern, I’ve seen deployment-to-ready times consistently under 5 minutes, with costs staying below $0.30 per 72-hour environment. The developer experience improvement alone makes this architectural pattern worthwhile.