2025-12-24
Amazon Cognito Deep Dive: Beyond Basic Authentication
A comprehensive technical guide to Amazon Cognito's advanced features including custom authentication flows, federation patterns, multi-tenancy architectures, migration strategies, and production-grade security implementation.
Abstract
Amazon Cognito provides managed authentication and authorization for applications, but production systems demand more than basic sign-up and sign-in flows. This guide explores advanced Cognito patterns that mid-to-senior developers need for building scalable, secure authentication systems: custom Lambda triggers for multi-factor workflows, Pre Token Generation for multi-tenant token customization, SAML/OIDC federation with enterprise identity providers, API Gateway integration with caching strategies, and zero-downtime migration from Auth0 or custom systems.
Working with Cognito across different projects taught me that the real challenges emerge when implementing tenant isolation, handling federation complexity, and navigating limitations like MFA lock-in and lack of cross-region replication. This guide provides battle-tested patterns with working CDK code, realistic performance metrics, and hard-learned lessons about what works at scale.
Understanding the Architecture
User Pools vs Identity Pools
The distinction between User Pools and Identity Pools confuses many developers initially. They serve fundamentally different purposes:
User Pools handle authentication - validating who users are. They manage user directories, credentials, MFA, password policies, and OAuth flows. When users sign in, they receive JWT tokens (ID token, access token, refresh token).
Identity Pools handle authorization - providing temporary AWS credentials to access services like S3, DynamoDB, or SQS directly from client applications. They exchange authentication tokens (from User Pools or external providers) for AWS credentials.
When to use each pattern:
- User Pool alone: Frontend calling API Gateway or backend services
- Identity Pool alone: Guest access to AWS resources (analytics, public data)
- Both together: Authenticated users accessing S3, DynamoDB directly from frontend
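When both are used together, the client hands the User Pool ID token to the Identity Pool in a Logins map keyed by the User Pool's provider name. A minimal sketch of building that map follows; the helper name `buildUserPoolLogins` and the sample IDs are hypothetical, and the resulting object is what would be passed as `Logins` to the Identity Pool's GetId and GetCredentialsForIdentity calls.

```typescript
// Builds the `Logins` map an Identity Pool expects when the identity
// provider is a Cognito User Pool. All argument values are placeholders.
function buildUserPoolLogins(
  region: string,
  userPoolId: string,
  idToken: string,
): Record<string, string> {
  // Provider name format: cognito-idp.<region>.amazonaws.com/<userPoolId>
  const providerName = `cognito-idp.${region}.amazonaws.com/${userPoolId}`;
  return { [providerName]: idToken };
}

const logins = buildUserPoolLogins('us-east-1', 'us-east-1_ABC123', '<id-token>');
console.log(Object.keys(logins)[0]);
```

Note that the ID token, not the access token, is the value exchanged for AWS credentials.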
Production Setup with CDK
Here’s a complete setup demonstrating both User Pool and Identity Pool with proper security configuration:
import * as cognito from 'aws-cdk-lib/aws-cognito';
import * as iam from 'aws-cdk-lib/aws-iam';
// User Pool for Authentication
const userPool = new cognito.UserPool(this, 'UserPool', {
selfSignUpEnabled: false, // Production: Control user creation
signInAliases: { email: true, username: true },
autoVerify: { email: true },
passwordPolicy: {
minLength: 12,
requireLowercase: true,
requireUppercase: true,
requireDigits: true,
requireSymbols: true,
},
accountRecovery: cognito.AccountRecovery.EMAIL_ONLY,
advancedSecurityMode: cognito.AdvancedSecurityMode.ENFORCED,
mfa: cognito.Mfa.OPTIONAL,
mfaSecondFactor: {
sms: true,
otp: true, // Time-based one-time password (TOTP)
},
});
// App client for web application
const appClient = userPool.addClient('WebAppClient', {
authFlows: {
userPassword: false, // Disable less secure flow
userSrp: true, // Secure Remote Password
custom: true, // Enable custom auth flows
},
oAuth: {
flows: {
authorizationCodeGrant: true,
implicitCodeGrant: false, // Avoid implicit flow in production
},
scopes: [
cognito.OAuthScope.OPENID,
cognito.OAuthScope.EMAIL,
cognito.OAuthScope.PROFILE,
cognito.OAuthScope.custom('billing-api/read'),
],
callbackUrls: ['https://app.example.com/callback'],
logoutUrls: ['https://app.example.com/logout'],
},
generateSecret: true, // Required for server-side apps
});
// Identity Pool for AWS resource access
const identityPool = new cognito.CfnIdentityPool(this, 'IdentityPool', {
allowUnauthenticatedIdentities: false,
cognitoIdentityProviders: [{
clientId: appClient.userPoolClientId,
providerName: userPool.userPoolProviderName,
}],
});
// Authenticated role with scoped permissions
const authenticatedRole = new iam.Role(this, 'CognitoAuthenticatedRole', {
assumedBy: new iam.FederatedPrincipal(
'cognito-identity.amazonaws.com',
{
StringEquals: {
'cognito-identity.amazonaws.com:aud': identityPool.ref,
},
'ForAnyValue:StringLike': {
'cognito-identity.amazonaws.com:amr': 'authenticated',
},
},
'sts:AssumeRoleWithWebIdentity'
),
});
// Grant specific S3 access with user-scoped paths
authenticatedRole.addToPolicy(new iam.PolicyStatement({
effect: iam.Effect.ALLOW,
actions: ['s3:GetObject', 's3:PutObject'],
resources: ['arn:aws:s3:::my-bucket/${cognito-identity.amazonaws.com:sub}/*'],
}));
Key configuration decisions:
- selfSignUpEnabled: false prevents unauthorized user creation
- advancedSecurityMode: ENFORCED enables compromised credential detection
- mfa: OPTIONAL allows flexibility (never use REQUIRED - it’s irreversible)
- generateSecret: true for backend clients that can securely store secrets
Custom Authentication Flows
Custom authentication flows enable complex requirements like CAPTCHA verification, security questions, or passwordless authentication. Three Lambda triggers work together to orchestrate the challenge sequence.
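How Custom Auth Works
Define Auth Challenge decides which challenge comes next (or whether to issue tokens or fail authentication), Create Auth Challenge generates the challenge and its expected answer, and Verify Auth Challenge Response checks the user's answer. Cognito calls Define Auth Challenge again after each verification until it either issues tokens or fails the attempt.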
Multi-Factor Challenge Implementation
This example implements a complete flow: password → CAPTCHA → security question.
import type {
CreateAuthChallengeTriggerEvent,
DefineAuthChallengeTriggerEvent,
} from 'aws-lambda';
// Define Auth Challenge - Orchestrates the challenge sequence
export const defineAuthChallenge = async (event: DefineAuthChallengeTriggerEvent) => {
const session = event.request.session;
// First challenge: SRP password verification (handled by Cognito)
if (session.length === 0) {
event.response.issueTokens = false;
event.response.failAuthentication = false;
event.response.challengeName = 'SRP_A';
}
// Second challenge: SRP password verifier
else if (session.length === 1 && session[0].challengeName === 'SRP_A') {
event.response.issueTokens = false;
event.response.failAuthentication = false;
event.response.challengeName = 'PASSWORD_VERIFIER';
}
// Third challenge: CAPTCHA (first custom challenge)
else if (session.length === 2 && session[1].challengeName === 'PASSWORD_VERIFIER'
&& session[1].challengeResult === true) {
event.response.issueTokens = false;
event.response.failAuthentication = false;
// This trigger can only name the challenge; Create Auth Challenge decides
// between CAPTCHA and security question from the session history
event.response.challengeName = 'CUSTOM_CHALLENGE';
}
// Fourth challenge: Security question (second custom challenge)
else if (session.length === 3 && session[2].challengeName === 'CUSTOM_CHALLENGE'
&& session[2].challengeResult === true) {
event.response.issueTokens = false;
event.response.failAuthentication = false;
event.response.challengeName = 'CUSTOM_CHALLENGE';
}
// All challenges passed
else if (session.length === 4 && session[3].challengeName === 'CUSTOM_CHALLENGE'
&& session[3].challengeResult === true) {
event.response.issueTokens = true;
event.response.failAuthentication = false;
}
// Challenge failed
else {
event.response.issueTokens = false;
event.response.failAuthentication = true;
}
return event;
};
// Create Auth Challenge - Generates challenge data
export const createAuthChallenge = async (event: CreateAuthChallengeTriggerEvent) => {
// The request carries no challengeMetadata for the current challenge, so derive
// the step from how many custom challenges the session already contains
const completedCustomChallenges = event.request.session.filter(
(s) => s.challengeName === 'CUSTOM_CHALLENGE'
).length;
const metadata = completedCustomChallenges === 0 ? 'CAPTCHA_CHALLENGE' : 'SECURITY_QUESTION';
// Recorded in the session so later invocations can see which challenge was issued
event.response.challengeMetadata = metadata;
if (metadata === 'CAPTCHA_CHALLENGE') {
// Generate CAPTCHA using external service or internal logic
const captchaToken = await generateCaptcha();
event.response.publicChallengeParameters = {
captchaUrl: `https://captcha.example.com/${captchaToken}`,
challengeType: 'CAPTCHA',
};
event.response.privateChallengeParameters = {
captchaAnswer: await getCaptchaAnswer(captchaToken),
};
}
else if (metadata === 'SECURITY_QUESTION') {
// Fetch user's security question from DynamoDB
const question = await getSecurityQuestion(event.userName);
event.response.publicChallengeParameters = {
question: question.text,
challengeType: 'SECURITY_QUESTION',
};
event.response.privateChallengeParameters = {
answer: question.answer,
};
}
return event;
};
import type { VerifyAuthChallengeResponseTriggerEvent } from 'aws-lambda';
// Verify Auth Challenge Response
export const verifyAuthChallenge = async (event: VerifyAuthChallengeResponseTriggerEvent) => {
const privateParams = event.request.privateChallengeParameters;
const challengeAnswer = event.request.challengeAnswer;
// Create Auth Challenge sets exactly one of these for the current step
const expected = privateParams.captchaAnswer || privateParams.answer;
// Fail closed: without an expected answer the response is never correct
event.response.answerCorrect =
Boolean(expected) && challengeAnswer.toLowerCase() === expected.toLowerCase();
return event;
};
Critical implementation details:
- Challenge sequence must be deterministic based on session array
- Use challengeMetadata to differentiate between custom challenges
- privateChallengeParameters never sent to client, used only for verification
- Each trigger has 5-second timeout limit - keep logic fast
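The requirement that the sequence be deterministic can be sketched as a pure function of the session array, which also makes the orchestration easy to unit test outside Lambda. This is an illustrative sketch rather than the trigger itself: the `ChallengeEntry` and `NextStep` types and the `nextStep` helper are hypothetical, and the sequence mirrors the password → CAPTCHA → security question flow above.

```typescript
// Minimal shape of one entry in event.request.session (only the fields used here).
interface ChallengeEntry {
  challengeName: string;
  challengeResult: boolean;
}

type NextStep =
  | { action: 'CHALLENGE'; challengeName: string }
  | { action: 'ISSUE_TOKENS' }
  | { action: 'FAIL' };

// Expected order of challenges; the session length selects the next one.
const SEQUENCE = ['SRP_A', 'PASSWORD_VERIFIER', 'CUSTOM_CHALLENGE', 'CUSTOM_CHALLENGE'];

function nextStep(session: ChallengeEntry[]): NextStep {
  for (let i = 0; i < session.length; i++) {
    const entry = session[i];
    const nameMatches = i < SEQUENCE.length && entry.challengeName === SEQUENCE[i];
    // The SRP_A step only starts the exchange, so its result is not checked
    const passed = i === 0 || entry.challengeResult === true;
    if (!nameMatches || !passed) return { action: 'FAIL' };
  }
  if (session.length === SEQUENCE.length) return { action: 'ISSUE_TOKENS' };
  return { action: 'CHALLENGE', challengeName: SEQUENCE[session.length] };
}

console.log(nextStep([]));
```

Because the same session always produces the same step, a retried or replayed invocation cannot skip a challenge.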
Token Customization for Multi-Tenancy
Pre Token Generation Lambda allows adding custom claims to JWT tokens, essential for multi-tenant SaaS applications where tenant context must travel with every request.
Pre Token Generation V2
import type { PreTokenGenerationV2TriggerEvent } from 'aws-lambda';
// Pre Token Generation V2 - Customize both ID and Access tokens
export const preTokenGeneration = async (event: PreTokenGenerationV2TriggerEvent) => {
// Fetch tenant and role information from DynamoDB
const userMetadata = await getUserMetadata(event.userName);
// Collect ID token claims first so every branch can add to them safely
const idTokenClaims: Record<string, string> = {};
const tenantId = event.request.userAttributes['custom:tenantId'];
if (tenantId) {
// Verify tenant is active
const tenant = await getTenantById(tenantId);
if (!tenant || tenant.status !== 'ACTIVE') {
throw new Error('Tenant is not active');
}
// Add custom claims to ID token (for user info)
Object.assign(idTokenClaims, {
'custom:tenantId': tenantId,
'custom:tenantName': tenant.name,
'custom:organizationId': tenant.organizationId,
'custom:role': userMetadata.role,
'custom:permissions': JSON.stringify(userMetadata.permissions),
});
}
// Add subscription tier for feature flags
if (userMetadata.subscriptionTier) {
idTokenClaims['custom:tier'] = userMetadata.subscriptionTier;
}
// The V2 event nests overrides per token type under claimsAndScopeOverrideDetails
event.response.claimsAndScopeOverrideDetails = {
idTokenGeneration: { claimsToAddOrOverride: idTokenClaims },
};
// Customize Access Token (Cognito Essentials/Plus tier only)
if (tenantId && event.triggerSource === 'TokenGeneration_Authentication') {
event.response.claimsAndScopeOverrideDetails.accessTokenGeneration = {
claimsToAddOrOverride: {
'tenant_id': tenantId,
'role': userMetadata.role,
},
claimsToSuppress: [],
scopesToAdd: [`tenant:${tenantId}:read`, `tenant:${tenantId}:write`],
};
}
return event;
};
// DynamoDB helper functions
async function getUserMetadata(username: string) {
const result = await dynamoDB.get({
TableName: 'UserMetadata',
Key: { username },
}).promise();
return result.Item || { role: 'user', permissions: [] };
}
async function getTenantById(tenantId: string) {
const result = await dynamoDB.get({
TableName: 'Tenants',
Key: { tenantId },
}).promise();
return result.Item;
}
Security considerations:
- Never include sensitive data (passwords, API keys) in tokens
- Keep token size under 8KB to avoid HTTP header limits
- Use opaque references for large permission sets
- Validate tenant context to prevent token forgery
Warning: Token Size Pitfall: Adding too many custom claims can push tokens over 8KB, causing HTTP 431 errors. Monitor token size in production and use reference IDs instead of embedding large data structures.
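One way to catch oversized tokens before they reach production is to measure the encoded claims during testing. The sketch below estimates the size of the base64url-encoded JWT payload for a claims object; the helper names and the 1KB reserve for the JWT header, signature, and other request headers are assumptions for illustration, not Cognito-defined values.

```typescript
import { Buffer } from 'node:buffer';

const HEADER_LIMIT_BYTES = 8 * 1024; // the 8KB budget discussed above

// Size of the base64url-encoded JWT payload for a given claims object.
// The JWT header and signature segments add more bytes on top of this.
function encodedPayloadBytes(claims: Record<string, unknown>): number {
  const json = JSON.stringify(claims);
  return Buffer.from(json, 'utf8').toString('base64url').length;
}

// reservedBytes is an assumed allowance for the rest of the token and headers.
function exceedsBudget(claims: Record<string, unknown>, reservedBytes = 1024): boolean {
  return encodedPayloadBytes(claims) + reservedBytes > HEADER_LIMIT_BYTES;
}

const smallClaims = { sub: 'user-1', 'custom:tenantId': 'tenant-42' };
console.log(encodedPayloadBytes(smallClaims), exceedsBudget(smallClaims));
```

Running a check like this against the largest realistic permission set gives an early signal that reference IDs are needed.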
Multi-Tenancy Patterns
Choosing the right multi-tenancy pattern significantly impacts scalability, isolation, and operational complexity.
Shared Pool with Custom Attributes
This pattern works well for most SaaS applications with fewer than 100 tenants:
// Shared User Pool with tenant isolation
const userPool = new cognito.UserPool(this, 'MultiTenantUserPool', {
selfSignUpEnabled: false,
standardAttributes: {
email: { required: true, mutable: true },
},
customAttributes: {
tenantId: new cognito.StringAttribute({
minLen: 1,
maxLen: 128,
mutable: false, // Cannot change tenant after creation
}),
organizationId: new cognito.StringAttribute({
minLen: 1,
maxLen: 128,
mutable: false,
}),
role: new cognito.StringAttribute({
minLen: 1,
maxLen: 64,
mutable: true, // Role can be updated
}),
},
});
// Pre Sign Up - Assign tenant from invitation token
export const preSignUp = async (event: PreSignUpTriggerEvent) => {
const invitationToken = event.request.validationData?.invitationToken;
if (!invitationToken) {
throw new Error('Invitation token required');
}
// Validate invitation and get tenant info
const invitation = await validateInvitation(invitationToken);
if (!invitation || invitation.expired) {
throw new Error('Invalid or expired invitation');
}
// Auto-confirm and set tenant attributes
event.response.autoConfirmUser = true;
event.response.autoVerifyEmail = true;
// These will be set as custom attributes
event.request.userAttributes['custom:tenantId'] = invitation.tenantId;
event.request.userAttributes['custom:organizationId'] = invitation.organizationId;
event.request.userAttributes['custom:role'] = invitation.role;
// Mark invitation as used
await markInvitationUsed(invitationToken, event.userName);
return event;
};
Pattern selection reality: Most applications start with shared pool + custom attributes, migrating to groups-based isolation only when tenant count exceeds 100 or security requirements demand stronger isolation.
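With a shared pool, isolation depends on every data access comparing the token's tenant claim against the tenant that owns the resource. A minimal sketch of such a guard is shown here; the `isSameTenant` helper and the `TokenClaims` type are hypothetical, and the claim name follows the custom:tenantId attribute defined above.

```typescript
type TokenClaims = Record<string, string | undefined>;

// Returns true only when the token's tenant claim is present and matches
// the tenant that owns the requested resource.
function isSameTenant(claims: TokenClaims, resourceTenantId: string): boolean {
  const tenantId = claims['custom:tenantId'];
  return typeof tenantId === 'string' && tenantId.length > 0 && tenantId === resourceTenantId;
}

console.log(isSameTenant({ 'custom:tenantId': 'tenant-a' }, 'tenant-a'));
```

A missing claim is treated as a denial, so a token issued without tenant context never matches any resource.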
SAML Federation with Enterprise Identity Providers
Federation allows users to authenticate through corporate identity providers like Azure AD, Okta, or OneLogin, essential for B2B SaaS applications.
Azure AD SAML Configuration
// CDK setup for SAML provider
const samlProvider = new cognito.UserPoolIdentityProviderSaml(this, 'AzureADProvider', {
userPool,
name: 'AzureAD',
metadata: cognito.UserPoolIdentityProviderSamlMetadata.url(
'https://login.microsoftonline.com/TENANT_ID/federationmetadata/2007-06/federationmetadata.xml'
),
attributeMapping: {
email: cognito.ProviderAttribute.other('http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress'),
givenName: cognito.ProviderAttribute.other('http://schemas.xmlsoap.org/ws/2005/05/identity/claims/givenname'),
familyName: cognito.ProviderAttribute.other('http://schemas.xmlsoap.org/ws/2005/05/identity/claims/surname'),
custom: {
'tenantId': cognito.ProviderAttribute.other('http://schemas.microsoft.com/identity/claims/tenantid'),
},
},
idpSignout: true,
});
// Link federated user to existing profile (avoiding duplicates)
export const postAuthentication = async (event: PostAuthenticationTriggerEvent) => {
// Check if this is a federated identity
if (event.request.userAttributes.identities) {
const identities = JSON.parse(event.request.userAttributes.identities);
const federatedIdentity = identities[0];
if (federatedIdentity.providerName === 'AzureAD') {
const email = event.request.userAttributes.email;
// Check if user already exists with this email
const existingUser = await findUserByEmail(email);
if (existingUser && existingUser.username !== event.userName) {
// Link the federated identity to existing user
await cognito.adminLinkProviderForUser({
UserPoolId: event.userPoolId,
DestinationUser: {
ProviderName: 'Cognito',
ProviderAttributeValue: existingUser.username,
},
SourceUser: {
ProviderName: federatedIdentity.providerName,
ProviderAttributeName: 'Cognito_Subject',
ProviderAttributeValue: federatedIdentity.userId,
},
}).promise();
// Log the linking for audit
await auditLog({
action: 'FEDERATED_IDENTITY_LINKED',
email,
provider: federatedIdentity.providerName,
});
}
}
}
return event;
};
Federation best practices:
- Use metadata URL for automatic certificate rotation
- Map NameId to immutable attribute (user_id, not email)
- Implement account linking to prevent duplicate users
- Test both SP-initiated and IdP-initiated logout flows
Tip: Federation Testing: Test logout flows thoroughly. Federated logout requires coordination between Cognito, IdP, and application. Users appearing logged out in the app but still authenticated at IdP level is a common issue.
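The identities attribute read in the Post Authentication trigger above arrives as a JSON-encoded string, so parsing it defensively avoids failing a sign-in on malformed data. The following sketch shows one way to do that; the `FederatedIdentity` interface lists only the fields used here, and both helper names are hypothetical.

```typescript
// One entry of the `identities` attribute on a federated user
// (only the fields used in this sketch are listed).
interface FederatedIdentity {
  userId: string;
  providerName: string;
  providerType: string;
}

// Parses the attribute, returning an empty list for missing or malformed input.
function parseIdentities(raw: string | undefined): FederatedIdentity[] {
  if (!raw) return [];
  try {
    const parsed: unknown = JSON.parse(raw);
    return Array.isArray(parsed) ? (parsed as FederatedIdentity[]) : [];
  } catch {
    return [];
  }
}

function isFederatedVia(raw: string | undefined, providerName: string): boolean {
  return parseIdentities(raw).some((identity) => identity.providerName === providerName);
}

const sample = JSON.stringify([{ userId: 'abc', providerName: 'AzureAD', providerType: 'SAML' }]);
console.log(isFederatedVia(sample, 'AzureAD'));
```

Checking every entry rather than only the first also covers users linked to more than one provider.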
API Gateway Integration
API Gateway Cognito authorizers validate JWT tokens and cache authorization decisions for performance.
Complete Integration Setup
// CDK: API Gateway with Cognito authorizer
const api = new apigateway.RestApi(this, 'MyApi', {
restApiName: 'Secure API',
deployOptions: {
stageName: 'prod',
tracingEnabled: true,
},
});
const authorizer = new apigateway.CognitoUserPoolsAuthorizer(this, 'CognitoAuthorizer', {
cognitoUserPools: [userPool],
authorizerName: 'CognitoAuthorizer',
identitySource: 'method.request.header.Authorization',
resultsCacheTtl: Duration.minutes(5), // Cache authorization decisions
});
// Protected endpoint requiring specific OAuth scope
const protectedResource = api.root.addResource('billing');
protectedResource.addMethod('GET', new apigateway.LambdaIntegration(billingFunction), {
authorizer,
authorizationType: apigateway.AuthorizationType.COGNITO,
authorizationScopes: ['billing-api/read'], // OAuth scope validation
requestValidator: new apigateway.RequestValidator(this, 'RequestValidator', {
restApi: api,
validateRequestBody: true,
validateRequestParameters: true,
}),
});
// Lambda function with JWT validation and tenant isolation
export const handler = async (event: APIGatewayProxyEvent) => {
// API Gateway already validated JWT, extract claims
const claims = event.requestContext.authorizer?.claims;
if (!claims) {
return { statusCode: 401, body: 'Unauthorized' };
}
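// With authorizationScopes set on the method, API Gateway expects an access token,
// so prefer the access-token claim names and fall back to the ID-token ones
const tenantId = claims['tenant_id'] ?? claims['custom:tenantId'];
const role = claims['role'] ?? claims['custom:role'];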
// Verify tenant context
if (!tenantId) {
return { statusCode: 403, body: 'Missing tenant context' };
}
// Query with tenant isolation
const result = await dynamoDB.query({
TableName: 'BillingRecords',
IndexName: 'TenantIndex',
KeyConditionExpression: 'tenantId = :tenantId',
ExpressionAttributeValues: {
':tenantId': tenantId,
},
}).promise();
// Apply role-based filtering
const filteredRecords = filterByRole(result.Items, role);
return {
statusCode: 200,
body: JSON.stringify(filteredRecords),
};
};
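The OAuth scope validation configured on the method can also be repeated inside the handler as defense in depth. Cognito access tokens carry granted scopes as one space-separated string in the scope claim; the sketch below checks for a required scope, with the `hasScope` helper name being hypothetical.

```typescript
// Checks whether the space-separated `scope` claim contains the required scope.
function hasScope(claims: Record<string, unknown>, required: string): boolean {
  const scope = claims['scope'];
  if (typeof scope !== 'string') return false;
  return scope.split(' ').includes(required);
}

console.log(hasScope({ scope: 'openid billing-api/read' }, 'billing-api/read'));
```

An exact match on each scope avoids the prefix mistakes that a plain substring search would allow.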
Authorization caching trade-offs:
| Cache TTL | Performance | Security | Use Case |
|---|---|---|---|
| None | Highest latency | Real-time permissions | High-security operations |
| 5 min | Good balance | ~5 min lag | Standard API endpoints |
| 30-60 min | Best performance | Stale permissions | Read-only public data |
Working with API Gateway authorization taught me that cached decisions persist for the full TTL even if permissions change. For critical permission changes, use shorter TTL or implement cache-busting strategies.
Migration from External Auth Providers
User Migration Lambda enables zero-downtime migration from Auth0, Okta, or custom authentication systems using lazy migration.
Lazy Migration Strategy
// User Migration Lambda - Lazy migration approach
export const userMigration = async (event: UserMigrationTriggerEvent) => {
if (event.triggerSource === 'UserMigration_Authentication') {
// User tries to sign in but doesn't exist in Cognito
// userName lives on the event itself; only the password is in event.request
const { userName } = event;
const { password } = event.request;
try {
// Validate credentials against Auth0
const auth0User = await validateWithAuth0(userName, password);
if (auth0User) {
// User is valid, migrate to Cognito
event.response.userAttributes = {
email: auth0User.email,
email_verified: 'true',
given_name: auth0User.given_name,
family_name: auth0User.family_name,
'custom:auth0Id': auth0User.user_id,
'custom:migratedAt': new Date().toISOString(),
};
event.response.finalUserStatus = 'CONFIRMED';
event.response.messageAction = 'SUPPRESS'; // Don't send welcome email
// Log migration for tracking
await logMigration(userName, 'success');
return event;
}
} catch (error) {
await logMigration(userName, 'failed', error);
throw error;
}
}
if (event.triggerSource === 'UserMigration_ForgotPassword') {
// User requests password reset but doesn't exist in Cognito
const { userName } = event;
// Check if user exists in Auth0
const auth0User = await getUserFromAuth0(userName);
if (auth0User) {
event.response.userAttributes = {
email: auth0User.email,
email_verified: 'true',
'custom:auth0Id': auth0User.user_id,
};
event.response.messageAction = 'SUPPRESS';
return event;
}
}
throw new Error('User not found in legacy system');
};
async function validateWithAuth0(username: string, password: string) {
const response = await axios.post('https://YOUR_DOMAIN.auth0.com/oauth/token', {
grant_type: 'password',
username,
password,
client_id: process.env.AUTH0_CLIENT_ID,
client_secret: process.env.AUTH0_CLIENT_SECRET,
audience: process.env.AUTH0_AUDIENCE,
scope: 'openid profile email',
});
if (response.data.access_token) {
// Get user info
const userInfo = await axios.get('https://YOUR_DOMAIN.auth0.com/userinfo', {
headers: { Authorization: `Bearer ${response.data.access_token}` },
});
return userInfo.data;
}
return null;
}
Migration timeline:
- Weeks 1-2: Implement User Migration Lambda, test with staging users
- Weeks 3-6: Enable lazy migration, monitor active user migration rate
- Weeks 7-8: Bulk import remaining inactive users via CSV or API
- Week 9+: Decommission legacy system after confirming all users migrated
This approach migrated users gradually over 60 days in one project, with 80% migrating through lazy authentication and 20% through bulk import.
Advanced Security Features
Cognito’s advanced security requires Plus tier pricing but provides enterprise-grade protection.
Security Configuration
// Enable Advanced Security (Plus tier required)
const userPool = new cognito.UserPool(this, 'SecureUserPool', {
// advancedSecurityMode is the UserPool construct's prop; userPoolAddOns exists only on the L1 CfnUserPool
advancedSecurityMode: cognito.AdvancedSecurityMode.ENFORCED,
signInAliases: { email: true },
signInCaseSensitive: false,
});
// Post Authentication - Handle risk levels
export const postAuthentication = async (event: PostAuthenticationTriggerEvent) => {
const riskLevel = event.request.userContextData?.encodedData
? parseRiskData(event.request.userContextData.encodedData)
: 'LOW';
// Log authentication with risk level
await logAuthentication({
username: event.userName,
riskLevel,
ipAddress: event.request.userContextData?.ipAddress,
deviceKey: event.request.userContextData?.deviceKey,
timestamp: new Date().toISOString(),
});
// For high-risk authentications, trigger additional security
if (riskLevel === 'HIGH' || riskLevel === 'MEDIUM') {
await sendSecurityAlert(event.userName, riskLevel);
if (riskLevel === 'HIGH') {
await setUserMFARequired(event.userPoolId, event.userName);
}
}
return event;
};
Three security layers:
- Compromised Credentials Protection: AWS monitors breached credential databases and blocks sign-ins with known compromised passwords
- Adaptive Authentication: Risk scores based on IP, device, location with automatic responses per risk level
- MFA Options: SMS (highest friction), TOTP (balanced), WebAuthn/FIDO2 (lowest friction)
Warning: MFA Configuration Lock-in: Once MFA is set to “REQUIRED” (for any method: SMS, TOTP, or WebAuthn), you cannot disable or change it to “OPTIONAL” without recreating the pool. Always use “OPTIONAL” and enforce MFA selectively via application logic or adaptive authentication.
SDK Comparison: Amplify vs AWS SDK
Choosing the right client library impacts bundle size, features, and maintenance burden.
| Criteria | AWS Amplify | amazon-cognito-identity-js | AWS SDK v3 |
|---|---|---|---|
| Bundle Size | ~500KB (tree-shakeable) | ~100KB | ~50KB (modular) |
| Use Case | Frontend apps (React, React Native) | Frontend with custom UI | Backend/server-side |
| Secret Support | No | No | Yes |
| SRP Auth | Yes, Built-in | Yes, Built-in | No, Manual implementation |
| Token Management | Yes, Automatic | Yes, Manual | No, Manual |
| OAuth Flows | Yes, Full support | Limited | Yes, Full support |
| SSR Support | Limited (Next.js/Nuxt) | No | Yes |
| Maintenance | Yes, Active | Limited, Deprecating | Yes, Active |
Amplify Frontend Implementation
import { Amplify } from 'aws-amplify';
import { signIn, signOut, confirmSignIn, getCurrentUser } from 'aws-amplify/auth';
Amplify.configure({
Auth: {
Cognito: {
userPoolId: 'us-east-1_ABC123',
userPoolClientId: 'abc123def456',
identityPoolId: 'us-east-1:abc123-def456',
loginWith: {
oauth: {
domain: 'auth.example.com',
scopes: ['openid', 'email', 'profile', 'billing-api/read'],
redirectSignIn: ['https://app.example.com/callback'],
redirectSignOut: ['https://app.example.com/logout'],
responseType: 'code',
},
},
},
},
});
async function handleSignIn(email: string, password: string) {
try {
const { isSignedIn, nextStep } = await signIn({
username: email,
password,
});
if (nextStep.signInStep === 'CONFIRM_SIGN_IN_WITH_TOTP_CODE') {
const code = await promptForMFACode();
await confirmSignIn({ challengeResponse: code });
}
// Tokens are automatically stored and refreshed
const user = await getCurrentUser();
return user;
} catch (error) {
console.error('Sign in error:', error);
throw error;
}
}
AWS SDK Backend Implementation
import {
CognitoIdentityProviderClient,
AdminInitiateAuthCommand,
} from '@aws-sdk/client-cognito-identity-provider';
import { createHmac } from 'crypto';
const client = new CognitoIdentityProviderClient({ region: 'us-east-1' });
function calculateSecretHash(username: string): string {
const message = username + process.env.COGNITO_CLIENT_ID;
const hash = createHmac('sha256', process.env.COGNITO_CLIENT_SECRET!)
.update(message)
.digest('base64');
return hash;
}
async function authenticateUser(username: string, password: string) {
const command = new AdminInitiateAuthCommand({
UserPoolId: process.env.USER_POOL_ID,
ClientId: process.env.COGNITO_CLIENT_ID,
AuthFlow: 'ADMIN_USER_PASSWORD_AUTH',
AuthParameters: {
USERNAME: username,
PASSWORD: password,
SECRET_HASH: calculateSecretHash(username),
},
});
const response = await client.send(command);
return {
accessToken: response.AuthenticationResult?.AccessToken,
idToken: response.AuthenticationResult?.IdToken,
refreshToken: response.AuthenticationResult?.RefreshToken,
expiresIn: response.AuthenticationResult?.ExpiresIn,
};
}
Selection guideline: Use Amplify for React/React Native frontend applications with automatic token management. Use AWS SDK for backend services requiring client secrets and custom authentication flows.
Production Patterns and Monitoring
Token Refresh Strategy
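import { fetchAuthSession } from 'aws-amplify/auth';
const TOKEN_REFRESH_THRESHOLD = 5 * 60 * 1000; // 5 minutes
async function getValidToken(): Promise<string> {
let session = await fetchAuthSession();
// exp is in seconds since the epoch; treat a missing token as already expired
const expiresAt = (session.tokens?.accessToken.payload.exp ?? 0) * 1000;
if (Date.now() + TOKEN_REFRESH_THRESHOLD > expiresAt) {
// Force a refresh instead of re-reading the same cached session
session = await fetchAuthSession({ forceRefresh: true });
}
const accessToken = session.tokens?.accessToken;
if (!accessToken) {
throw new Error('No active session');
}
return accessToken.toString();
}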
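The expiry check inside getValidToken can be isolated into a pure function, which makes the threshold logic testable without a live session. This sketch reads the exp claim directly from the JWT payload without verifying the signature, which is acceptable only for scheduling a refresh and never for authorization; the `readExpiry` and `shouldRefresh` names are hypothetical.

```typescript
import { Buffer } from 'node:buffer';

// Reads the `exp` claim (seconds since epoch) from a JWT without verifying it.
function readExpiry(jwt: string): number {
  const payloadSegment = jwt.split('.')[1];
  if (!payloadSegment) throw new Error('Malformed JWT');
  const payload = JSON.parse(Buffer.from(payloadSegment, 'base64url').toString('utf8'));
  if (typeof payload.exp !== 'number') throw new Error('JWT has no exp claim');
  return payload.exp;
}

// True when the token expires within thresholdMs of nowMs (or has already expired).
function shouldRefresh(jwt: string, nowMs: number, thresholdMs = 5 * 60 * 1000): boolean {
  return nowMs + thresholdMs > readExpiry(jwt) * 1000;
}
```

Passing the current time as a parameter keeps the function deterministic, so tests do not depend on the clock.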
Essential CloudWatch Metrics
Authentication metrics to track:
- SignInSuccesses and SignInThrottles - Monitor authentication health
- TokenRefreshSuccesses - Track token refresh failures
- Custom metrics: Time to authenticate, MFA completion rate
- Alarms: High failure rate, throttling, advanced security blocks
Security metrics:
- Compromised credential detections
- High-risk authentication attempts
- Adaptive authentication triggers
- Account takeover prevention rate
Common Pitfalls and Solutions
Pitfall 1: No Backup Strategy
Problem: Cognito User Pools cannot be backed up or replicated across regions. Accidental deletion or region failure results in total user data loss.
Solution:
- Export user data daily to S3 using ListUsers API
- Store critical user metadata in DynamoDB
- Implement scheduled Lambda for automated exports
- Document pool recreation procedure
This is Cognito’s biggest limitation. Building backup processes from day one prevents severe data loss.
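The export step can be sketched as a pagination loop that follows PaginationToken until the last page. To keep the example self-contained, the page fetcher is injected rather than calling the SDK; `FetchPage`, `exportAllUsers`, and `toJsonLines` are hypothetical names, and in a real Lambda the fetcher would wrap a ListUsers call. ListUsers does not return passwords, so users restored from such an export would need to reset theirs.

```typescript
// Minimal shapes for this sketch; the real ListUsers response has more fields.
interface UserRecord {
  Username: string;
  Attributes: { Name: string; Value: string }[];
}
interface UserPage {
  Users: UserRecord[];
  PaginationToken?: string;
}
type FetchPage = (paginationToken?: string) => Promise<UserPage>;

// Follows PaginationToken until the last page and returns every user.
async function exportAllUsers(fetchPage: FetchPage): Promise<UserRecord[]> {
  const users: UserRecord[] = [];
  let token: string | undefined;
  do {
    const page = await fetchPage(token);
    users.push(...page.Users);
    token = page.PaginationToken;
  } while (token);
  return users;
}

// One JSON document per line keeps the export easy to stream and diff.
function toJsonLines(users: UserRecord[]): string {
  return users.map((user) => JSON.stringify(user)).join('\n');
}
```

The JSON Lines output can be written to S3 as a single dated object per run.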
Pitfall 2: Token Size Limits
Problem: Adding too many custom claims causes tokens to exceed 8KB header limits, resulting in HTTP 431 errors.
Solution:
- Store large datasets in DynamoDB, add reference ID in token
- Use opaque IDs instead of embedding full objects
- Monitor token size in production
- Implement pagination for large permission sets
Example: Instead of embedding all permissions, add permissionSetId: "ps-123" and fetch details from cache.
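The reference-ID approach can be sketched as a lookup with a short-lived cache. In this illustration an in-memory Map stands in for DynamoDB or a cache service; the store contents, the `resolvePermissions` helper, and the 60-second TTL are assumptions made for the example.

```typescript
// Hypothetical in-memory store standing in for DynamoDB or a cache service.
const PERMISSION_SETS = new Map<string, string[]>([
  ['ps-123', ['billing:read', 'billing:write', 'reports:read']],
]);

const permissionCache = new Map<string, { permissions: string[]; expiresAt: number }>();

// Resolves a permissionSetId from the token into the full permission list,
// reusing a cached result until it expires.
function resolvePermissions(permissionSetId: string, nowMs: number, ttlMs = 60_000): string[] {
  const hit = permissionCache.get(permissionSetId);
  if (hit && hit.expiresAt > nowMs) return hit.permissions;
  const permissions = PERMISSION_SETS.get(permissionSetId) ?? [];
  permissionCache.set(permissionSetId, { permissions, expiresAt: nowMs + ttlMs });
  return permissions;
}

console.log(resolvePermissions('ps-123', Date.now()));
```

The token then carries only the short ID, and permission changes take effect once the cached entry expires.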
Pitfall 3: Authorizer Cache Invalidation
Problem: API Gateway caches authorization decisions. Revoked permissions continue working until cache expires.
Solution:
- Use shorter TTL (5-15 minutes) for sensitive operations
- Implement cache busting by including version in authorization header
- Use Lambda authorizer for real-time permission checks
- Document cache behavior for security team
Working with various caching strategies showed that 5-minute TTL provides good balance between performance and security for most applications.
Pitfall 4: SMS Region Limitations
Problem: SMS sending via AWS End User Messaging SMS (formerly SNS) isn’t supported in all Cognito regions, causing unexpected verification failures.
Solution:
- Check AWS End User Messaging SMS support for your Cognito region
- Configure SMS spending limit in correct region
- Test SMS delivery in production region before launch
- Implement fallback to email verification
Pitfall 5: Lambda Trigger Timeouts
Problem: Lambda triggers have 5-second timeout for sync triggers, causing authentication failures with slow external APIs.
Solution:
- Keep trigger logic under 3 seconds
- Use async operations for non-critical tasks
- Cache external API responses
- Implement circuit breaker for external dependencies
- Monitor Lambda duration and errors
Pattern: Do critical validation in sync triggers, push analytics and logging to async processes.
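A circuit breaker for an external dependency can be as small as a failure counter with a cooldown. The sketch below is one minimal variant, not a full implementation: the class name, thresholds, and the choice to pass the clock in as a parameter are assumptions made to keep it deterministic and testable.

```typescript
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly maxFailures: number;
  private readonly cooldownMs: number;

  constructor(maxFailures: number, cooldownMs: number) {
    this.maxFailures = maxFailures;
    this.cooldownMs = cooldownMs;
  }

  // True when a call to the dependency may be attempted.
  canCall(nowMs: number): boolean {
    if (this.openedAt === null) return true;
    // After the cooldown, allow a trial call (half-open state)
    return nowMs - this.openedAt >= this.cooldownMs;
  }

  // A successful call closes the circuit and clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  // Opens (or re-opens) the circuit once the failure threshold is reached.
  recordFailure(nowMs: number): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = nowMs;
  }
}

const breaker = new CircuitBreaker(3, 30_000);
console.log(breaker.canCall(Date.now()));
```

Inside a trigger, a closed-off dependency can then be skipped immediately instead of consuming the 5-second budget on a call that is likely to fail.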
Cost Analysis
Pricing Tiers (December 2024)
Lite tier (10,000 MAUs free, then tiered pricing):
- Basic authentication, MFA, social providers
- No advanced security
- Tiered pricing after free tier: $0.00375/MAU (50K-100K), etc.
Essentials tier ($0.015/MAU):
- Advanced security (audit mode)
- Access token customization
Plus tier ($0.02/MAU):
- Advanced security (enforced mode)
- SAML/OIDC federation
- 1.33x cost vs Essentials
Hidden costs:
- SMS MFA: $0.00645/message in US (via AWS End User Messaging SMS, formerly SNS)
- Lambda trigger invocations: $0.20 per 1M requests
- API Gateway authorizer calls (if caching disabled)
Cost optimization:
- Archive inactive users automatically
- Use federation to reduce direct user count
- Monitor MAU growth trends
- Consider Lambda authorizer for lower-traffic APIs
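A rough monthly estimate can be computed from the per-MAU and per-SMS prices quoted in this section. The sketch below deliberately ignores the free tier, volume discounts, and Lambda or API Gateway charges, so it is an approximation; the `CostInput` shape and the sample volumes are hypothetical.

```typescript
interface CostInput {
  monthlyActiveUsers: number;
  pricePerMau: number; // e.g. 0.015 (Essentials) or 0.02 (Plus), as quoted above
  smsMessages: number;
  pricePerSms: number; // e.g. 0.00645 for US SMS, as quoted above
}

// Flat-rate estimate in USD: MAU charges plus SMS charges.
function estimateMonthlyCost(input: CostInput): number {
  const mauCost = input.monthlyActiveUsers * input.pricePerMau;
  const smsCost = input.smsMessages * input.pricePerSms;
  return mauCost + smsCost;
}

const plusEstimate = estimateMonthlyCost({
  monthlyActiveUsers: 50_000,
  pricePerMau: 0.02,
  smsMessages: 10_000,
  pricePerSms: 0.00645,
});
console.log(plusEstimate.toFixed(2));
```

Running the same inputs with the Essentials price makes the cost of enforced advanced security explicit for a given user base.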
When to Choose Cognito vs Alternatives
Cognito Makes Sense For:
- AWS-native architecture
- Standard authentication requirements
- Budget-conscious projects
- Rapid MVP development
- Small to medium scale (< 10M users)
Consider Alternatives For:
Auth0: Complex authentication flows, extensive customization, enterprise SLA requirements, global compliance needs
Okta: Workforce identity (employees), enterprise SSO, advanced lifecycle management
Custom Solution: Unique authentication requirements, full data control, existing identity infrastructure, very high scale (> 100M users)
Cognito Limitations to Accept:
- No cross-region replication
- Limited user management APIs
- 3KB CSS customization limit
- No direct database access
- MFA configuration lock-in
Key Takeaways
- Understand the Architecture: User Pools authenticate, Identity Pools authorize AWS access - they work together but serve different purposes
- Start Simple, Scale Complexity: Begin with basic authentication, add Lambda triggers and federation as business requirements emerge
- Plan Multi-Tenancy Early: Changing tenant isolation patterns after launch is painful. Shared pool with custom attributes works well for most SaaS applications
- Custom Claims Enable Fine-Grained Authorization: Pre Token Generation V2 adds tenant context and permissions to tokens without extra API calls
- Federation Is Complex: SAML/OIDC integration takes longer than expected. Budget time for testing logout flows and attribute mapping
- Backup Your Users: Cognito doesn’t provide backup. Implement daily user export to S3 from day one
- Cache Wisely: Authorizer caching improves performance but delays permission changes. Balance based on security requirements
- Migration Takes Time: User migration is gradual. Plan for 30-60 day lazy migration plus bulk import for inactive users
- Security Costs Money: Advanced security requires Plus tier ($0.02/MAU). Evaluate risk vs cost trade-off for your application
- Know the Limits: Cognito has sharp edges (MFA lock-in, no replication, limited UI customization). Work within constraints or choose alternatives
Working with Cognito across different projects taught me that success comes from understanding these constraints early and building patterns that work within them. The service handles authentication well when you accept its architectural boundaries and plan accordingly.