2025-09-08
Real-time Notifications and Multi-Channel Delivery: WebSockets, Push, Email, and Beyond
Implementation strategies for real-time notification delivery across WebSocket, push notification, email, SMS, and webhook channels with production-tested patterns
Real-time notifications sound simple until you deal with platform-specific push notification differences, WebSocket connection management at scale, and vendor costs that multiply with traffic.
Multi-channel notification systems reveal that the challenge isn’t sending notifications - it’s doing it reliably, at scale, across different delivery mechanisms that each have their own quirks, limitations, and failure modes.
Here are patterns for WebSocket connections, push notifications, email delivery, SMS, and webhooks that work in production environments.
WebSocket Management: The Foundation of Real-Time
WebSockets seem straightforward until you hit production with thousands of concurrent connections. Here’s what I’ve learned about building WebSocket infrastructure that actually scales:
Connection Management That Works
The key insight: connections are ephemeral, but user state isn’t. Here’s a connection manager designed for production reliability:
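// Assumes the `ws` and `ioredis` packages, which match the socket.on('pong')
// and redis.on('message') calls used below
import { IncomingMessage } from 'http';
import { WebSocket } from 'ws';
import Redis from 'ioredis';
interface ConnectionMetadata {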
userId: string;
deviceId?: string;
userAgent: string;
connectedAt: Date;
lastPing: Date;
subscriptions: Set<string>;
metadata: Record<string, any>;
}
class WebSocketConnectionManager {
private connections: Map<string, {
socket: WebSocket;
metadata: ConnectionMetadata;
}> = new Map();
private userConnections: Map<string, Set<string>> = new Map();
private redis: Redis;
private heartbeatInterval: NodeJS.Timeout;
constructor(redis: Redis) {
this.redis = redis;
this.startHeartbeat();
}
async handleConnection(socket: WebSocket, request: IncomingMessage): Promise<void> {
const connectionId = this.generateConnectionId();
try {
// Extract user info from JWT token or session
const userInfo = await this.authenticateConnection(request);
if (!userInfo) {
socket.close(1008, 'Authentication required');
return;
}
const metadata: ConnectionMetadata = {
userId: userInfo.userId,
deviceId: userInfo.deviceId,
userAgent: request.headers['user-agent'] || '',
connectedAt: new Date(),
lastPing: new Date(),
subscriptions: new Set(),
metadata: {}
};
// Store connection
this.connections.set(connectionId, { socket, metadata });
// Update user connection mapping
if (!this.userConnections.has(userInfo.userId)) {
this.userConnections.set(userInfo.userId, new Set());
}
this.userConnections.get(userInfo.userId)!.add(connectionId);
// Store connection info in Redis for multi-instance support
await this.redis.hset(
`ws:connections:${userInfo.userId}`,
connectionId,
JSON.stringify({
serverId: process.env.SERVER_ID,
connectedAt: metadata.connectedAt,
deviceId: metadata.deviceId
})
);
// Setup event handlers
this.setupConnectionHandlers(connectionId, socket, metadata);
// Send connection acknowledgment
await this.sendMessage(connectionId, {
type: 'connection_ack',
data: { connectionId, timestamp: new Date() }
});
console.log(`WebSocket connection established for user ${userInfo.userId}`);
} catch (error) {
console.error('WebSocket connection setup failed:', error);
socket.close(1011, 'Internal server error');
}
}
private setupConnectionHandlers(
connectionId: string,
socket: WebSocket,
metadata: ConnectionMetadata
): void {
socket.on('message', async (data) => {
try {
const message = JSON.parse(data.toString());
await this.handleMessage(connectionId, message);
} catch (error) {
console.error('Message handling error:', error);
await this.sendError(connectionId, 'Invalid message format');
}
});
socket.on('pong', () => {
metadata.lastPing = new Date();
});
socket.on('close', async (code, reason) => {
await this.handleDisconnection(connectionId, code, reason);
});
socket.on('error', async (error) => {
console.error(`WebSocket error for ${connectionId}:`, error);
await this.handleDisconnection(connectionId, 1011, 'Connection error');
});
}
async sendNotificationToUser(userId: string, notification: NotificationEvent): Promise<void> {
const userConnectionIds = this.userConnections.get(userId) || new Set();
if (userConnectionIds.size === 0) {
// User not connected to this server instance
// Check Redis for other server instances
const remoteConnections = await this.redis.hgetall(`ws:connections:${userId}`);
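if (Object.keys(remoteConnections).length > 0) {
// Relay at most once per notification and user. Every instance subscribed
// to 'ws:notification' ends up back in this method, so without this guard
// an instance with no local connection would re-publish the message forever.
const firstRelay = await this.redis.set(
`ws:relayed:${notification.id}:${userId}`,
'1',
'EX',
60,
'NX'
);
if (firstRelay === 'OK') {
await this.redis.publish('ws:notification', JSON.stringify({
userId,
notification,
targetServerId: null // broadcast to all servers
}));
}
}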
return;
}
// Send to local connections
const sendPromises = Array.from(userConnectionIds).map(async (connectionId) => {
try {
await this.sendMessage(connectionId, {
type: 'notification',
data: notification
});
return { connectionId, success: true };
} catch (error) {
console.error(`Failed to send notification to ${connectionId}:`, error);
return { connectionId, success: false, error };
}
});
// Each send catches its own errors, so Promise.all never rejects here
const results = await Promise.all(sendPromises);
// Clean up failed connections. Every result carries its own connectionId, so
// cleanup does not depend on the Set keeping the same order while entries are removed.
for (const result of results) {
if (!result.success) {
await this.handleDisconnection(result.connectionId, 1011, 'Send failed');
}
}
}
}
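The connection manager calls `startHeartbeat()` and refreshes `lastPing` on every pong, but the sweep itself is not shown. A minimal sketch of what one heartbeat tick could do, assuming a hypothetical `sweepStaleConnections` helper and a socket that exposes `ping()` and `terminate()` (as the `ws` library's WebSocket does):

```typescript
// Minimal socket surface needed by the sweep (hypothetical interface).
interface PingableSocket {
  ping(): void;
  terminate(): void;
}

interface TrackedConnection {
  socket: PingableSocket;
  lastPing: Date; // refreshed by the 'pong' handler
}

// Terminates connections whose last pong is older than timeoutMs and pings
// the rest. Returns the terminated IDs so the caller can clean up its own
// bookkeeping (local maps, Redis hash).
function sweepStaleConnections(
  connections: Map<string, TrackedConnection>,
  nowMs: number,
  timeoutMs: number
): string[] {
  const stale: string[] = [];
  connections.forEach(({ socket, lastPing }, connectionId) => {
    if (nowMs - lastPing.getTime() > timeoutMs) {
      socket.terminate(); // skip the close handshake; the peer is presumed gone
      stale.push(connectionId);
    } else {
      socket.ping(); // a healthy peer answers with a pong
    }
  });
  return stale;
}
```

Running this from a `setInterval` with a timeout of at least two intervals gives a slow client one missed pong before it is dropped.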
WebSocket Scaling Patterns
WebSockets don’t scale the same way REST APIs do: a connection lives on exactly one server instance, so a notification raised anywhere else has to be routed to it. Here’s a multi-instance coordination pattern built on Redis pub/sub:
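class WebSocketCluster {
// A Redis connection in subscriber mode cannot issue regular commands such as
// PUBLISH, so subscriptions get their own connection (ioredis-style client assumed).
private subscriber: Redis;
constructor(
private connectionManager: WebSocketConnectionManager,
private redis: Redis
) {
this.subscriber = redis.duplicate();
this.setupClusterCommunication();
}
private setupClusterCommunication(): void {
// Listen for notifications that need to be delivered
this.subscriber.subscribe('ws:notification', 'ws:broadcast');
this.subscriber.on('message', async (channel, message) => {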
try {
const data = JSON.parse(message);
if (channel === 'ws:notification') {
await this.handleRemoteNotification(data);
} else if (channel === 'ws:broadcast') {
await this.handleBroadcast(data);
}
} catch (error) {
console.error('Cluster message handling error:', error);
}
});
}
private async handleRemoteNotification(data: {
userId: string;
notification: NotificationEvent;
targetServerId?: string;
}): Promise<void> {
// Only process if no target server specified or we're the target
if (data.targetServerId && data.targetServerId !== process.env.SERVER_ID) {
return;
}
await this.connectionManager.sendNotificationToUser(
data.userId,
data.notification
);
}
async broadcastSystemMessage(message: any): Promise<void> {
await this.redis.publish('ws:broadcast', JSON.stringify({
message,
senderId: process.env.SERVER_ID,
timestamp: new Date()
}));
}
}
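The connection manager publishes with `targetServerId: null`, although each entry in the `ws:connections:<userId>` hash already records a `serverId` and `handleRemoteNotification` honors a target. A sketch of deriving the target servers from that hash, assuming a hypothetical `remoteServerIds` helper:

```typescript
// Shape of the JSON stored per connection in the `ws:connections:<userId>` hash.
interface StoredConnection {
  serverId?: string;
}

// Returns the distinct server IDs, other than selfId, that hold a connection
// for the user. Entries that fail to parse or lack a serverId are skipped.
function remoteServerIds(
  connectionHash: Record<string, string>,
  selfId: string
): string[] {
  const ids = new Set<string>();
  Object.entries(connectionHash).forEach(([, raw]) => {
    try {
      const stored = JSON.parse(raw) as StoredConnection;
      if (stored.serverId && stored.serverId !== selfId) {
        ids.add(stored.serverId);
      }
    } catch {
      // Corrupt entry: ignore it rather than fail the whole lookup
    }
  });
  return Array.from(ids).sort();
}
```

Publishing one message per returned ID, with `targetServerId` set, would let instances that do not hold the user's connection drop the message immediately.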
Push Notifications: Mobile’s Double-Edged Sword
Push notifications look simple in documentation but become complex fast when you need to handle multiple platforms, user permissions, and delivery guarantees. Here’s what production taught me:
Multi-Platform Push Service
iOS and Android should be treated as completely different systems, even though they’re both “push notifications”:
interface PushProvider {
sendNotification(
tokens: string[],
payload: PushPayload,
options?: PushOptions
): Promise<PushResult[]>;
validateToken(token: string): Promise<boolean>;
getInvalidTokens(results: PushResult[]): string[];
}
interface PushPayload {
title: string;
body: string;
data?: Record<string, any>;
badge?: number;
sound?: string;
icon?: string;
image?: string;
color?: string; // Android notification accent color, set in createPlatformPayloads
}
class UnifiedPushService {
private providers: Map<PushPlatform, PushProvider> = new Map();
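// Dependencies are injected so they are initialized before first use.
// TemplateService is an assumed type name for the template renderer used below.
constructor(
private tokenStore: TokenStore,
private analytics: PushAnalytics,
private templateService: TemplateService
) {
this.providers.set('ios', new APNSProvider());
this.providers.set('android', new FCMProvider());
this.providers.set('web', new WebPushProvider());
}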
async sendPushNotification(
userId: string,
notification: NotificationEvent
): Promise<PushDeliveryResult> {
try {
// Get all push tokens for user
const userTokens = await this.tokenStore.getUserTokens(userId);
if (userTokens.length === 0) {
return {
success: false,
reason: 'no_tokens',
deliveries: []
};
}
// Group tokens by platform
const tokensByPlatform = this.groupTokensByPlatform(userTokens);
// Prepare platform-specific payloads
const payloads = await this.createPlatformPayloads(notification);
// Send to each platform
const deliveryPromises = Object.entries(tokensByPlatform).map(
([platform, tokens]) => this.sendToPlatform(
platform as PushPlatform,
tokens,
payloads[platform as PushPlatform],
notification
)
);
const results = await Promise.allSettled(deliveryPromises);
// Process results and clean up invalid tokens
const deliveries = await this.processDeliveryResults(results, userTokens);
// Track analytics
await this.analytics.trackPushDelivery(notification.id, deliveries);
return {
success: deliveries.some(d => d.success),
deliveries
};
} catch (error) {
console.error('Push notification delivery failed:', error);
return {
success: false,
reason: 'send_error',
error: error.message,
deliveries: []
};
}
}
private async sendToPlatform(
platform: PushPlatform,
tokens: PushToken[],
payload: PushPayload,
notification: NotificationEvent
): Promise<PlatformDeliveryResult> {
const provider = this.providers.get(platform);
if (!provider) {
throw new Error(`No provider for platform ${platform}`);
}
// Platform-specific options
const options: PushOptions = {
priority: this.mapPriorityToPlatform(notification.priority, platform),
ttl: notification.expiresAt ?
// Clamp at 0 so an already-expired notification never yields a negative TTL
Math.max(0, Math.floor((notification.expiresAt.getTime() - Date.now()) / 1000)) :
3600, // 1 hour default
collapseKey: platform === 'android' ? notification.type : undefined,
apnsTopic: platform === 'ios' ? process.env.APNS_TOPIC : undefined
};
const tokenStrings = tokens.map(t => t.token);
const results = await provider.sendNotification(tokenStrings, payload, options);
// Clean up invalid tokens
const invalidTokens = provider.getInvalidTokens(results);
if (invalidTokens.length > 0) {
// userId is not a parameter of this method; take it from the notification
await this.tokenStore.markTokensInvalid(notification.userId, invalidTokens);
}
return {
platform,
tokens: tokenStrings,
results,
invalidTokens
};
}
private async createPlatformPayloads(
notification: NotificationEvent
): Promise<Record<PushPlatform, PushPayload>> {
// Get localized content based on user preferences
const template = await this.templateService.getTemplate(
notification.type,
'push',
'en' // Should be user's locale
);
const rendered = await this.templateService.render(template, notification.data);
return {
ios: {
title: rendered.subject || '',
body: rendered.body,
data: {
notificationId: notification.id,
type: notification.type,
...notification.data
},
badge: await this.getBadgeCount(notification.userId),
sound: this.getSoundForNotificationType(notification.type)
},
android: {
title: rendered.subject || '',
body: rendered.body,
data: {
notificationId: notification.id,
type: notification.type,
...notification.data
},
icon: 'ic_notification',
// Android-specific styling
color: '#007AFF'
},
web: {
title: rendered.subject || '',
body: rendered.body,
data: notification.data,
icon: '/icons/notification-icon.png',
image: notification.data.imageUrl
}
};
}
}
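`mapPriorityToPlatform` is called in `sendToPlatform` but not shown. A possible sketch follows, assuming four priority levels (only `'critical'` appears in the code here) and a hypothetical `PlatformPriority` return shape. The target values are the ones each platform defines: APNs `apns-priority` 5 or 10, FCM Android priority `normal` or `high`, and the Web Push `Urgency` header.

```typescript
type PushPlatform = 'ios' | 'android' | 'web';
// Assumed priority levels; only 'critical' is confirmed by the article's code.
type NotificationPriority = 'low' | 'normal' | 'high' | 'critical';

type PlatformPriority =
  | { platform: 'ios'; apnsPriority: 5 | 10 }
  | { platform: 'android'; fcmPriority: 'normal' | 'high' }
  | { platform: 'web'; urgency: 'very-low' | 'low' | 'normal' | 'high' };

function mapPriorityToPlatform(
  priority: NotificationPriority,
  platform: PushPlatform
): PlatformPriority {
  const urgent = priority === 'high' || priority === 'critical';
  switch (platform) {
    case 'ios':
      // 10 delivers immediately, 5 lets the device batch for power savings
      return { platform, apnsPriority: urgent ? 10 : 5 };
    case 'android':
      return { platform, fcmPriority: urgent ? 'high' : 'normal' };
    case 'web':
      // The Urgency header has no level above 'high'
      return { platform, urgency: priority === 'critical' ? 'high' : priority };
  }
}
```

Reserving the high tier for notifications the user should see right away is usually the safer default, since platforms may throttle apps that overuse it.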
Push Token Management
Token management is where most push implementations fail in production. Tokens become invalid, users uninstall apps, and you need to handle this gracefully:
class PushTokenStore {
constructor(private db: Database, private redis: Redis) {}
async registerToken(
userId: string,
token: string,
platform: PushPlatform,
deviceId: string
): Promise<void> {
try {
// Validate token format
if (!this.isValidTokenFormat(token, platform)) {
throw new Error('Invalid token format');
}
// Look up the current owner so its cache can be cleared if the token moved
const existingToken = await this.db.query(
'SELECT user_id FROM push_tokens WHERE token = $1',
[token]
);
// One upsert covers new tokens, re-registration, and a device changing accounts
await this.db.query(`
INSERT INTO push_tokens (user_id, token, platform, device_id, is_active, created_at, updated_at)
VALUES ($1, $2, $3, $4, true, NOW(), NOW())
ON CONFLICT (token)
DO UPDATE SET
user_id = EXCLUDED.user_id,
platform = EXCLUDED.platform,
device_id = EXCLUDED.device_id,
is_active = true,
updated_at = NOW()
`, [userId, token, platform, deviceId]);
// Invalidate cached token sets instead of adding to them: a SADD on a cold
// cache would leave a partial set that getUserTokens treats as complete
const staleKeys = [`push_tokens:${userId}`];
if (existingToken.length > 0 && existingToken[0].user_id !== userId) {
staleKeys.push(`push_tokens:${existingToken[0].user_id}`);
}
await this.redis.del(...staleKeys);
console.log(`Registered push token for user ${userId} on ${platform}`);
} catch (error) {
console.error('Push token registration failed:', error);
throw error;
}
}
async markTokensInvalid(userId: string, tokens: string[]): Promise<void> {
if (tokens.length === 0) return;
await this.db.query(
'UPDATE push_tokens SET is_active = false, updated_at = NOW() WHERE token = ANY($1)',
[tokens]
);
// Remove from Redis cache (the empty case already returned above)
await this.redis.srem(`push_tokens:${userId}`, ...tokens);
console.log(`Marked ${tokens.length} tokens as invalid for user ${userId}`);
}
async getUserTokens(userId: string): Promise<PushToken[]> {
// Try cache first
const cachedTokens = await this.redis.smembers(`push_tokens:${userId}`);
if (cachedTokens.length > 0) {
// Get full token info from database
const tokens = await this.db.query(`
SELECT token, platform, device_id, created_at
FROM push_tokens
WHERE user_id = $1 AND is_active = true AND token = ANY($2)
`, [userId, cachedTokens]);
return tokens;
}
// Cache miss, get from database and populate cache
const tokens = await this.db.query(`
SELECT token, platform, device_id, created_at
FROM push_tokens
WHERE user_id = $1 AND is_active = true
ORDER BY updated_at DESC
`, [userId]);
if (tokens.length > 0) {
await this.redis.sadd(
`push_tokens:${userId}`,
...tokens.map(t => t.token)
);
await this.redis.expire(`push_tokens:${userId}`, 86400); // 24 hours
}
return tokens;
}
}
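The `PushProvider` interface declares `getInvalidTokens`, but how a provider decides that a token is dead is not shown. A sketch under stated assumptions: the `PushResult` fields below are hypothetical, and the error strings are the ones APNs returns in its `reason` field and FCM HTTP v1 returns as its error code.

```typescript
type PushPlatform = 'ios' | 'android';

// Hypothetical normalized result; the article does not show PushResult's fields.
interface PushResult {
  token: string;
  success: boolean;
  errorCode?: string; // provider's error string, passed through unchanged
}

// Responses that mean the token will never work again. Other failures
// (rate limits, server errors, malformed payloads) say nothing about the token.
const PERMANENT_TOKEN_ERRORS: Record<PushPlatform, ReadonlySet<string>> = {
  ios: new Set(['BadDeviceToken', 'Unregistered']),
  android: new Set(['UNREGISTERED'])
};

function getInvalidTokens(platform: PushPlatform, results: PushResult[]): string[] {
  const permanent = PERMANENT_TOKEN_ERRORS[platform];
  return results
    .filter((r) => !r.success && r.errorCode !== undefined && permanent.has(r.errorCode))
    .map((r) => r.token);
}
```

FCM's `INVALID_ARGUMENT` is deliberately left out: it can also indicate a bad payload, so treating it as a dead token risks deactivating devices that still work.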
Email Delivery: More Complex Than You Think
Email seems like the “easy” channel until you deal with deliverability, bounce handling, and vendor limits. Here’s the email service that’s handled millions of emails without ending up in spam folders:
Email Service with Provider Failover
Email providers can fail, get rate limited, or have deliverability issues. Multiple providers and smart routing provide necessary redundancy:
interface EmailProvider {
sendEmail(email: EmailMessage): Promise<EmailResult>;
handleWebhook(payload: any): Promise<WebhookResult>;
getDeliverabilityScore(): Promise<number>;
}
class EmailDeliveryService {
private providers: EmailProvider[] = [];
private primaryProvider: EmailProvider;
private fallbackProviders: EmailProvider[];
constructor() {
// Initialize providers in priority order
this.providers = [
new SendGridProvider(),
new AmazonSESProvider(),
new PostmarkProvider()
];
this.primaryProvider = this.providers[0];
this.fallbackProviders = this.providers.slice(1);
}
async sendEmail(
userId: string,
notification: NotificationEvent
): Promise<EmailDeliveryResult> {
try {
// Get user email and preferences
const user = await this.getUserWithEmailPrefs(userId);
if (!user.email || !user.emailEnabled) {
return {
success: false,
reason: 'email_disabled',
attempts: []
};
}
// Check if user is on suppression list
if (await this.isUserSuppressed(user.email)) {
return {
success: false,
reason: 'user_suppressed',
attempts: []
};
}
// Render email content
const emailContent = await this.renderEmailContent(notification, user);
// Prepare email message
const emailMessage: EmailMessage = {
to: user.email,
from: this.getFromAddress(notification.type),
subject: emailContent.subject,
html: emailContent.html,
text: emailContent.text,
metadata: {
userId,
notificationId: notification.id,
notificationType: notification.type
},
tags: [notification.type, `user:${userId}`],
unsubscribeUrl: this.generateUnsubscribeUrl(userId, notification.type)
};
// Try providers in priority order, recording every attempt
const attempts: EmailAttemptResult[] = [];
let result = await this.attemptDelivery(this.primaryProvider, emailMessage);
attempts.push(result);
if (!result.success) {
// Try fallback providers
for (const provider of this.fallbackProviders) {
console.warn(`Email provider ${result.providerId} failed, trying fallback: ${provider.constructor.name}`);
result = await this.attemptDelivery(provider, emailMessage);
attempts.push(result);
if (result.success) break;
}
}
// Store delivery result
await this.storeDeliveryResult(notification.id, 'email', result);
return {
success: result.success,
attempts,
providerId: result.providerId,
messageId: result.messageId
};
} catch (error) {
console.error('Email delivery failed:', error);
return {
success: false,
reason: 'delivery_error',
error: error.message,
attempts: []
};
}
}
private async attemptDelivery(
provider: EmailProvider,
email: EmailMessage
): Promise<EmailAttemptResult> {
const startTime = Date.now();
try {
const result = await provider.sendEmail(email);
const duration = Date.now() - startTime;
return {
providerId: provider.constructor.name,
success: result.success,
messageId: result.messageId,
duration,
response: result.response
};
} catch (error) {
const duration = Date.now() - startTime;
return {
providerId: provider.constructor.name,
success: false,
duration,
error: error.message,
shouldRetry: this.isRetryableError(error)
};
}
}
private async renderEmailContent(
notification: NotificationEvent,
user: User
): Promise<EmailContent> {
// Get email template
const template = await this.templateService.getTemplate(
notification.type,
'email',
user.locale
);
// Render with user data and notification data
const context = {
user,
...notification.data,
unsubscribeUrl: this.generateUnsubscribeUrl(user.id, notification.type),
preferencesUrl: this.generatePreferencesUrl(user.id)
};
const rendered = await this.templateService.render(template, context);
// Convert markdown to HTML if needed
const html = this.markdownToHtml(rendered.body);
const text = this.htmlToText(html);
return {
subject: rendered.subject,
html,
text
};
}
}
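The email message carries an `unsubscribeUrl`, but how it reaches the recipient's mail client is left open. One common route is the one-click unsubscribe headers from RFC 8058. This is a sketch with a hypothetical `buildUnsubscribeHeaders` helper, and it assumes the chosen provider lets you set custom headers:

```typescript
// Builds the header pair that lets mail clients show a one-click
// unsubscribe control. RFC 8058 requires an HTTPS URI for this.
function buildUnsubscribeHeaders(unsubscribeUrl: string): Record<string, string> {
  const url = new URL(unsubscribeUrl); // throws on a malformed URL
  if (url.protocol !== 'https:') {
    throw new Error('One-click unsubscribe requires an HTTPS URL');
  }
  return {
    'List-Unsubscribe': `<${url.toString()}>`,
    'List-Unsubscribe-Post': 'List-Unsubscribe=One-Click'
  };
}
```

The endpoint behind the URL then has to accept a POST without further confirmation, which matches the advice to process opt-outs immediately.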
Email Bounce and Complaint Handling
Handling bounces, complaints, and unsubscribes properly is critical for deliverability:
class EmailWebhookHandler {
constructor(
private db: Database,
private suppressionService: SuppressionService
) {}
async handleWebhook(
provider: string,
payload: any
): Promise<WebhookProcessResult> {
try {
const events = this.parseProviderWebhook(provider, payload);
for (const event of events) {
await this.processEmailEvent(event);
}
return { success: true, eventsProcessed: events.length };
} catch (error) {
console.error('Webhook processing failed:', error);
return { success: false, error: error.message };
}
}
private async processEmailEvent(event: EmailEvent): Promise<void> {
// Update delivery record
await this.db.query(`
UPDATE notification_deliveries
SET status = $1, delivered_at = $2, error_message = $3, provider_response = $4
WHERE provider_id = $5
`, [
event.status,
event.timestamp,
event.error,
JSON.stringify(event.rawData),
event.messageId
]);
// Handle specific event types
switch (event.type) {
case 'bounce':
await this.handleBounce(event);
break;
case 'complaint':
await this.handleComplaint(event);
break;
case 'unsubscribe':
await this.handleUnsubscribe(event);
break;
case 'delivered':
await this.handleDelivery(event);
break;
}
}
private async handleBounce(event: EmailEvent): Promise<void> {
const bounceType = event.bounceType || 'unknown';
if (bounceType === 'permanent') {
// Suppress permanently bounced email
await this.suppressionService.addSuppression({
email: event.recipient,
reason: 'permanent_bounce',
source: 'webhook',
metadata: {
bounceSubType: event.bounceSubType,
messageId: event.messageId
}
});
console.log(`Permanently suppressed ${event.recipient} due to hard bounce`);
} else if (bounceType === 'temporary') {
// Track temporary bounces, suppress after threshold
const bounceCount = await this.incrementBounceCount(event.recipient);
if (bounceCount >= 5) {
await this.suppressionService.addSuppression({
email: event.recipient,
reason: 'repeated_soft_bounces',
source: 'auto_suppression',
metadata: { bounceCount }
});
console.log(`Auto-suppressed ${event.recipient} after ${bounceCount} soft bounces`);
}
}
}
private async handleComplaint(event: EmailEvent): Promise<void> {
// Immediately suppress users who mark emails as spam
await this.suppressionService.addSuppression({
email: event.recipient,
reason: 'spam_complaint',
source: 'webhook',
metadata: {
messageId: event.messageId,
complaintType: event.complaintSubType
}
});
// Also disable email on the user record so no channel logic re-enables it
await this.db.query(`
UPDATE users SET email_enabled = false
WHERE email = $1
`, [event.recipient]);
console.log(`Suppressed ${event.recipient} due to spam complaint`);
}
}
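`parseProviderWebhook` has to turn each provider's payload into the `EmailEvent` shape the handler reads. As an illustration, here is a sketch for Amazon SES notifications in the `notificationType` format. It assumes the SNS envelope has already been unwrapped, models only the fields used, and includes only a subset of `EmailEvent`:

```typescript
// Subset of the EmailEvent fields that the handler reads.
interface EmailEvent {
  type: 'bounce' | 'complaint' | 'delivered';
  recipient: string;
  messageId: string;
  bounceType?: 'permanent' | 'temporary' | 'unknown';
  bounceSubType?: string;
  complaintSubType?: string;
}

// Partial shape of an SES notification (only the fields used here).
interface SesNotification {
  notificationType: 'Bounce' | 'Complaint' | 'Delivery';
  mail: { messageId: string };
  bounce?: {
    bounceType: string;
    bounceSubType: string;
    bouncedRecipients: { emailAddress: string }[];
  };
  complaint?: {
    complainedRecipients: { emailAddress: string }[];
    complaintFeedbackType?: string;
  };
  delivery?: { recipients: string[] };
}

// One SES notification can cover several recipients, so it maps to a list.
function parseSesNotification(n: SesNotification): EmailEvent[] {
  const messageId = n.mail.messageId;
  if (n.notificationType === 'Bounce' && n.bounce) {
    const { bounceType: sesType, bounceSubType, bouncedRecipients } = n.bounce;
    // SES uses Permanent / Transient / Undetermined
    const bounceType: EmailEvent['bounceType'] =
      sesType === 'Permanent' ? 'permanent' : sesType === 'Transient' ? 'temporary' : 'unknown';
    return bouncedRecipients.map((r): EmailEvent => ({
      type: 'bounce', recipient: r.emailAddress, messageId, bounceType, bounceSubType
    }));
  }
  if (n.notificationType === 'Complaint' && n.complaint) {
    const { complainedRecipients, complaintFeedbackType } = n.complaint;
    return complainedRecipients.map((r): EmailEvent => ({
      type: 'complaint', recipient: r.emailAddress, messageId,
      complaintSubType: complaintFeedbackType
    }));
  }
  if (n.notificationType === 'Delivery' && n.delivery) {
    return n.delivery.recipients.map((recipient): EmailEvent => ({
      type: 'delivered', recipient, messageId
    }));
  }
  return [];
}
```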
SMS and Webhook Channels: The Supporting Cast
SMS and webhooks round out the multi-channel approach. Here’s how to implement them reliably:
SMS Delivery Service
SMS is the “nuclear option” for critical notifications. Keep it simple and reliable:
class SMSDeliveryService {
private provider: SMSProvider;
private fallbackProvider: SMSProvider;
constructor() {
this.provider = new TwilioProvider();
this.fallbackProvider = new AmazonSNSProvider();
}
async sendSMS(
userId: string,
notification: NotificationEvent
): Promise<SMSDeliveryResult> {
try {
const user = await this.getUserWithSMSPrefs(userId);
if (!user.phone || !user.smsEnabled) {
return { success: false, reason: 'sms_disabled' };
}
// SMS content should be concise
const content = await this.renderSMSContent(notification, user);
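// Try primary provider; a thrown error counts as a failed attempt so the
// fallback still runs instead of jumping straight to the outer catch
const smsMessage = {
to: user.phone,
message: content,
metadata: { userId, notificationId: notification.id }
};
let result: SMSDeliveryResult;
try {
result = await this.provider.sendSMS(smsMessage);
} catch (error) {
console.warn('Primary SMS provider threw, trying fallback:', error);
result = { success: false };
}
if (!result.success) {
// Try fallback
result = await this.fallbackProvider.sendSMS(smsMessage);
}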
await this.storeDeliveryResult(notification.id, 'sms', result);
return result;
} catch (error) {
console.error('SMS delivery failed:', error);
return { success: false, error: error.message };
}
}
private async renderSMSContent(
notification: NotificationEvent,
user: User
): Promise<string> {
const template = await this.templateService.getTemplate(
notification.type,
'sms',
user.locale
);
const rendered = await this.templateService.render(template, {
user,
...notification.data
});
// A single SMS segment holds 160 GSM-7 characters, or only 70 when the text
// needs UCS-2 (emoji, most non-Latin scripts)
return this.truncateForSMS(rendered.body, 160);
}
}
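`truncateForSMS` is referenced but not defined. A minimal sketch, assuming plain truncation with a trailing marker is acceptable. It counts code points, so it does not account for GSM-7 extension characters that occupy two positions:

```typescript
// Truncates text to at most maxLength code points, ending with '...' when
// something was cut. The ASCII marker is deliberate: the single-character
// ellipsis is outside GSM-7 and would force the message into UCS-2.
function truncateForSMS(text: string, maxLength: number): string {
  const chars = Array.from(text); // split by code point so surrogate pairs stay intact
  if (chars.length <= maxLength) return text;
  const suffix = '...';
  if (maxLength <= suffix.length) return chars.slice(0, maxLength).join('');
  return chars.slice(0, maxLength - suffix.length).join('').trimEnd() + suffix;
}
```

For example, `truncateForSMS('hello world', 9)` keeps `'hello '`, trims the trailing space, and returns `'hello...'`.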
Webhook Delivery for Integrations
Webhooks are how you integrate with external systems. Make them reliable and well-documented:
class WebhookDeliveryService {
private httpClient: HTTPClient;
private retryQueue: Queue;
constructor() {
this.httpClient = new HTTPClient({
timeout: 10000,
retries: 3,
retryDelay: 1000
});
}
async sendWebhook(
userId: string,
notification: NotificationEvent
): Promise<WebhookDeliveryResult> {
try {
// Get user's webhook configurations
const webhookConfigs = await this.getWebhookConfigs(userId, notification.type);
if (webhookConfigs.length === 0) {
return { success: false, reason: 'no_webhooks_configured' };
}
// Send to each configured webhook
const deliveryPromises = webhookConfigs.map(config =>
this.deliverToWebhook(config, notification)
);
const results = await Promise.allSettled(deliveryPromises);
const successful = results.filter(r =>
r.status === 'fulfilled' && r.value.success
).length;
return {
success: successful > 0,
delivered: successful,
total: webhookConfigs.length,
results: results.map(r =>
r.status === 'fulfilled' ? r.value : { success: false, error: r.reason }
)
};
} catch (error) {
console.error('Webhook delivery failed:', error);
return { success: false, error: error.message };
}
}
private async deliverToWebhook(
config: WebhookConfig,
notification: NotificationEvent
): Promise<WebhookResult> {
const payload = {
id: notification.id,
type: notification.type,
userId: notification.userId,
data: notification.data,
timestamp: notification.scheduledAt || new Date(),
signature: this.generateSignature(config.secret, notification)
};
try {
const response = await this.httpClient.post(config.url, payload, {
headers: {
'Content-Type': 'application/json',
'X-Webhook-Signature': payload.signature,
'User-Agent': 'NotificationSystem/1.0'
}
});
return {
success: response.status >= 200 && response.status < 300,
statusCode: response.status,
response: response.data
};
} catch (error) {
return {
success: false,
error: error.message,
shouldRetry: this.isRetryableHttpError(error)
};
}
}
}
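`generateSignature` is not shown. A common approach, sketched here as an assumption rather than the article's implementation, is an HMAC-SHA256 over the exact serialized request body, so the receiver can verify the bytes it actually received. The function names are hypothetical; the calls are Node's built-in `crypto` API.

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Sender side: sign the serialized body that will be sent on the wire.
function signWebhookBody(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body, 'utf8').digest('hex');
}

// Receiver side: recompute the signature and compare in constant time.
function verifyWebhookSignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signWebhookBody(secret, body), 'hex');
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual throws on a length mismatch, so check the length first
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Signing the body string instead of the notification object avoids mismatches caused by key ordering or date formatting on re-serialization, and keeps the signature out of the payload it covers.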
Channel Coordination and Delivery Logic
The real complexity comes from coordinating across channels intelligently:
class MultiChannelDeliveryOrchestrator {
async deliverNotification(notification: NotificationEvent): Promise<void> {
// Get user preferences for this notification type
const preferences = await this.preferenceManager
.getEnabledChannels(notification.userId, notification.type);
if (preferences.length === 0) {
await this.analytics.trackSkipped(notification.id, 'no_enabled_channels');
return;
}
// Apply delivery rules based on notification priority and type
const deliveryPlan = this.createDeliveryPlan(notification, preferences);
// Execute delivery plan
const results = await this.executeDeliveryPlan(deliveryPlan);
// Track overall delivery success
await this.analytics.trackMultiChannelDelivery(notification.id, results);
}
private createDeliveryPlan(
notification: NotificationEvent,
enabledChannels: NotificationChannel[]
): DeliveryPlan {
const plan: DeliveryPlan = {
immediate: [],
delayed: [],
conditional: []
};
// Critical notifications go to all channels immediately
if (notification.priority === 'critical') {
plan.immediate = enabledChannels;
return plan;
}
// Normal notifications follow user preferences and smart rules
for (const channel of enabledChannels) {
if (channel === 'in_app' || channel === 'push') {
plan.immediate.push(channel);
} else if (channel === 'email') {
// Email can be delayed for batching
plan.delayed.push({
channel,
delay: this.getEmailBatchDelay(notification.userId)
});
} else {
plan.conditional.push({
channel,
condition: this.getDeliveryCondition(channel, notification)
});
}
}
return plan;
}
}
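The plan's `conditional` entries depend on `getDeliveryCondition`, which is not shown. One way to model it, purely as an illustration with hypothetical type names, is an escalation rule that fires only while the notification is still unread after a delay:

```typescript
// Hypothetical condition shapes; the article does not define them.
type DeliveryCondition =
  | { kind: 'always' }
  | { kind: 'unread_after'; delayMs: number };

interface NotificationStatus {
  sentAtMs: number;        // when the immediate channels were attempted
  readAtMs: number | null; // null while the user has not opened it
}

// Decides whether a conditional channel (for example SMS) should fire now.
function shouldDeliverConditional(
  condition: DeliveryCondition,
  status: NotificationStatus,
  nowMs: number
): boolean {
  switch (condition.kind) {
    case 'always':
      return true;
    case 'unread_after':
      // Escalate only if still unread once the grace period has elapsed
      return status.readAtMs === null && nowMs - status.sentAtMs >= condition.delayMs;
  }
}
```

With a rule like `{ kind: 'unread_after', delayMs: 300000 }`, SMS is sent only when five minutes pass without the in-app or push notification being read, which keeps the most expensive channel as a fallback.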
Key Delivery Lessons
Debugging WebSocket connection storms, email deliverability crises, and other production issues reveals these important patterns:
- Connections are ephemeral: Build your WebSocket infrastructure assuming connections will drop. Store critical state outside the connection.
- Push tokens expire: Have a robust token management system that handles invalid tokens gracefully and re-registers tokens when needed.
- Email deliverability is an art: Multiple providers, proper bounce handling, and suppression lists aren’t optional - they’re survival necessities.
- Every channel has rate limits: Build your system to respect provider limits and implement intelligent backoff strategies.
- Users change their minds: Make it easy to update preferences and handle opt-outs immediately. Your deliverability depends on it.
- Monitor everything: Each channel needs specific monitoring. WebSocket connection counts, push delivery rates, email bounce rates, SMS costs - track them all.
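The rate-limit lesson mentions backoff without showing one. A small sketch of exponential backoff with full jitter, plus parsing of the standard HTTP `Retry-After` header, using hypothetical helper names and defaults:

```typescript
// Delay before retry number `attempt` (0-based): a random value between 0 and
// min(capMs, baseMs * 2^attempt). The random source is injectable for tests.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  capMs = 60000,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.floor(random() * ceiling);
}

// Retry-After is either a number of seconds or an HTTP date.
// Returns the wait in milliseconds, or null when the header is absent or unparseable.
function retryAfterMs(header: string | undefined, nowMs: number): number | null {
  if (!header) return null;
  const seconds = Number(header);
  if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  const date = Date.parse(header);
  return Number.isNaN(date) ? null : Math.max(0, date - nowMs);
}
```

When a provider sends `Retry-After`, honoring it takes precedence over the computed delay. The jitter matters mostly when many workers are throttled at once and would otherwise retry in lockstep.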
In the next part of this series, we’ll explore production debugging and monitoring strategies that work when notification systems fail during critical business moments.
The multi-channel delivery system covered here handles the happy path well, but the real test comes when things go wrong. In notification systems, failures are inevitable.
Building a Scalable User Notification System
A comprehensive 4-part series covering the design, implementation, and production challenges of building enterprise-grade notification systems. From architecture and database design to real-time delivery, debugging at scale, and performance optimization.