
2025-09-04

OpenTelemetry in React Native: Building Production-Ready Observability

A comprehensive guide to implementing OpenTelemetry in React Native applications with Firebase integration and enterprise APM solutions for production monitoring.

Many React Native teams struggle with production visibility. Crashes that can’t be reproduced locally, intermittent performance issues, and user complaints without supporting data are common challenges. This guide covers implementing observability that provides the data needed for production troubleshooting and optimization.

The Challenge: Mobile Observability Requirements

Production mobile applications have unique monitoring challenges that differ from web applications. Silent failures, device-specific issues, and network variability create blind spots that traditional logging approaches can’t address.

Key requirements for mobile observability include:

  1. Comprehensive visibility into user interactions and system behavior
  2. Device and OS-specific context for debugging
  3. Performance monitoring that accounts for mobile constraints
  4. Offline-capable telemetry collection

This guide demonstrates building a monitoring system that addresses these challenges.
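As one illustration of the fourth requirement, the sketch below buffers events while the device is offline and caps memory use by dropping the oldest entries. `OfflineTelemetryQueue`, `TelemetryEvent`, and the method names are hypothetical and not part of OpenTelemetry; a production version would likely also persist the buffer to disk.

```typescript
// Hypothetical event shape used only for this sketch
interface TelemetryEvent {
  name: string;
  timestamp: number;
}

// Bounded in-memory buffer for telemetry captured while offline
class OfflineTelemetryQueue {
  private events: TelemetryEvent[] = [];
  private readonly maxSize: number;

  constructor(maxSize: number) {
    if (maxSize < 1) {
      throw new Error('maxSize must be at least 1');
    }
    this.maxSize = maxSize;
  }

  // Keep the newest events; drop the oldest once the cap is exceeded
  enqueue(event: TelemetryEvent): void {
    this.events.push(event);
    if (this.events.length > this.maxSize) {
      this.events.shift();
    }
  }

  // Remove and return everything buffered so far, ready to be sent
  drain(): TelemetryEvent[] {
    const batch = this.events;
    this.events = [];
    return batch;
  }

  // Put a batch back after a failed send, ahead of newer events, then re-apply the cap
  requeue(batch: TelemetryEvent[]): void {
    this.events = [...batch, ...this.events].slice(-this.maxSize);
  }

  get size(): number {
    return this.events.length;
  }
}
```

When connectivity returns, the caller would `drain()` the queue, attempt the upload, and `requeue()` the batch if the upload fails.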

Why OpenTelemetry for React Native

OpenTelemetry provides several advantages for React Native observability. Comparing it with the common alternatives makes them concrete.

Alternative Solutions Comparison

Firebase Performance Monitoring

  • Pros: Easy setup, free tier, basic metrics
  • Cons: Limited customization, no distributed tracing, vendor lock-in

Datadog RUM

  • Pros: Rich dashboards, comprehensive alerting, real user monitoring
  • Cons: Higher cost, limited React Native-specific features

New Relic Mobile

  • Pros: Established platform, good analytics
  • Cons: Performance overhead, React Native documentation gaps

Sentry Performance

  • Pros: Strong error tracking foundation
  • Cons: Limited mobile-specific monitoring capabilities

OpenTelemetry addresses these limitations:

  • Vendor independence: Switch monitoring providers by swapping exporters, without rewriting instrumentation code
  • Standardized data: Consistent format for traces, metrics, logs
  • Rich ecosystem: Compatible with multiple backends
  • Future-proof: Industry standard backed by CNCF
  • Production-ready: Reliable performance in mobile environments
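The vendor-independence point can be sketched without any real SDK. In the example below, the instrumentation function depends only on a small exporter contract, so the backend can change without touching call sites. `SpanRecord`, `SpanSink`, `InMemorySink`, and `timeOperation` are hypothetical names for illustration; real OpenTelemetry exporters implement a richer interface.

```typescript
// Hypothetical, simplified span shape and exporter contract
interface SpanRecord {
  name: string;
  durationMs: number;
}

interface SpanSink {
  export(spans: SpanRecord[]): void;
}

// One possible backend: keeps spans in memory (useful in tests)
class InMemorySink implements SpanSink {
  readonly received: SpanRecord[] = [];

  export(spans: SpanRecord[]): void {
    this.received.push(...spans);
  }
}

// Another possible backend: writes spans to the console
class ConsoleSink implements SpanSink {
  export(spans: SpanRecord[]): void {
    spans.forEach(span => console.log(`${span.name}: ${span.durationMs}ms`));
  }
}

// Instrumentation code knows only the SpanSink contract, not the backend
function timeOperation<T>(sink: SpanSink, name: string, operation: () => T): T {
  const start = Date.now();
  try {
    return operation();
  } finally {
    sink.export([{ name, durationMs: Date.now() - start }]);
  }
}
```

Swapping `InMemorySink` for `ConsoleSink` changes where data goes while every `timeOperation` call stays the same, which is the property OpenTelemetry's exporter model aims for.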

Production Architecture for Scale

Here’s a scalable production architecture for handling high-volume telemetry:

[Architecture diagram, three layers]

  • React Native App Layer: React Native App (iOS & Android), Auto Instrumentation, Manual Tracking, Custom Metrics
  • Collection Layer: OpenTelemetry SDK, Smart Batching, Adaptive Sampling
  • Export Layer: Firebase Performance, Datadog (Primary), Elastic APM (Backup)
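To make the batching step in the collection layer concrete, here is a minimal sketch that flushes when a batch is full or when its oldest item has waited too long. `SizeOrAgeBatcher` is a hypothetical name; in the real SDK, `BatchSpanProcessor` plays this role using options such as `maxExportBatchSize` and `scheduledDelayMillis`. The clock is passed in so the behavior is easy to test.

```typescript
// Flushes a batch when it reaches maxBatchSize items or when the oldest
// buffered item is at least maxAgeMs old (checked on each add)
class SizeOrAgeBatcher<T> {
  private items: T[] = [];
  private oldestAt: number | null = null;
  private readonly maxBatchSize: number;
  private readonly maxAgeMs: number;
  private readonly onFlush: (batch: T[]) => void;

  constructor(maxBatchSize: number, maxAgeMs: number, onFlush: (batch: T[]) => void) {
    this.maxBatchSize = maxBatchSize;
    this.maxAgeMs = maxAgeMs;
    this.onFlush = onFlush;
  }

  add(item: T, now: number = Date.now()): void {
    if (this.oldestAt === null) {
      this.oldestAt = now;
    }
    this.items.push(item);

    const isFull = this.items.length >= this.maxBatchSize;
    const isStale = now - this.oldestAt >= this.maxAgeMs;
    if (isFull || isStale) {
      this.flush();
    }
  }

  // Hand the current batch to the callback and start a new one
  flush(): void {
    if (this.items.length === 0) {
      return;
    }
    const batch = this.items;
    this.items = [];
    this.oldestAt = null;
    this.onFlush(batch);
  }
}
```

Larger batches reduce radio wake-ups on mobile devices, while the age limit bounds how much data is lost if the app is killed before a flush.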

Production-Ready Implementation

Here’s a proven implementation for React Native observability:

Core OpenTelemetry Setup

// telemetry/provider.ts - Production-ready OpenTelemetry setup
// Written against the OpenTelemetry JS SDK 1.x API (Resource class, addSpanProcessor)
import { metrics } from '@opentelemetry/api';
import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { MeterProvider, PeriodicExportingMetricReader } from '@opentelemetry/sdk-metrics';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { XMLHttpRequestInstrumentation } from '@opentelemetry/instrumentation-xml-http-request';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { Platform } from 'react-native';
import DeviceInfo from 'react-native-device-info';
// FirebaseExporter is assumed to be the ProductionFirebaseExporter defined in
// telemetry/firebase-integration.ts. DatadogExporter is not defined in this guide;
// the import path below is a placeholder for your own exporter.
import { ProductionFirebaseExporter as FirebaseExporter } from './firebase-integration';
import { DatadogExporter } from './datadog-exporter';

interface TelemetryConfig {
  environment: 'development' | 'staging' | 'production';
  enabledExporters: string[];
  samplingRate: number;
  maxBatchSize: number;
  exportInterval: number;
}

class ProductionTelemetryProvider {
  private tracerProvider: WebTracerProvider | null = null;
  private meterProvider: MeterProvider | null = null;
  private isInitialized = false;

  async initialize(config: TelemetryConfig) {
    if (this.isInitialized) {
      console.warn('Telemetry already initialized');
      return;
    }

    try {
      const deviceInfo = await this.getDeviceInfo();

      const resource = new Resource({
        [SemanticResourceAttributes.SERVICE_NAME]: 'my-react-native-app',
        [SemanticResourceAttributes.SERVICE_VERSION]: deviceInfo.appVersion,
        [SemanticResourceAttributes.DEPLOYMENT_ENVIRONMENT]: config.environment,
        // Mobile-specific attributes for enhanced debugging
        'mobile.platform': Platform.OS,
        'mobile.platform.version': deviceInfo.systemVersion,
        'device.model': deviceInfo.deviceId,
        'device.manufacturer': deviceInfo.brand,
        'app.build': deviceInfo.buildNumber,
        'app.bundle_id': deviceInfo.bundleId,
        // Network and device context
        'network.carrier': deviceInfo.carrier,
        'device.memory': deviceInfo.totalMemory,
      });

      // Multiple exporters for redundancy and reliability
      const exporters = this.createExporters(config);

      // Initialize tracer provider
      this.tracerProvider = new WebTracerProvider({
        resource,
        sampler: this.createAdaptiveSampler(config.samplingRate),
      });

      // Add span processors
      exporters.spanProcessors.forEach(processor => {
        this.tracerProvider!.addSpanProcessor(processor);
      });

      // Initialize meter provider; skip the reader when no metric exporter is enabled
      this.meterProvider = new MeterProvider({
        resource,
        readers: exporters.metricExporter
          ? [new PeriodicExportingMetricReader({
              exporter: exporters.metricExporter,
              exportIntervalMillis: config.exportInterval,
            })]
          : [],
      });

      // Register both providers globally. Without this, trace.getTracer() and
      // metrics.getMeter() in other modules return no-op implementations.
      this.tracerProvider.register();
      metrics.setGlobalMeterProvider(this.meterProvider);

      // Register instrumentations
      registerInstrumentations({
        instrumentations: [
          new XMLHttpRequestInstrumentation(),
          new FetchInstrumentation(),
        ],
      });
      this.isInitialized = true;

      console.log('Production telemetry initialized', {
        environment: config.environment,
        exporters: config.enabledExporters,
        samplingRate: config.samplingRate,
      });

    } catch (error) {
      console.error('Failed to initialize telemetry:', error);
      // Graceful degradation - app continues if telemetry fails
    }
  }

  private async getDeviceInfo() {
    // Gather all device info in parallel for faster startup
    const [
      appVersion,
      buildNumber,
      bundleId,
      deviceId,
      brand,
      systemVersion,
      carrier,
      totalMemory,
    ] = await Promise.all([
      DeviceInfo.getVersion(),
      DeviceInfo.getBuildNumber(),
      DeviceInfo.getBundleId(),
      DeviceInfo.getModel(), // device model for 'device.model', not the per-install unique ID
      DeviceInfo.getBrand(),
      DeviceInfo.getSystemVersion(),
      DeviceInfo.getCarrier().catch(() => 'unknown'),
      DeviceInfo.getTotalMemory().catch(() => 0),
    ]);

    return {
      appVersion,
      buildNumber,
      bundleId,
      deviceId,
      brand,
      systemVersion,
      carrier,
      totalMemory,
    };
  }

  private createExporters(config: TelemetryConfig) {
    const spanProcessors: any[] = [];
    let metricExporter: any = null;

    // Primary exporter - Datadog for rich analytics
    if (config.enabledExporters.includes('datadog')) {
      const datadogExporter = new DatadogExporter({
        apiKey: process.env.DATADOG_API_KEY!,
        service: 'mobile-app',
        env: config.environment,
      });

      spanProcessors.push(new BatchSpanProcessor(datadogExporter, {
        maxExportBatchSize: config.maxBatchSize,
        scheduledDelayMillis: config.exportInterval,
        // Aggressive timeout to prevent memory buildup
        exportTimeoutMillis: 10000,
      }));

      metricExporter = datadogExporter;
    }

    // Secondary exporter - Firebase for basic monitoring
    if (config.enabledExporters.includes('firebase')) {
      spanProcessors.push(new BatchSpanProcessor(new FirebaseExporter(), {
        maxExportBatchSize: 50, // Smaller batches for Firebase
        scheduledDelayMillis: 30000, // Less frequent for free tier
      }));
    }

    return { spanProcessors, metricExporter };
  }

  private createAdaptiveSampler(baseRate: number) {
    // Values mirror SamplingDecision in @opentelemetry/sdk-trace-base:
    // NOT_RECORD = 0, RECORD = 1, RECORD_AND_SAMPLED = 2
    const NOT_RECORD = 0;
    const RECORD_AND_SAMPLED = 2;
    const sampleAt = (rate: number) => ({
      decision: Math.random() < rate ? RECORD_AND_SAMPLED : NOT_RECORD,
    });

    // Custom sampler that varies the sampling rate by span name
    return {
      shouldSample: (context: any, traceId: string, spanName: string) => {
        // Always sample errors
        if (spanName.includes('error') || spanName.includes('crash')) {
          return { decision: RECORD_AND_SAMPLED };
        }

        // Sample critical user flows at higher rate
        if (spanName.includes('payment') || spanName.includes('login')) {
          return sampleAt(baseRate * 2);
        }

        // Reduced sampling for high-frequency events
        if (spanName.includes('scroll') || spanName.includes('animation')) {
          return sampleAt(baseRate * 0.1);
        }

        return sampleAt(baseRate);
      },
      // Required by the Sampler interface
      toString: () => `AdaptiveSampler{${baseRate}}`,
    };
  }

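  async shutdown() {
    if (!this.isInitialized) {
      return;
    }
    // Flush and stop both providers; this class holds no combined SDK object
    await Promise.all([
      this.tracerProvider?.shutdown(),
      this.meterProvider?.shutdown(),
    ]);
    this.isInitialized = false;
  }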
}

export const telemetryProvider = new ProductionTelemetryProvider();
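The name-based sampling rules above can be pulled into a pure function, which makes the rates easy to unit test and makes it explicit that doubling a base rate above 0.5 simply saturates at 1. This is a sketch, not part of the provider; `effectiveSampleRate` and `shouldKeep` are hypothetical helper names, and the random source is injectable for testing.

```typescript
// Clamp a rate into the valid probability range [0, 1]
function clampRate(rate: number): number {
  return Math.min(1, Math.max(0, rate));
}

// Map a span name to the probability that it should be kept,
// following the same name-based rules as the sampler in this guide
function effectiveSampleRate(spanName: string, baseRate: number): number {
  if (spanName.includes('error') || spanName.includes('crash')) {
    return 1; // always keep errors
  }
  if (spanName.includes('payment') || spanName.includes('login')) {
    return clampRate(baseRate * 2); // critical flows: double rate
  }
  if (spanName.includes('scroll') || spanName.includes('animation')) {
    return clampRate(baseRate * 0.1); // high-frequency events: one tenth
  }
  return clampRate(baseRate);
}

// Decide whether to keep a span; `random` defaults to Math.random
function shouldKeep(
  spanName: string,
  baseRate: number,
  random: () => number = Math.random
): boolean {
  return random() < effectiveSampleRate(spanName, baseRate);
}
```

Because substring matching is order-dependent, a span named `payment_error` is always kept: the error rule is checked first.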

React Native Performance Monitoring

Here’s a comprehensive performance monitoring implementation:

// telemetry/performance-monitor.ts - Production performance monitoring
import { trace, metrics, context } from '@opentelemetry/api';
import perf from '@react-native-firebase/perf';
import { AppState, AppStateStatus } from 'react-native';

class ProductionPerformanceMonitor {
  private tracer = trace.getTracer('app-performance', '1.0.0');
  private meter = metrics.getMeter('app-metrics', '1.0.0');

  // Metrics that actually matter in production
  private screenLoadTime = this.meter.createHistogram('screen_load_duration', {
    description: 'Time to load screens',
    unit: 'ms',
  });

  private apiCallDuration = this.meter.createHistogram('api_call_duration', {
    description: 'API response times by endpoint',
    unit: 'ms',
  });

  private userJourneyCompletion = this.meter.createCounter('user_journey_completion', {
    description: 'Completed user journeys',
  });

  private criticalErrors = this.meter.createCounter('critical_errors', {
    description: 'Errors that affect core functionality',
  });

  constructor() {
    this.setupAppStateTracking();
  }

  // Track screen loads with actual business impact
  async measureScreenLoad<T>(
    screenName: string,
    loadFunction: () => Promise<T>,
    isBusinessCritical = false
  ): Promise<T> {
    const span = this.tracer.startSpan(`screen_load_${screenName}`);
    const startTime = Date.now();

    // Firebase trace for free monitoring
    let firebaseTrace: any = null;
    try {
      firebaseTrace = perf().newTrace(`screen_${screenName}`);
      firebaseTrace.start();
    } catch (error) {
      // Firebase can fail, don't crash the app
      console.warn('Firebase trace failed:', error);
    }

    span.setAttributes({
      'screen.name': screenName,
      'screen.business_critical': isBusinessCritical,
      'screen.timestamp': startTime,
    });

    try {
      const result = await loadFunction();
      const duration = Date.now() - startTime;

      // Record metrics
      this.screenLoadTime.record(duration, {
        screen: screenName,
        success: 'true',
        critical: isBusinessCritical.toString(),
      });

      // Alert on slow critical screens
      if (isBusinessCritical && duration > 3000) {
        this.criticalErrors.add(1, {
          type: 'slow_critical_screen',
          screen: screenName,
          duration: duration.toString(),
        });
      }

      span.setAttributes({
        'screen.load_duration': duration,
        'screen.success': true,
      });

      span.setStatus({ code: 1 }); // OK

      return result;
    } catch (error: any) {
      const duration = Date.now() - startTime;

      this.screenLoadTime.record(duration, {
        screen: screenName,
        success: 'false',
        error: error.name,
      });

      // Always alert on screen load failures
      this.criticalErrors.add(1, {
        type: 'screen_load_failure',
        screen: screenName,
        error: error.message,
      });

      span.recordException(error);
      span.setStatus({ code: 2, message: error.message });

      firebaseTrace?.putAttribute('error', 'true');

      throw error;
    } finally {
      span.end();
      firebaseTrace?.stop();
    }
  }

  // API monitoring for business-critical flows
  async instrumentApiCall<T>(
    endpoint: string,
    method: string,
    apiCall: () => Promise<T>,
    businessContext?: {
      userId?: string;
      feature?: string;
      monetaryValue?: number;
    }
  ): Promise<T> {
    const span = this.tracer.startSpan(`api_${method.toLowerCase()}_${this.sanitizeEndpoint(endpoint)}`);
    const startTime = Date.now();

    span.setAttributes({
      'http.method': method,
      'http.url': endpoint,
      'api.business_context': JSON.stringify(businessContext || {}),
      'api.timestamp': startTime,
    });

    try {
      const result = await apiCall();
      const duration = Date.now() - startTime;

      this.apiCallDuration.record(duration, {
        endpoint: this.sanitizeEndpoint(endpoint),
        method,
        status: 'success',
        business_critical: businessContext?.monetaryValue ? 'true' : 'false',
      });

      // Alert on slow payment APIs
      if (businessContext?.monetaryValue && duration > 5000) {
        this.criticalErrors.add(1, {
          type: 'slow_payment_api',
          endpoint: this.sanitizeEndpoint(endpoint),
          duration: duration.toString(),
          value: businessContext.monetaryValue.toString(),
        });
      }

      span.setAttributes({
        'http.status_code': 200,
        'http.response_time': duration,
        'api.success': true,
      });

      return result;
    } catch (error: any) {
      const duration = Date.now() - startTime;

      this.apiCallDuration.record(duration, {
        endpoint: this.sanitizeEndpoint(endpoint),
        method,
        status: 'error',
        error_type: error.name,
      });

      // Always alert on payment API failures
      if (businessContext?.monetaryValue) {
        this.criticalErrors.add(1, {
          type: 'payment_api_failure',
          endpoint: this.sanitizeEndpoint(endpoint),
          error: error.message,
          user_id: businessContext.userId || 'unknown',
          value: businessContext.monetaryValue.toString(),
        });
      }

      span.recordException(error);
      span.setAttributes({
        'http.status_code': error.status || 500,
        'error.name': error.name,
        'error.message': error.message,
        'api.success': false,
      });

      throw error;
    } finally {
      span.end();
    }
  }

  // Open journeys keyed by journey ID. A span is only "active" inside a
  // context.with() callback, so later steps need an explicit handle to it.
  private activeJourneys = new Map<
    string,
    { name: string; span: import('@opentelemetry/api').Span }
  >();

  // Track complete user journeys, not just individual actions
  startUserJourney(journeyName: string, userId?: string): string {
    const journeyId = `${journeyName}_${Date.now()}_${Math.random().toString(36).slice(2, 11)}`;

    const span = this.tracer.startSpan(`user_journey_${journeyName}`, {
      attributes: {
        'journey.name': journeyName,
        'journey.id': journeyId,
        'user.id': userId || 'anonymous',
        'journey.start_time': Date.now(),
      },
    });

    // Keep the span so completeUserJourney can find it by ID
    this.activeJourneys.set(journeyId, { name: journeyName, span });

    return journeyId;
  }

  completeUserJourney(journeyId: string, success: boolean, metadata?: Record<string, any>) {
    const journey = this.activeJourneys.get(journeyId);
    if (!journey) {
      return; // Unknown or already completed journey
    }
    this.activeJourneys.delete(journeyId);

    const { name, span } = journey;

    span.setAttributes({
      'journey.completed': success,
      'journey.end_time': Date.now(),
      ...metadata,
    });

    if (success) {
      this.userJourneyCompletion.add(1, {
        journey: name,
        success: 'true',
      });
    } else {
      this.criticalErrors.add(1, {
        type: 'journey_failure',
        journey: name,
        step: metadata?.failedStep || 'unknown',
      });
    }

    span.setStatus({
      code: success ? 1 : 2,
      message: success ? 'Journey completed' : 'Journey failed',
    });

    span.end();
  }

  private sanitizeEndpoint(endpoint: string): string {
    // Remove sensitive data from endpoints for metrics.
    // Capture the ? or & separator so the query string stays well-formed.
    return endpoint
      .replace(/\/\d+/g, '/:id')
      .replace(/([?&])token=[^&]*/g, '$1token=***')
      .replace(/([?&])api_key=[^&]*/g, '$1api_key=***');
  }

  private setupAppStateTracking() {
    let backgroundTime = 0;

    AppState.addEventListener('change', (nextAppState: AppStateStatus) => {
      if (nextAppState === 'background') {
        backgroundTime = Date.now();

        // Force flush telemetry before backgrounding
        this.flushTelemetry();
      } else if (nextAppState === 'active' && backgroundTime > 0) {
        const backgroundDuration = Date.now() - backgroundTime;

        // Track app resume
        const resumeSpan = this.tracer.startSpan('app_resume');
        resumeSpan.setAttributes({
          'app.background_duration': backgroundDuration,
          'app.resume_time': Date.now(),
        });
        resumeSpan.end();

        backgroundTime = 0;
      }
    });
  }

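  private async flushTelemetry() {
    try {
      // Force export of pending spans. The global provider is a proxy, so unwrap
      // it to reach the registered SDK provider, which implements forceFlush().
      const provider: any = trace.getTracerProvider();
      const delegate = provider.getDelegate?.() ?? provider;
      await delegate.forceFlush?.();
    } catch (error) {
      console.warn('Failed to flush telemetry:', error);
    }
  }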
}

export const performanceMonitor = new ProductionPerformanceMonitor();
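Endpoint sanitization is worth testing in isolation because it controls metric cardinality and keeps secrets out of telemetry. The standalone sketch below follows the same idea as `sanitizeEndpoint` in the class above, with two deliberate differences that are assumptions of this example: numeric path segments are matched only when they form a whole segment, and both secret parameters are handled by one pattern.

```typescript
// Replace numeric path segments and secret query values with placeholders
function sanitizeEndpoint(endpoint: string): string {
  return endpoint
    // "/42" followed by "/", "?" or end of string becomes "/:id"
    .replace(/\/\d+(?=\/|\?|$)/g, '/:id')
    // Keep the original "?" or "&" separator and the parameter name
    .replace(/([?&])(token|api_key)=[^&]*/g, '$1$2=***');
}

console.log(sanitizeEndpoint('/users/42/orders/7?page=2&token=abc'));
console.log(sanitizeEndpoint('/v2/items?api_key=k1&sort=asc'));
```

Preserving the captured separator matters: replacing `&token=abc` with a literal `?token=***` would put a second `?` into the query string.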

Standard navigation tracking usually records only which screen was shown. The implementation below also records transition timing, quick exits, and parameterized navigation:

// telemetry/navigation-instrumentation.tsx - Navigation tracking that matters
import { NavigationContainer, NavigationContainerRef } from '@react-navigation/native';
import { trace, metrics } from '@opentelemetry/api';
import React, { useRef, useCallback } from 'react';

const tracer = trace.getTracer('navigation', '1.0.0');
const meter = metrics.getMeter('navigation-metrics', '1.0.0');

// Metrics that help optimize user experience
const screenTransitionTime = meter.createHistogram('screen_transition_duration', {
  description: 'Time between screen transitions',
  unit: 'ms',
});

const navigationDropoff = meter.createCounter('navigation_dropoff', {
  description: 'Users who drop off at specific screens',
});

const deepLinkUsage = meter.createCounter('deep_link_usage', {
  description: 'Deep link navigation usage',
});

interface NavigationEvent {
  from: string;
  to: string;
  params?: any;
  timestamp: number;
  userId?: string;
}

class NavigationTelemetry {
  private navigationHistory: NavigationEvent[] = [];
  private maxHistorySize = 50;

  trackNavigation(event: NavigationEvent) {
    // Add to history
    this.navigationHistory.push(event);
    if (this.navigationHistory.length > this.maxHistorySize) {
      this.navigationHistory.shift();
    }

    // Create span for navigation
    const span = tracer.startSpan('screen_navigation');
    span.setAttributes({
      'navigation.from': event.from,
      'navigation.to': event.to,
      'navigation.params': JSON.stringify(event.params || {}),
      'navigation.timestamp': event.timestamp,
      'user.id': event.userId || 'anonymous',
    });

    // Record metrics
    if (this.navigationHistory.length > 1) {
      const previousEvent = this.navigationHistory[this.navigationHistory.length - 2];
      const transitionTime = event.timestamp - previousEvent.timestamp;

      screenTransitionTime.record(transitionTime, {
        from: event.from,
        to: event.to,
      });

      // Track quick exits (user confusion indicator)
      if (transitionTime < 2000) {
        navigationDropoff.add(1, {
          screen: event.from,
          quick_exit: 'true',
          time_spent: transitionTime.toString(),
        });
      }
    }

    // Track deep link usage
    if (event.params && Object.keys(event.params).length > 0) {
      deepLinkUsage.add(1, {
        screen: event.to,
        has_params: 'true',
      });
    }

    span.end();
  }

  getNavigationPath(): string[] {
    return this.navigationHistory.map(event => event.to);
  }

  analyzeFunnelDropoff(): Record<string, number> {
    const dropoffRates: Record<string, number> = {};

    for (let i = 0; i < this.navigationHistory.length - 1; i++) {
      const current = this.navigationHistory[i];
      const next = this.navigationHistory[i + 1];

      const timeSpent = next.timestamp - current.timestamp;
      if (timeSpent < 5000) { // Less than 5 seconds = potential confusion
        dropoffRates[current.to] = (dropoffRates[current.to] || 0) + 1;
      }
    }

    return dropoffRates;
  }
}

const navigationTelemetry = new NavigationTelemetry();

export function createTelemetryNavigationContainer() {
  return React.forwardRef<NavigationContainerRef<any>, any>((props, ref) => {
    const navigationRef = useRef<NavigationContainerRef<any>>(null);
    const routeNameRef = useRef<string>();
    const navigationStartTime = useRef<number>();

    const onReady = useCallback(() => {
      const initialRoute = navigationRef.current?.getCurrentRoute();
      routeNameRef.current = initialRoute?.name;

      if (initialRoute?.name) {
        navigationTelemetry.trackNavigation({
          from: 'app_start',
          to: initialRoute.name,
          params: initialRoute.params,
          timestamp: Date.now(),
        });
      }
    }, []);

    const onStateChange = useCallback(() => {
      const previousRouteName = routeNameRef.current;
      const currentRoute = navigationRef.current?.getCurrentRoute();
      const currentRouteName = currentRoute?.name;

      if (previousRouteName !== currentRouteName && currentRouteName) {
        const now = Date.now();

        navigationTelemetry.trackNavigation({
          from: previousRouteName || 'unknown',
          to: currentRouteName,
          params: currentRoute.params,
          timestamp: now,
        });

        routeNameRef.current = currentRouteName;
      }
    }, []);

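    // Expose the container through the caller's ref while keeping navigationRef
    // populated; the handlers above read the current route from navigationRef
    React.useImperativeHandle(
      ref,
      () => navigationRef.current as NavigationContainerRef<any>
    );

    return (
      <NavigationContainer
        {...props}
        ref={navigationRef}
        onReady={() => {
          onReady();
          props.onReady?.();
        }}
        onStateChange={(state: any) => {
          onStateChange();
          props.onStateChange?.(state);
        }}
      />
    );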
  });
}

export { navigationTelemetry };
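The quick-exit heuristic used above (a short dwell time suggests the user landed on the wrong screen) can be expressed as a pure function over an ordered visit list. This is a sketch with hypothetical names (`ScreenVisit`, `countQuickExits`); the threshold is an assumption to tune per app.

```typescript
interface ScreenVisit {
  screen: string;
  enteredAt: number; // epoch milliseconds
}

// Count, per screen, how often the user left in under thresholdMs.
// Dwell time is the gap between entering a screen and entering the next one,
// so the final visit has no dwell time and is never counted.
function countQuickExits(
  visits: ScreenVisit[],
  thresholdMs: number
): Record<string, number> {
  const counts: Record<string, number> = {};

  for (let i = 0; i < visits.length - 1; i++) {
    const dwellMs = visits[i + 1].enteredAt - visits[i].enteredAt;
    if (dwellMs < thresholdMs) {
      const screen = visits[i].screen;
      counts[screen] = (counts[screen] ?? 0) + 1;
    }
  }

  return counts;
}

const quickExits = countQuickExits(
  [
    { screen: 'Home', enteredAt: 0 },
    { screen: 'Cart', enteredAt: 1000 },
    { screen: 'Checkout', enteredAt: 9000 },
    { screen: 'Home', enteredAt: 9500 },
  ],
  2000
);
console.log(quickExits);
```

In this sample, Home (1000 ms) and Checkout (500 ms) count as quick exits, while Cart (8000 ms) does not.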

Error Tracking That Actually Catches Issues

Generic error reports often lack the context needed to reproduce a bug. This tracker attaches device, user, and business context to every error:

// telemetry/error-tracking.tsx - Error tracking that helps debugging
import React from 'react';
import { View, Text } from 'react-native';
import { trace } from '@opentelemetry/api';
import crashlytics from '@react-native-firebase/crashlytics';

interface ErrorContext {
  userId?: string;
  screenName?: string;
  userJourney?: string[];
  networkState?: string;
  memoryUsage?: number;
  batteryLevel?: number;
  businessContext?: {
    feature?: string;
    monetaryValue?: number;
    customerTier?: string;
  };
}

class ProductionErrorTracker {
  private tracer = trace.getTracer('error-tracking', '1.0.0');
  private errorCount = 0;
  private recentErrors: Array<{ error: Error; context?: ErrorContext; timestamp: number }> = [];

  captureError(error: Error, errorContext?: ErrorContext) {
    const timestamp = Date.now();
    this.errorCount++;

    // Store recent errors for pattern analysis
    this.recentErrors.push({ error, context: errorContext, timestamp });
    if (this.recentErrors.length > 100) {
      this.recentErrors.shift();
    }

    // Create comprehensive error span
    const span = this.tracer.startSpan('error_occurred');

    span.setAttributes({
      'error.type': error.name,
      'error.message': error.message,
      'error.stack': this.sanitizeStack(error.stack || ''),
      'error.timestamp': timestamp,
      'error.sequence_number': this.errorCount,
      // Device context
      'device.memory_usage': errorContext?.memoryUsage || 0,
      'device.battery_level': errorContext?.batteryLevel || 1,
      'device.network_state': errorContext?.networkState || 'unknown',
      // User context
      'user.id': errorContext?.userId || 'anonymous',
      'user.screen': errorContext?.screenName || 'unknown',
      'user.journey': JSON.stringify(errorContext?.userJourney || []),
      // Business context
      'business.feature': errorContext?.businessContext?.feature || 'unknown',
      'business.monetary_value': errorContext?.businessContext?.monetaryValue || 0,
      'business.customer_tier': errorContext?.businessContext?.customerTier || 'unknown',
    });

    // Enhanced Firebase Crashlytics logging
    try {
      if (errorContext?.userId) {
        crashlytics().setUserId(errorContext.userId);
      }

      // Set custom attributes for better filtering
      crashlytics().setAttributes({
        screen_name: errorContext?.screenName || 'unknown',
        network_state: errorContext?.networkState || 'unknown',
        business_feature: errorContext?.businessContext?.feature || 'unknown',
        customer_tier: errorContext?.businessContext?.customerTier || 'unknown',
        error_sequence: this.errorCount.toString(),
      });

      // Add breadcrumbs from user journey
      if (errorContext?.userJourney) {
        errorContext.userJourney.forEach((step, index) => {
          crashlytics().log(`Journey step ${index + 1}: ${step}`);
        });
      }

      crashlytics().recordError(error);
    } catch (crashlyticsError) {
      console.warn('Crashlytics logging failed:', crashlyticsError);
    }

    // Pattern detection
    this.detectErrorPatterns();

    // Add to current span context if available
    const activeSpan = trace.getActiveSpan();
    if (activeSpan) {
      activeSpan.recordException(error);
      activeSpan.setStatus({
        code: 2, // ERROR
        message: error.message,
      });
    }

    span.end();

    // Log for immediate debugging
    console.error('Production error captured:', {
      error: error.message,
      context: errorContext,
      sequence: this.errorCount,
    });
  }

  // Detect error patterns that indicate systemic issues
  private detectErrorPatterns() {
    const recentWindow = Date.now() - 5 * 60 * 1000; // Last 5 minutes
    const recentErrors = this.recentErrors.filter(e => e.timestamp > recentWindow);

    if (recentErrors.length >= 5) {
      // Check for error storm
      const errorTypes = new Map<string, number>();
      recentErrors.forEach(({ error }) => {
        errorTypes.set(error.name, (errorTypes.get(error.name) || 0) + 1);
      });

      errorTypes.forEach((count, errorType) => {
        if (count >= 3) {
          this.reportErrorPattern('error_storm', {
            error_type: errorType,
            count: count.toString(),
            time_window: '5_minutes',
          });
        }
      });
    }

    // Check for user-specific issues
    const userErrors = new Map<string, number>();
    recentErrors.forEach(({ context }) => {
      if (context?.userId) {
        userErrors.set(context.userId, (userErrors.get(context.userId) || 0) + 1);
      }
    });

    userErrors.forEach((count, userId) => {
      if (count >= 3) {
        this.reportErrorPattern('user_error_cluster', {
          user_id: userId,
          count: count.toString(),
        });
      }
    });
  }

  private reportErrorPattern(patternType: string, attributes: Record<string, string>) {
    const span = this.tracer.startSpan(`error_pattern_${patternType}`);
    span.setAttributes({
      'pattern.type': patternType,
      'pattern.timestamp': Date.now(),
      ...attributes,
    });
    span.end();

    console.warn(`Error pattern detected: ${patternType}`, attributes);
  }

  private sanitizeStack(stack: string): string {
    // Remove sensitive information from stack traces
    return stack
      .replace(/token=[^&\s]*/g, 'token=***')
      .replace(/apikey=[^&\s]*/g, 'apikey=***')
      .replace(/password=[^&\s]*/g, 'password=***');
  }

  // Global error handlers that saved production
  setupGlobalErrorHandling() {
    // React Native JS errors
    const originalHandler = ErrorUtils.getGlobalHandler();
    ErrorUtils.setGlobalHandler((error, isFatal) => {
      this.captureError(error, {
        businessContext: { feature: 'global_js_error' },
      });

      // Don't prevent the original handler from running
      originalHandler(error, isFatal);
    });

    // Unhandled promise rejections. React Native may not dispatch a browser-style
    // 'unhandledrejection' event, so the optional call below is a no-op on
    // runtimes where global.addEventListener is undefined.
    global.addEventListener?.('unhandledrejection', (event: any) => {
      this.captureError(
        new Error(`Unhandled Promise Rejection: ${event.reason}`),
        {
          businessContext: { feature: 'unhandled_promise' },
        }
      );
    });

    console.log('Global error handlers installed');
  }

  // Business-specific error tracking
  trackBusinessError(
    errorType: 'payment_failure' | 'login_failure' | 'api_timeout' | 'feature_unavailable',
    error: Error,
    businessContext: {
      userId?: string;
      monetaryValue?: number;
      customerTier?: string;
      feature: string;
    }
  ) {
    this.captureError(error, {
      businessContext,
      screenName: 'business_operation',
    });

    // Immediate alerts for high-value errors
    if (businessContext.monetaryValue && businessContext.monetaryValue > 100) {
      console.error('HIGH VALUE ERROR:', {
        type: errorType,
        value: businessContext.monetaryValue,
        customer: businessContext.customerTier,
        user: businessContext.userId,
      });
    }
  }
}

export const errorTracker = new ProductionErrorTracker();

// Error boundary that actually helps
export class TelemetryErrorBoundary extends React.Component<
  {
    children: React.ReactNode;
    fallback?: React.ComponentType<{ error: Error; retry: () => void }>;
    context?: Partial<ErrorContext>;
  },
  { hasError: boolean; error?: Error }
> {
  constructor(props: any) {
    super(props);
    this.state = { hasError: false };
  }

  static getDerivedStateFromError(error: Error) {
    return { hasError: true, error };
  }

  componentDidCatch(error: Error, errorInfo: React.ErrorInfo) {
    errorTracker.captureError(error, {
      ...this.props.context,
      businessContext: {
        feature: 'react_error_boundary',
      },
    });
  }

  render() {
    if (this.state.hasError && this.state.error) {
      if (this.props.fallback) {
        return React.createElement(this.props.fallback, {
          error: this.state.error,
          retry: () => this.setState({ hasError: false, error: undefined })
        });
      }

      return (
        <View style={{ flex: 1, justifyContent: 'center', alignItems: 'center' }}>
          <Text>Something went wrong. Please restart the app.</Text>
        </View>
      );
    }

    return this.props.children;
  }
}
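The error-storm check in `detectErrorPatterns` boils down to counting recent errors by type within a time window. The sketch below isolates that logic as a pure function so the thresholds can be tested without spans or Crashlytics. `ErrorSample` and `findErrorStorms` are hypothetical names, and the clock is passed in rather than read from `Date.now()`.

```typescript
interface ErrorSample {
  name: string;
  timestamp: number; // epoch milliseconds
}

// Return the error names that occurred at least minCount times
// within the last windowMs milliseconds before `now`
function findErrorStorms(
  samples: ErrorSample[],
  now: number,
  windowMs: number,
  minCount: number
): string[] {
  const counts = new Map<string, number>();

  for (const sample of samples) {
    if (sample.timestamp > now - windowMs) {
      counts.set(sample.name, (counts.get(sample.name) ?? 0) + 1);
    }
  }

  return Array.from(counts.entries())
    .filter(([, count]) => count >= minCount)
    .map(([name]) => name);
}
```

With a five-minute window and a threshold of three, three recent `TypeError` samples are reported as a storm, while older errors of any type are ignored.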

The Firebase Integration That Doesn’t Break

Firebase Performance Monitoring is great for getting started, but it needs careful integration:

// telemetry/firebase-integration.ts - Firebase integration that works
import perf from '@react-native-firebase/perf';
import { SpanExporter, ReadableSpan } from '@opentelemetry/sdk-trace-base';
import { ExportResult, ExportResultCode } from '@opentelemetry/core';

export class ProductionFirebaseExporter implements SpanExporter {
  private activeTraces = new Map<string, any>();
  private maxConcurrentTraces = 50; // Firebase has limits

  export(spans: ReadableSpan[], resultCallback: (result: ExportResult) => void): void {
    try {
      // Process spans in chunks to avoid overwhelming Firebase
      const chunks = this.chunkArray(spans, 10);

      chunks.forEach((chunk, index) => {
        setTimeout(() => {
          chunk.forEach(span => this.processSpan(span));
        }, index * 100); // Stagger processing
      });

      // Report success once the spans are scheduled; Firebase processing is best-effort
      resultCallback({ code: ExportResultCode.SUCCESS });
    } catch (error) {
      console.error('Firebase export error:', error);
      resultCallback({ code: ExportResultCode.FAILED });
    }
  }

  private async processSpan(span: ReadableSpan) {
    const { name, duration, attributes, status } = span;

    // Skip spans that Firebase doesn't handle well
    if (this.shouldSkipSpan(name, attributes)) {
      return;
    }

    // Clean up trace name for Firebase
    const traceName = this.cleanTraceName(name);

    // Manage concurrent traces to avoid Firebase limits
    if (this.activeTraces.size >= this.maxConcurrentTraces) {
      console.warn('Too many active Firebase traces, skipping:', traceName);
      return;
    }

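    // Key by span ID so two spans that share a name do not overwrite each other
    const traceKey = span.spanContext().spanId;

    try {
      const trace = perf().newTrace(traceName);
      this.activeTraces.set(traceKey, trace);

      // Add attributes (Firebase has limits on these too)
      this.addSafeAttributes(trace, attributes);

      // Add business metrics
      this.addBusinessMetrics(trace, attributes);

      // Simulate trace timing: start now, stop after the span's recorded duration
      await trace.start();

      // `duration` is an HrTime tuple: [seconds, nanoseconds]
      const durationMs = duration[0] * 1000 + duration[1] / 1e6;

      setTimeout(async () => {
        try {
          if (status?.code === 2) { // SpanStatusCode.ERROR
            trace.putAttribute('error', 'true');
            trace.putMetric('error_count', 1);
          }

          await trace.stop();
        } catch (stopError) {
          console.warn('Firebase trace stop failed:', stopError);
        } finally {
          // Always release the slot, even if stop() failed
          this.activeTraces.delete(traceKey);
        }
      }, Math.min(durationMs, 60000)); // Max 60s trace

    } catch (error) {
      console.warn('Firebase trace creation failed:', error);
      this.activeTraces.delete(traceKey);
    }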
  }

  private shouldSkipSpan(name: string, attributes: any): boolean {
    // Skip high-frequency, low-value spans
    if (name.includes('scroll') || name.includes('animation')) {
      return true;
    }

    // Skip internal telemetry spans
    if (name.includes('telemetry') || name.includes('metric')) {
      return true;
    }

    // Skip spans that carry no timing attribute (the span's own duration is not checked here)
    if (!attributes['duration'] && !attributes['http.response_time']) {
      return true;
    }

    return false;
  }

  private cleanTraceName(name: string): string {
    // Firebase trace names: max 100 chars, no leading underscore
    return name
      .replace(/[^a-zA-Z0-9_]/g, '_')
      .replace(/^_+/, '')
      .substring(0, 100) // Firebase limit
      .toLowerCase();
  }

  private addSafeAttributes(trace: any, attributes: any) {
    const safeAttributes: Record<string, string> = {};
    let attributeCount = 0;
    const maxAttributes = 5; // Firebase allows at most 5 custom attributes per trace

    // Prioritize business-relevant attributes
    const priorities = [
      'user.id',
      'screen.name',
      'http.status_code',
      'business.feature',
      'error.type',
    ];

    priorities.forEach(key => {
      // Compare against null so valid falsy values such as 0 are kept
      if (attributes[key] != null && attributeCount < maxAttributes) {
        // Firebase attribute names: max 40 chars; values: max 100 chars
        const safeKey = key.replace(/[^a-zA-Z0-9_]/g, '_').substring(0, 40);
        safeAttributes[safeKey] = String(attributes[key]).substring(0, 100);
        attributeCount++;
      }
    });

    // Add remaining attributes until limit
    Object.entries(attributes).forEach(([key, value]) => {
      if (!priorities.includes(key) && attributeCount < maxAttributes) {
        const safeKey = key.replace(/[^a-zA-Z0-9_]/g, '_').substring(0, 40);
        safeAttributes[safeKey] = String(value).substring(0, 100);
        attributeCount++;
      }
    });

    // Set attributes on trace
    Object.entries(safeAttributes).forEach(([key, value]) => {
      try {
        trace.putAttribute(key, value);
      } catch (error) {
        console.warn(`Failed to set Firebase attribute ${key}:`, error);
      }
    });
  }

  private addBusinessMetrics(trace: any, attributes: any) {
    // Add metrics that matter for business monitoring
    try {
      if (attributes['http.status_code']) {
        trace.putMetric('http_status', Number(attributes['http.status_code']));
      }

      if (attributes['api.response_time']) {
        trace.putMetric('response_time_ms', Number(attributes['api.response_time']));
      }

      if (attributes['business.monetary_value']) {
        trace.putMetric('monetary_value', Number(attributes['business.monetary_value']));
      }

      if (attributes['screen.load_duration']) {
        trace.putMetric('load_time_ms', Number(attributes['screen.load_duration']));
      }

    } catch (error) {
      console.warn('Failed to add Firebase metrics:', error);
    }
  }

  private chunkArray<T>(array: T[], chunkSize: number): T[][] {
    const chunks: T[][] = [];
    for (let i = 0; i < array.length; i += chunkSize) {
      chunks.push(array.slice(i, i + chunkSize));
    }
    return chunks;
  }

  async shutdown(): Promise<void> {
    // Clean up any remaining traces
    this.activeTraces.forEach(trace => {
      try {
        trace.stop();
      } catch (error) {
        console.warn('Error stopping Firebase trace during shutdown:', error);
      }
    });
    this.activeTraces.clear();
  }
}

Real Usage Patterns That Actually Help

Here’s how I use the telemetry system in actual app code:

Screen Component Tracking

// In a real screen component
// `api`, `LoadingSpinner`, and `PaymentForm` are assumed to come from the app's own modules
import React, { useEffect, useState } from 'react';
import { performanceMonitor } from '../telemetry/performance-monitor';
import { errorTracker } from '../telemetry/error-tracking';

export function PaymentScreen({ route, navigation }: any) {
  const [loading, setLoading] = useState(true);
  const [paymentData, setPaymentData] = useState<any>(null);
  // Kept in state so handlePaymentSubmit can complete the journey started on load
  const [journeyId, setJourneyId] = useState<string | undefined>();

  useEffect(() => {
    loadPaymentScreen();
  }, []);

  const loadPaymentScreen = async () => {
    try {
      // Start user journey tracking
      const id = performanceMonitor.startUserJourney('payment_flow', route.params?.userId);
      setJourneyId(id);

      // Measure screen load with business context
      const data = await performanceMonitor.measureScreenLoad(
        'payment_screen',
        async () => {
          // Load payment methods
          const methods = await performanceMonitor.instrumentApiCall(
            '/api/payment-methods',
            'GET',
            () => api.getPaymentMethods(),
            {
              userId: route.params?.userId,
              feature: 'payment_methods',
              monetaryValue: route.params?.totalAmount,
            }
          );

          // Load user preferences
          const preferences = await api.getUserPreferences();

          return { methods, preferences };
        },
        true // This is business critical
      );

      setPaymentData(data);
      setLoading(false);

    } catch (error) {
      errorTracker.trackBusinessError('payment_failure', error as Error, {
        userId: route.params?.userId,
        monetaryValue: route.params?.totalAmount,
        customerTier: route.params?.customerTier,
        feature: 'payment_screen_load',
      });

      setLoading(false);
    }
  };

  const handlePaymentSubmit = async (paymentDetails: any) => {
    try {
      const result = await performanceMonitor.instrumentApiCall(
        '/api/process-payment',
        'POST',
        () => api.processPayment(paymentDetails),
        {
          userId: route.params?.userId,
          feature: 'payment_processing',
          monetaryValue: route.params?.totalAmount,
        }
      );

      // Complete journey successfully
      performanceMonitor.completeUserJourney(journeyId, true, {
        paymentMethod: paymentDetails.method,
        amount: route.params?.totalAmount,
      });

      // Navigate to success
      navigation.navigate('PaymentSuccess', { transactionId: result.id });

    } catch (error) {
      // Complete journey with failure
      performanceMonitor.completeUserJourney(journeyId, false, {
        failedStep: 'payment_processing',
        error: (error as Error).message,
      });

      errorTracker.trackBusinessError('payment_failure', error as Error, {
        userId: route.params?.userId,
        monetaryValue: route.params?.totalAmount,
        customerTier: route.params?.customerTier,
        feature: 'payment_processing',
      });
    }
  };

  if (loading) {
    return <LoadingSpinner />;
  }

  return (
    <PaymentForm
      data={paymentData}
      onSubmit={handlePaymentSubmit}
    />
  );
}

The Monitoring Setup That Prevented Outages

After implementing this system, here’s what we monitor in production:

Datadog Dashboard Configuration

// The dashboard that saved us from multiple incidents
export const productionDashboards = {
  "mobile_app_health": {
    "title": "Mobile App Health - Production",
    "widgets": [
      {
        "title": "Critical Business Errors",
        "type": "timeseries",
        "queries": [
          {
            "query": "sum:custom.critical_errors{*} by {error_type}",
            "display_type": "bars"
          }
        ],
        "alert_threshold": 5 // Alert if more than 5 critical errors in 5 min
      },
      {
        "title": "Payment API Response Times",
        "type": "timeseries",
        "queries": [
          {
            "query": "avg:custom.api_call_duration{endpoint:payment*} by {endpoint}",
            "display_type": "line"
          }
        ],
        "alert_threshold": 5000 // Alert if payment APIs exceed 5s
      },
      {
        "title": "Screen Load Performance",
        "type": "heatmap",
        "queries": [
          {
            "query": "custom.screen_load_duration{business_critical:true}"
          }
        ]
      },
      {
        "title": "User Journey Completion Rate",
        "type": "query_value",
        "queries": [
          {
            "query": "sum:custom.user_journey_completion{success:true} / sum:custom.user_journey_completion{*} * 100"
          }
        ]
      },
      {
        "title": "App Crashes by Device",
        "type": "toplist",
        "queries": [
          {
            "query": "sum:custom.critical_errors{type:crash} by {device_model}"
          }
        ]
      }
    ]
  }
};

Alerts That Actually Work

// Alerts that wake me up for real issues, not noise
export const productionAlerts = {
  "payment_failure_spike": {
    "name": "Payment API Failure Spike",
    "query": "sum(last_5m):sum:custom.critical_errors{type:payment_api_failure} > 3",
    "message": "@slack-payments @pagerduty-critical",
    "priority": "P1",
    "escalation": "immediate"
  },

  "user_journey_drop": {
    "name": "User Journey Completion Drop",
    "query": "avg(last_15m):sum:custom.user_journey_completion{success:true} / sum:custom.user_journey_completion{*} < 0.8",
    "message": "@slack-product @email-team",
    "priority": "P2",
    "escalation": "15_minutes"
  },

  "critical_screen_slow": {
    "name": "Critical Screen Load Time",
    "query": "avg(last_10m):avg:custom.screen_load_duration{business_critical:true} > 5000",
    "message": "@slack-engineering",
    "priority": "P2",
    "escalation": "30_minutes"
  }
};
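
To make the user_journey_drop condition concrete, here is a minimal local sketch of the ratio that query evaluates: successful completions divided by all completions inside a time window, compared against 0.8. Datadog computes this server-side; the `JourneyEvent` shape and both function names are hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of one journey-completion data point; not a Datadog type.
interface JourneyEvent {
  timestampMs: number;
  success: boolean;
}

// Returns the fraction of successful journeys inside the window,
// or null when the window holds no events (no data, so nothing to alert on).
function completionRate(
  events: JourneyEvent[],
  nowMs: number,
  windowMs: number
): number | null {
  const inWindow = events.filter(
    e => e.timestampMs > nowMs - windowMs && e.timestampMs <= nowMs
  );
  if (inWindow.length === 0) return null;
  const succeeded = inWindow.filter(e => e.success).length;
  return succeeded / inWindow.length;
}

// Mirrors the `< 0.8` threshold from the user_journey_drop alert.
function shouldAlert(rate: number | null, threshold = 0.8): boolean {
  return rate !== null && rate < threshold;
}
```

Treating an empty window as "no alert" is a design choice in this sketch; a separate no-data alert is usually the better place to catch a silent pipeline.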

Performance Impact and Optimization

Production observability systems typically show the following characteristics; treat the figures as approximate, since they vary with sampling rate and span volume:

Resource Usage

  • CPU overhead: 2-3% average in production environments
  • Memory overhead: 15-20MB (mostly trace buffering)
  • Battery impact: Negligible (less than 1% daily drain)
  • Network usage: 50-100KB per day per user

Cost Analysis (Monthly)

  • Datadog: $300-500/month (based on volume)
  • Firebase: $0-50/month (depending on usage)
  • AWS infrastructure: $30-80/month (OTEL collector)
  • Development efficiency: Significant reduction in debugging time
  • ROI: Typically positive within first month of implementation

Optimization Strategies That Worked

// Smart sampling that reduced costs by 60%
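class AdaptiveSampler {
  // Observed error rate per span name, maintained by recordOutcome()
  private errorRate = new Map<string, number>();

  // Feed span outcomes back so the adaptive branch in shouldSample has data.
  // Exponential moving average: each observation carries a weight of 0.1.
  recordOutcome(spanName: string, isError: boolean): void {
    const previous = this.errorRate.get(spanName) ?? 0;
    this.errorRate.set(spanName, previous * 0.9 + (isError ? 0.1 : 0));
  }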

  shouldSample(spanName: string, attributes: any): boolean {
    // Always sample errors and critical business flows
    if (spanName.includes('error') || attributes['business.monetary_value']) {
      return true;
    }

    // Sample premium-tier users at a higher rate
    if (attributes['user.tier'] === 'premium') {
      return Math.random() < 0.5; // 50% sampling
    }

    // Adaptive sampling based on error rates
    const errorRate = this.errorRate.get(spanName) || 0;
    if (errorRate > 0.05) { // More than 5 percent errors
      return Math.random() < 0.8; // Increase sampling
    }

    // Default sampling
    return Math.random() < 0.1; // 10% base rate
  }
}

Results: Production Observability Benefits

Issues Detected Early

  1. Platform-Specific Network Issues: API timeouts detected for specific OS/network combinations
  2. Memory Leaks: RAM usage increases detected through memory monitoring
  3. Race Conditions: Payment flow issues identified through journey tracking
  4. Battery Drain: Background processes causing excessive battery drain

Business Impact

  • Faster Issue Resolution: Reduced average debugging time through better visibility
  • Proactive Monitoring: Issues detected before widespread user impact
  • Improved User Experience: Performance optimizations based on real usage data
  • Revenue Protection: Early detection of payment and business-critical failures

Development Benefits

  • Enhanced Debugging: Rich context in error reports with user journey data
  • Deployment Confidence: Comprehensive monitoring for regression detection
  • Data-Driven Optimization: Performance improvements based on production metrics

Implementation Lessons

1. Start Simple, Evolve Gradually

Implement observability incrementally. Begin with:

  1. Critical business flows (payments, login, core features)
  2. Error tracking with context
  3. Performance monitoring for key screens
  4. Basic user journey tracking

2. Context Is Everything

Raw metrics without context are hard to act on. Always include:

  • User context (ID, session, journey)
  • Business context (feature, monetary value, customer tier)
  • Technical context (device, network, app version)
  • Error context (what the user was doing)
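
One way to keep these four kinds of context consistent is to collect them in a single typed object and flatten it into span attributes in one place. The sketch below is only an illustration: the `TelemetryContext` shape and the `toSpanAttributes` helper are hypothetical, and the dotted, snake_case key format is an assumption modeled on keys such as `business.monetary_value` used earlier in this guide.

```typescript
// Hypothetical context shape; field names are illustrative, not a library type.
interface TelemetryContext {
  user?: { id?: string; sessionId?: string; journey?: string };
  business?: { feature?: string; monetaryValue?: number; customerTier?: string };
  technical?: { deviceModel?: string; networkType?: string; appVersion?: string };
  error?: { lastAction?: string };
}

type AttributeValue = string | number | boolean;

// Flattens the nested context into dotted keys with snake_case leaves,
// e.g. business.monetaryValue -> "business.monetary_value".
// Undefined fields are skipped so they never reach an exporter.
function toSpanAttributes(context: TelemetryContext): Record<string, AttributeValue> {
  const attributes: Record<string, AttributeValue> = {};
  for (const [group, fields] of Object.entries(context)) {
    if (!fields) continue;
    for (const [field, value] of Object.entries(fields)) {
      if (value === undefined || value === null) continue;
      const leaf = field.replace(/[A-Z]/g, c => `_${c.toLowerCase()}`);
      attributes[`${group}.${leaf}`] = value as AttributeValue;
    }
  }
  return attributes;
}
```

Centralizing the mapping means a renamed field changes attribute keys in one place instead of in every call site.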

3. Sampling Strategy Matters

  • Critical flows: 100% sampling
  • Business features: 50% sampling
  • UI interactions: 10% sampling
  • Background tasks: 1% sampling
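
The tiers above can be expressed as a small lookup table. The sketch below swaps in a different technique from the Math.random() calls in AdaptiveSampler: it hashes the trace ID (FNV-1a) so that every span in a trace gets the same decision. The `SpanCategory` type and both function names are hypothetical.

```typescript
type SpanCategory = 'critical' | 'business' | 'ui' | 'background';

// Rates copied from the list above.
const SAMPLE_RATES: Record<SpanCategory, number> = {
  critical: 1.0,
  business: 0.5,
  ui: 0.1,
  background: 0.01,
};

// Maps a trace ID to a stable number in [0, 1) using a 32-bit FNV-1a hash,
// so the same trace ID always produces the same value.
function traceIdToUnitInterval(traceId: string): number {
  let hash = 0x811c9dc5; // FNV-1a 32-bit offset basis
  for (let i = 0; i < traceId.length; i++) {
    hash ^= traceId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // FNV prime
  }
  return (hash >>> 0) / 0x100000000;
}

// A trace is kept when its hashed value falls below the category's rate.
function shouldSampleByCategory(category: SpanCategory, traceId: string): boolean {
  return traceIdToUnitInterval(traceId) < SAMPLE_RATES[category];
}
```

Because the decision depends only on the trace ID and the rate, a trace kept at a low rate is also kept at every higher rate, which avoids partially sampled traces.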

4. Alert Strategy

Focus alerts on actionable issues:

  • Payment processing failures
  • Crash rate spikes
  • Critical business flow completion drops
  • Security-related events

5. Multiple Exporters = Reliability

Don’t rely on a single monitoring provider:

  • Primary: Datadog (rich analytics)
  • Secondary: Elastic APM (cost control)
  • Backup: Firebase (always works)
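
A minimal sketch of the fan-out idea follows: each batch goes to every exporter, and the batch counts as exported if at least one of them accepts it. The `SimpleExporter` interface and `ExportOutcome` type are local stand-ins that only mimic the callback shape of an OpenTelemetry exporter; they are not the `@opentelemetry/sdk-trace-base` types, and `FanOutExporter` is a hypothetical name.

```typescript
// Local stand-ins for the exporter contract (assumption: callback-style export).
type ExportOutcome = 'success' | 'failed';

interface SimpleExporter<T> {
  name: string;
  export(items: T[], done: (outcome: ExportOutcome) => void): void;
}

// Forwards each batch to every exporter and reports success if at least
// one of them accepted it, so one provider's outage does not fail the batch.
class FanOutExporter<T> implements SimpleExporter<T> {
  name = 'fan-out';
  private readonly exporters: SimpleExporter<T>[];

  constructor(exporters: SimpleExporter<T>[]) {
    this.exporters = exporters;
  }

  export(items: T[], done: (outcome: ExportOutcome) => void): void {
    if (this.exporters.length === 0) {
      done('failed');
      return;
    }
    let pending = this.exporters.length;
    let anySucceeded = false;

    const settle = (outcome: ExportOutcome): void => {
      if (outcome === 'success') anySucceeded = true;
      pending -= 1;
      if (pending === 0) done(anySucceeded ? 'success' : 'failed');
    };

    for (const exporter of this.exporters) {
      // Guard so an exporter that calls back and then throws is counted once.
      let settled = false;
      const once = (outcome: ExportOutcome): void => {
        if (settled) return;
        settled = true;
        settle(outcome);
      };
      try {
        exporter.export(items, once);
      } catch (_error) {
        // A throwing exporter counts as a failure for this batch only.
        once('failed');
      }
    }
  }
}
```

With the real SDK, the more common setup is to register each exporter with its own span processor; the wrapper above only illustrates the "any one success is enough" policy.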

Implementation Roadmap: 7-Day Plan

Day 1-2: Foundation

  • Set up OpenTelemetry provider
  • Add basic error tracking
  • Implement global error handlers

Day 3-4: Performance Monitoring

  • Add screen load tracking
  • Implement API call instrumentation
  • Set up navigation tracking

Day 5-6: Business Metrics

  • Track user journeys
  • Add custom business events
  • Set up critical flow monitoring

Day 7: Production Deployment

  • Configure sampling rates
  • Set up alerts
  • Create monitoring dashboards

Conclusion: Observability as a Competitive Advantage

Comprehensive observability transforms development practices from reactive debugging to proactive optimization. The ability to quickly identify issues, prevent outages, and optimize based on real data significantly improves both development velocity and user experience.

The initial investment in observability infrastructure pays dividends through:

  • Reduced debugging time and faster issue resolution
  • Proactive issue detection before user impact
  • Data-driven performance optimizations
  • Increased confidence in production deployments

Implementing proper observability is essential for any production React Native application. Start with basic monitoring and evolve the system based on your specific needs and learnings.
