Scaling Streaming Architecture to 100M+ Concurrent Viewers

The engineering principles behind ultra-scale live streaming infrastructure

Marcus Rivera

VP of Engineering, WAVE

18 min read | January 10, 2025

Streaming to 100 million concurrent viewers is not just about having more servers. It requires a fundamental rethinking of architecture, from edge-first design to predictive scaling, from multi-CDN orchestration to graceful degradation patterns.

In this deep dive, we will explore the architectural patterns, engineering challenges, and practical solutions that enable streaming at truly massive scale. Whether you are preparing for your first viral moment or engineering for consistent high-scale events, these principles will guide your infrastructure decisions.

What Does 100M Scale Look Like?

Before diving into architecture, let us establish the scale we are discussing. These are real numbers from production streaming events.

  • 127M peak concurrent viewers (World Cup Final 2024)
  • 180 Tbps peak bandwidth, sustained during major events
  • 300+ edge PoPs deployed globally
  • 23ms median latency, edge to viewer globally
  • <100ms origin-to-edge latency at the 99th percentile
  • <500ms failover time with automatic CDN switching

The Five Core Challenges of Scale

Every streaming platform faces these fundamental challenges as they scale. Understanding them is the first step to solving them.

Challenge 1

Thundering Herd

Millions of viewers connecting simultaneously when a live event starts

Solution

Progressive rollout with queue-based admission, edge connection pooling, and predictive pre-warming

Impact

Prevents origin overload during flash traffic spikes
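
To make queue-based admission concrete, here is a minimal Python sketch of the idea: a token bucket caps the rate of new connections and the overflow waits in a FIFO queue instead of hitting the origin. The class name, rates, and the polling model are illustrative assumptions, not production code.

```python
import time
from collections import deque

class AdmissionController:
    """Hypothetical sketch: admit at most `rate_per_sec` new viewers per second;
    excess connection attempts wait in a FIFO queue instead of retrying the origin."""

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last_refill = time.monotonic()
        self.waiting = deque()          # viewer IDs waiting for admission

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now

    def request_admission(self, viewer_id: str) -> bool:
        """Return True if the viewer may connect now, otherwise queue them."""
        self._refill()
        if self.tokens >= 1 and not self.waiting:   # keep FIFO fairness
            self.tokens -= 1
            return True
        self.waiting.append(viewer_id)
        return False

    def drain_queue(self) -> list[str]:
        """Called periodically: admit as many queued viewers as tokens allow."""
        self._refill()
        admitted = []
        while self.waiting and self.tokens >= 1:
            admitted.append(self.waiting.popleft())
            self.tokens -= 1
        return admitted

ctrl = AdmissionController(rate_per_sec=50_000, burst=10_000)
print(ctrl.request_admission("viewer-1"))   # True while tokens remain
```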

Challenge 2

Geographic Distribution

Viewers spread across 190+ countries with varying network conditions

Solution

Multi-tier edge network with 300+ PoPs, regional failover, and intelligent routing

Impact

Sub-50ms latency to 95% of the global population
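
As a rough illustration of intelligent routing, the sketch below steers a viewer to the healthy PoP with the lowest measured RTT. The PoP names, the health source, and the probing mechanism are all assumptions.

```python
def pick_pop(client_rtts_ms: dict[str, float], pop_health: dict[str, bool]) -> str:
    """Pick the healthy PoP with the lowest measured RTT; fall back to the
    globally lowest RTT if no PoP is currently marked healthy."""
    healthy = {pop: rtt for pop, rtt in client_rtts_ms.items() if pop_health.get(pop, False)}
    candidates = healthy or client_rtts_ms
    return min(candidates, key=candidates.get)

# Example: RTT probes collected by the client or a DNS-based steering layer
rtts = {"fra1": 18.0, "ams2": 22.5, "lon3": 35.0}
health = {"fra1": False, "ams2": True, "lon3": True}
print(pick_pop(rtts, health))   # -> "ams2"
```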

Challenge 3

Protocol Diversity

Supporting WebRTC, HLS, DASH, SRT, and legacy RTMP simultaneously

Solution

Protocol-agnostic core with edge transcoding and format negotiation

Impact

Single origin stream serves all client types efficiently
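
Format negotiation can be as simple as ranking the protocols the edge can produce from the single origin stream against what the client advertises. A hedged sketch, where the preference order and protocol labels are assumptions:

```python
# Preference order the edge can serve from one origin stream (assumed values).
EDGE_SUPPORTED = ["webrtc", "ll-hls", "hls", "dash", "rtmp"]

def negotiate_protocol(client_supported: set[str]) -> str:
    """Return the highest-preference protocol both sides support."""
    for proto in EDGE_SUPPORTED:
        if proto in client_supported:
            return proto
    raise ValueError("no common protocol between client and edge")

print(negotiate_protocol({"hls", "dash"}))     # -> "hls"
print(negotiate_protocol({"webrtc", "hls"}))   # -> "webrtc"
```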

Challenge 4

State Management

Real-time chat, reactions, and viewer presence at massive scale

Solution

Distributed CRDT-based state with eventual consistency and local-first architecture

Impact

Millions of concurrent interactions without bottlenecks
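
A grow-only counter is the simplest CRDT and is enough to show why this approach works for something like reaction counts: replicas increment independently and converge on merge, with no locks and no central bottleneck. This is an illustrative sketch, not the platform's actual state layer.

```python
class GCounter:
    """Grow-only counter CRDT: each node increments its own slot; merging two
    replicas takes the per-node maximum, so replicas converge deterministically."""

    def __init__(self, node_id: str):
        self.node_id = node_id
        self.counts: dict[str, int] = {}

    def increment(self, n: int = 1):
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + n

    def merge(self, other: "GCounter"):
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())

# Two edge PoPs count "heart" reactions independently, then sync.
us = GCounter("pop-us-east")
eu = GCounter("pop-eu-west")
us.increment(3)
eu.increment(5)
us.merge(eu)
eu.merge(us)
assert us.value() == eu.value() == 8
```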

Challenge 5

Failure Cascades

Single component failure affecting millions of viewers

Solution

Circuit breakers, bulkheads, and graceful degradation at every layer

Impact

99.99% uptime even during partial outages
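
For reference, a circuit breaker in its most stripped-down form looks like the sketch below; the failure threshold and reset interval are placeholder values.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: trips OPEN after `max_failures` consecutive
    errors, then allows a single trial call after `reset_after` seconds."""

    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True                                        # CLOSED
        if time.monotonic() - self.opened_at >= self.reset_after:
            return True                                        # HALF-OPEN: one trial
        return False                                           # OPEN

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()
```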

The Four-Tier Architecture

A scalable streaming platform is built in layers, each with distinct responsibilities and scaling characteristics.

Origin Layer

Ingest, transcoding, and primary storage

Components

  • Multi-region origin clusters (3+ regions)
  • GPU-accelerated transcoding farms
  • Object storage for VOD and DVR
  • Real-time ABR ladder generation
  • Watermarking and DRM injection

Scaling Strategy

Horizontal pod autoscaling based on ingest load

Capacity

10,000 concurrent live streams per region
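
The scaling rule itself is simple arithmetic. The sketch below mirrors the shape of the Kubernetes HPA formula, desired = ceil(current * currentMetric / targetMetric), using ingest streams per pod as the metric; the target and bounds are assumed values.

```python
import math

def desired_replicas(current_replicas: int, current_streams_per_pod: float,
                     target_streams_per_pod: float,
                     min_replicas: int = 3, max_replicas: int = 500) -> int:
    """HPA-style calculation: scale replicas in proportion to observed ingest
    load per pod, clamped to configured bounds."""
    desired = math.ceil(current_replicas * current_streams_per_pod / target_streams_per_pod)
    return max(min_replicas, min(max_replicas, desired))

# 40 transcoder pods each handling 75 live streams, with a target of 50 per pod:
print(desired_replicas(40, 75, 50))   # -> 60
```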

Distribution Layer

CDN orchestration and protocol conversion

Components

  • Multi-CDN load balancing (4+ providers)
  • Edge transcoding for protocol conversion
  • Manifest manipulation for personalization
  • Token authentication at edge
  • Real-time traffic shaping

Scaling Strategy

Traffic-based CDN weight shifting

Capacity

50M+ concurrent streams globally

Edge Layer

Last-mile delivery and client interaction

Components

  • 300+ global edge PoPs
  • WebRTC SFUs for ultra-low latency
  • HLS/DASH segment caching
  • Connection pooling and multiplexing
  • Client telemetry collection

Scaling Strategy

Geographic auto-scaling with predictive provisioning

Capacity

100M+ concurrent connections

Real-time Layer

Chat, reactions, and presence

Components

  • Distributed WebSocket clusters
  • CRDT-based state synchronization
  • Pub/sub message routing
  • Presence aggregation services
  • Moderation pipeline integration

Scaling Strategy

Sharded by stream ID with dynamic rebalancing

Capacity

10M+ messages per second
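
Sharding by stream ID usually means consistent hashing, so a shard can be added or drained without remapping every stream. A minimal sketch, where the shard names and virtual-node count are assumptions:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing over chat/presence shards: each stream ID maps to a
    shard, and adding or removing a shard only remaps a small slice of streams."""

    def __init__(self, shards: list[str], vnodes: int = 100):
        self.ring: list[tuple[int, str]] = []
        for shard in shards:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{shard}#{i}"), shard))
        self.ring.sort()
        self.keys = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, stream_id: str) -> str:
        idx = bisect.bisect(self.keys, self._hash(stream_id)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["chat-shard-1", "chat-shard-2", "chat-shard-3"])
print(ring.shard_for("stream:world-cup-final"))
```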

Essential Scaling Patterns

These architectural patterns are battle-tested across thousands of high-scale streaming events.

Pattern 1

Edge-First Architecture

Push computation and caching to the edge, minimize origin traffic

Implementation

  • Cache manifest and segments at edge with TTLs
  • Perform token validation at edge (no origin roundtrip)
  • Edge-side includes for personalization
  • Protocol transcoding at edge PoPs

Key Benefit

Reduces origin load by 99% and cuts latency by 10-50x
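
Token validation with no origin round trip typically means signed URLs the PoP can verify locally. Below is a sketch using an HMAC over the path and expiry; the secret distribution and parameter names are assumptions.

```python
import hashlib
import hmac
import time

# Hypothetical shared secret distributed to edge PoPs out of band.
EDGE_SECRET = b"rotate-me-regularly"

def sign_url(path: str, expires_at: int) -> str:
    """Issue a time-limited, tamper-evident playback URL."""
    msg = f"{path}:{expires_at}".encode()
    sig = hmac.new(EDGE_SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?exp={expires_at}&sig={sig}"

def validate_at_edge(path: str, exp: int, sig: str) -> bool:
    """Runs entirely at the PoP: no origin round trip needed."""
    if time.time() > exp:
        return False
    expected = hmac.new(EDGE_SECRET, f"{path}:{exp}".encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)

url = sign_url("/live/stream42/master.m3u8", int(time.time()) + 300)
print(url)
```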

Pattern 2

Multi-CDN with Active Failover

Distribute traffic across multiple CDN providers with real-time health monitoring

Implementation

  • Weighted round-robin across 4+ CDN providers
  • Real-time error rate monitoring per CDN
  • Automatic traffic shifting on degradation
  • Geographic preference optimization

Key Benefit

Zero-downtime during CDN outages, cost optimization
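
A hedged sketch of how weight shifting can work: each CDN keeps a weight that collapses quickly when its error rate crosses a threshold and recovers slowly as it heals. Provider names, thresholds, and decay factors are illustrative.

```python
import random

class MultiCdnBalancer:
    """Weighted CDN selection; weights shrink fast on elevated error rates and
    recover slowly, so traffic shifts away from a degrading provider."""

    def __init__(self, cdns: list[str]):
        self.weights = {cdn: 1.0 for cdn in cdns}

    def report_health(self, cdn: str, error_rate: float, threshold: float = 0.02):
        if error_rate > threshold:
            # Shed traffic fast, but never fully zero a provider.
            self.weights[cdn] = max(0.05, self.weights[cdn] * 0.5)
        else:
            self.weights[cdn] = min(1.0, self.weights[cdn] + 0.1)   # recover slowly

    def pick(self) -> str:
        cdns = list(self.weights)
        return random.choices(cdns, weights=[self.weights[c] for c in cdns], k=1)[0]

balancer = MultiCdnBalancer(["cdn-a", "cdn-b", "cdn-c", "cdn-d"])
balancer.report_health("cdn-b", error_rate=0.15)   # cdn-b is degrading
print(balancer.pick())
```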

Pattern 3

Predictive Pre-warming

Anticipate traffic spikes and pre-scale infrastructure

Implementation

  • ML-based traffic prediction from historical patterns
  • Event calendar integration for scheduled streams
  • Social media sentiment analysis for viral detection
  • Proactive edge node provisioning

Key Benefit

Handles 10x traffic spikes without degradation
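
At its simplest, pre-warming is a forecast plus headroom plus lead time. The numbers below (viewers per edge node, 1.5x headroom, a 2-hour lead) are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def prewarm_schedule(event_start: datetime, forecast_peak_viewers: int,
                     viewers_per_edge_node: int = 50_000,
                     headroom: float = 1.5, lead: timedelta = timedelta(hours=2)):
    """Return (ramp_up_start, target_edge_nodes): provision for the forecast
    peak plus headroom, and begin ramping well before the event."""
    target_nodes = -(-int(forecast_peak_viewers * headroom) // viewers_per_edge_node)  # ceil div
    return event_start - lead, target_nodes

start_at, nodes = prewarm_schedule(datetime(2025, 7, 13, 19, 0), 120_000_000)
print(start_at, nodes)   # ramp-up begins at 17:00 for roughly 3,600 edge nodes
```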

Pattern 4

Graceful Degradation

Maintain core functionality when subsystems fail

Implementation

  • Circuit breakers on all external dependencies
  • Fallback to cached content when origin is unavailable
  • Chat degradation to polling when WebSocket fails
  • Quality downgrade rather than stream interruption

Key Benefit

Users experience reduced quality, never complete failure
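
The fallback chain for a single segment request might look like the sketch below: origin first, then the edge cache, then lower renditions from the ABR ladder before ever failing the stream. The `origin` and `edge_cache` interfaces are hypothetical.

```python
def fetch_segment(segment_url: str, origin, edge_cache, ladder: list[str]) -> bytes:
    """Serve the freshest copy possible; degrade quality before failing outright."""
    try:
        return origin.get(segment_url, timeout=0.5)      # happy path
    except Exception:
        pass
    cached = edge_cache.get(segment_url)
    if cached is not None:
        return cached                                    # possibly stale, still watchable
    # e.g. segment_url = "/live/s42/1080p/seg_100.ts", ladder = ["720p", "480p", "240p"]
    for rendition in ladder:
        fallback = edge_cache.get(segment_url.replace("1080p", rendition))
        if fallback is not None:
            return fallback                              # quality downgrade, not a black screen
    raise RuntimeError("all fallbacks exhausted")
```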

Scaling Milestones: A Roadmap

Architecture evolves with scale. Here is what changes at each order of magnitude.

Viewers | Architecture | Key Challenges | WAVE Tier
10K | Single Origin + CDN | Basic caching, simple failover | Starter tier supports this out of the box
100K | Multi-region Origin + Multi-CDN | Geographic distribution, protocol diversity | Professional tier with 2 regions, 2 CDNs
1M | Distributed Origin + Edge Compute | Real-time features at scale, state management | Business tier with edge transcoding
10M | Full Edge-First + Predictive Scaling | Thundering herd, cascading failures | Enterprise tier with dedicated capacity
100M+ | Global Mesh + Custom Infrastructure | ISP-level partnerships, custom edge nodes | Elite tier with guaranteed SLAs

Common Pitfalls to Avoid

Learn from the mistakes of others. These are the most common scaling pitfalls we have seen in production.

Ignoring cache invalidation complexity

Consequence

Stale content served, manifest sync issues

Solution

Use versioned URLs, implement proper TTLs, plan for emergency purges
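
Versioned URLs sidestep most invalidation problems because a new publish produces a new URL rather than a purge. A small illustrative sketch:

```python
import hashlib

def versioned_url(base: str, content: bytes) -> str:
    """Embed a content hash in the URL so edge caches never need explicit
    purges: new content gets a new URL, old URLs keep serving old content."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    return f"{base}?v={digest}"

print(versioned_url("/live/stream42/master.m3u8", b"#EXTM3U\n..."))
```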

Single-CDN dependency

Consequence

Complete outage when CDN has issues

Solution

Multi-CDN from day one, even at small scale

Underestimating state management

Consequence

Chat and reactions become bottleneck

Solution

Design stateless where possible, use distributed state stores

No graceful degradation plan

Consequence

Complete failure instead of reduced experience

Solution

Design fallbacks for every critical path

Testing only happy paths

Consequence

Unknown behavior under failure conditions

Solution

Chaos engineering, game days, failure injection

Essential Monitoring Metrics

You cannot scale what you cannot measure. These are the metrics that matter at scale.

Delivery

  • Rebuffer rate
  • Startup time
  • Error rate
  • Bitrate distribution

Infrastructure

  • Edge cache hit ratio
  • Origin load
  • CDN health
  • Network saturation

Experience

  • Video start time
  • Time to first byte
  • Quality changes
  • Session duration

Business

  • Concurrent viewers
  • Peak capacity
  • Cost per viewer
  • Revenue per stream
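
As one concrete example, rebuffer ratio is commonly computed as stall time over watch time, aggregated from client telemetry beacons; the aggregation window and beacon fields below are assumptions.

```python
def rebuffer_ratio(stall_ms: int, watch_ms: int) -> float:
    """Rebuffer ratio = time spent stalled / total watch time."""
    return stall_ms / watch_ms if watch_ms else 0.0

# Aggregate client telemetry beacons for one stream over a 1-minute window.
sessions = [
    {"stall_ms": 0,    "watch_ms": 60_000},
    {"stall_ms": 1200, "watch_ms": 58_000},
    {"stall_ms": 300,  "watch_ms": 60_000},
]
total_stall = sum(s["stall_ms"] for s in sessions)
total_watch = sum(s["watch_ms"] for s in sessions)
print(f"rebuffer ratio: {rebuffer_ratio(total_stall, total_watch):.2%}")
```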

Case Study: WAVE Platform's 100M-Viewer Architecture

The Challenge

Support a major global sports event with 100M+ peak concurrent viewers, sub-500ms latency for betting integration, and zero tolerance for downtime.

Architecture Decisions

  • Deployed to 12 origin regions with active-active failover
  • Engaged 6 CDN providers with real-time traffic steering
  • Pre-positioned edge compute in 50 major metros
  • Implemented WebRTC for betting-critical low-latency feed
  • Built custom admission control to handle connection surge

Results

  • 127M peak viewers
  • 340ms average latency
  • 99.998% uptime
  • Zero major incidents

Key Learnings

  • Pre-warming is essential: start 2 hours before the event
  • Multi-CDN saved us during a provider outage mid-event
  • Real-time dashboards enabled rapid incident response
  • Rehearsals with simulated load revealed critical bugs

Key Takeaways

Think in Layers

Each tier has different scaling characteristics. Design accordingly.

Edge-First Always

Push computation to the edge. Your origin should be a last resort.

Plan for Failure

At scale, something will always fail. Design for graceful degradation.

Measure Everything

You cannot scale what you cannot observe. Invest in monitoring early.

Multi-Everything

Multi-region, multi-CDN, multi-protocol. Redundancy is not optional.

Pre-warm Aggressively

Predictive scaling beats reactive scaling every time.

Marcus Rivera

VP of Engineering, WAVE

Marcus has led streaming infrastructure teams at three unicorn startups and has architected systems serving over 500 million users. He focuses on building resilient, scalable video delivery systems.
