Designing for 100 Million Users: A 2025 System Design Masterclass
Scaling a system from 10,000 users to 100 million is not just about adding more servers. It requires a fundamental shift in architecture, moving from monolithic simplicity to distributed complexity. At this scale, you are fighting against the laws of physics (latency) and the inevitability of hardware failure.
This guide serves as a blueprint for architecting high-scale, resilient systems in 2025.
1. The 10,000 ft View: Architecture Evolution
Stage 1: The Monolith (0 - 10k Users)
At this stage, simplicity is key.
- App: Single monolith (Next.js/Django/Rails).
- DB: Single Relational Database (Postgres).
- Deployment: Single VPS (EC2/DigitalOcean).
- Focus: Feature velocity.
Stage 2: Separation of Concerns (10k - 1M Users)
The database starts choking.
- Database: Add Read Replicas. The primary handles writes; replicas serve reads (see the sketch after this list).
- Cache: Introduce Redis for session storage and frequently accessed data.
- CDN: Offload static assets (images, CSS, JS) to Cloudflare or AWS CloudFront.
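A minimal sketch of the read/write split at the application layer, assuming the psycopg2 driver and hypothetical hostnames:

```python
import random

import psycopg2  # assumes the psycopg2 Postgres driver

# Hypothetical DSNs; in production these come from config or service discovery.
PRIMARY_DSN = "host=db-primary dbname=app"
REPLICA_DSNS = ["host=db-replica-1 dbname=app", "host=db-replica-2 dbname=app"]

def get_connection(readonly: bool):
    """Route writes to the primary and reads to a random replica."""
    dsn = random.choice(REPLICA_DSNS) if readonly else PRIMARY_DSN
    return psycopg2.connect(dsn)

# Caveat: replicas lag behind the primary, so flows that must read their own
# writes (e.g. "create then immediately display") should read from the primary.
```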
Stage 3: Service-Oriented Architecture (1M - 10M Users)
The teams grow too large for a single codebase.
- Microservices: Split the monolith into core domains (Auth, Payments, Notifications).
- Message Queues: Decouple services using Kafka or RabbitMQ.
- Load Balancers: NGINX/HAProxy to distribute traffic across stateless app servers.
Stage 4: Planetary Scale (100M+ Users)
- Sharding: Horizontal partitioning of the database.
- Geo-Distribution: Active-Active multi-region deployment.
- Observability: Distributed tracing and automated healing.
2. Global Traffic Management & Load Balancing
When you have users in Mumbai, New York, and London, a single data center in "us-east-1" won't cut it. You need Geo-DNS routing.
Algorithms
- Round Robin: Simple, but doesn't account for server load.
- Least Connections: Routes to the server with the fewest active connections (see the sketch below).
- Geo-Proximity: Routes user to the physically closest data center (latency-based).
Pro Tip: Use Layer 7 Load Balancing (Application Layer) to route traffic based on content type. e.g., `/api/video` goes to high-bandwidth servers, `/api/chat` goes to real-time optimized clusters.
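Here is a toy dispatcher combining both ideas: Layer 7 routing picks a pool by path prefix, then Least Connections picks a server inside it. The pool names and servers are illustrative:

```python
from collections import defaultdict

# Illustrative pools; a real deployment would load these from config.
POOLS = {
    "/api/video": ["video-1", "video-2"],  # high-bandwidth servers
    "/api/chat": ["chat-1", "chat-2"],     # real-time optimized cluster
}
DEFAULT_POOL = ["web-1", "web-2"]
active_conns = defaultdict(int)  # server -> current open connections

def pick_backend(path: str) -> str:
    """Layer 7: choose a pool by path prefix, then route by least connections."""
    servers = next(
        (pool for prefix, pool in POOLS.items() if path.startswith(prefix)),
        DEFAULT_POOL,
    )
    return min(servers, key=lambda s: active_conns[s])

print(pick_backend("/api/video/upload"))  # -> 'video-1' when both are idle
```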
3. Database Scaling: The Hardest Part
Stateless application servers are easy to scale (just add more). Stateful databases are the bottleneck.
Sharding (Horizontal Partitioning)
Splitting your data across multiple physical nodes.
- Shard Key: The most critical decision. If you shard by `user_id`, all data for a user lives on one node.
- Consistent Hashing: A technique to distribute data across shards such that adding or removing a node remaps only about `k/n` keys (where k is the total number of keys and n is the number of nodes). See the sketch below.
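A minimal hash ring using only the standard library; the virtual-node count is an illustrative choice:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: adding or removing a node remaps only ~k/n keys."""

    def __init__(self, nodes, vnodes=100):
        # Each node gets `vnodes` points on the ring to smooth the distribution.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next node point."""
        idx = bisect.bisect(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["shard-a", "shard-b", "shard-c"])
print(ring.get_node("user:42"))  # the same key always maps to the same shard
```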
Problems with Sharding:
- Celebrity Problem: What if Justin Bieber joins your platform? His shard will handle 1000x more traffic (Hot Partition).
- Solution: Additional caching for hot keys (see the sketch after this list).
- Joins: Cross-shard JOINs are either unsupported or prohibitively expensive.
- Solution: Denormalize data. Duplicate the 'User Name' in the 'Comments' table so you don't need to join.
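For the Celebrity Problem, a short-TTL in-process cache in front of the shard lookup absorbs most of the read storm. A minimal sketch; `load_from_shard` is a hypothetical stand-in for the real read:

```python
import time

TTL_SECONDS = 5  # short TTL: celebrity data is stale for at most a few seconds
_hot_cache: dict = {}  # key -> (fetched_at, value)

def load_from_shard(key: str):
    """Placeholder for the real shard lookup."""
    return {"key": key}

def get(key: str):
    now = time.monotonic()
    entry = _hot_cache.get(key)
    if entry and now - entry[0] < TTL_SECONDS:
        return entry[1]  # served from memory: the hot shard never sees this read
    value = load_from_shard(key)
    _hot_cache[key] = (now, value)
    return value
```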
4. Caching: The Layered Defense
Caching is your first line of defense against traffic spikes that would otherwise kill the database.
The Caching Strategy (Cache-Aside)
- App asks Cache for data.
- If Hit, return data.
- If Miss, query the DB, write the result to the Cache, and return it (see the sketch below).
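The same three steps in code, assuming the redis-py client and a hypothetical `query_user_from_db` helper:

```python
import json

import redis  # assumes the redis-py client

r = redis.Redis(host="localhost", port=6379)
TTL = 300  # seconds; tune per data type

def query_user_from_db(user_id: int) -> dict:
    """Placeholder for the real database read."""
    return {"id": user_id, "name": "Ada"}

def get_user(user_id: int) -> dict:
    key = f"user:{user_id}"
    cached = r.get(key)  # 1. ask the cache
    if cached is not None:
        return json.loads(cached)  # 2. hit: return cached data
    user = query_user_from_db(user_id)  # 3. miss: query the DB...
    r.setex(key, TTL, json.dumps(user))  # ...write to the cache...
    return user  # ...and return the data
```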
4 Layers of Caching
- Client-Side: Browser caching/Local Storage. Use `Cache-Control: stale-while-revalidate` so the browser serves the cached copy instantly while refetching in the background.
- CDN: Edge caching for HTML documents and API responses.
- App Cache (Redis): Key-value store for user sessions and timelines.
- Database Buffer Pool: Postgres uses RAM to cache frequently accessed disk pages.
5. Asynchronous Processing & Event-Driven Architecture
In a synchronous system, if the Email Service is down, the User Registration fails. In an asynchronous system, the registration succeeds, and the email is queued.
Message Brokers (Kafka vs RabbitMQ)
- Kafka: High-throughput log streams. Best for analytics and event sourcing. Messages persist for a configurable retention period (often days).
- RabbitMQ: Traditional queue. Best for complex routing logic and immediate task processing.
The Outbox Pattern
How do you update the database AND send a message to Kafka atomically?
- Start Transaction.
- Insert User into `users` table.
- Insert Event into `outbox` table in the same DB.
- Commit Transaction.
- A separate worker polls the `outbox` table and pushes to Kafka (as sketched below).
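A minimal sketch using sqlite3 for brevity; `publish_to_kafka` is a hypothetical stand-in for a real producer, and the table layout is illustrative:

```python
import json
import sqlite3

db = sqlite3.connect("app.db")
db.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, email TEXT)")
db.execute(
    "CREATE TABLE IF NOT EXISTS outbox "
    "(id INTEGER PRIMARY KEY, payload TEXT, sent INTEGER DEFAULT 0)"
)

def publish_to_kafka(topic: str, payload: str):
    """Placeholder for a real Kafka producer call."""
    print(f"-> {topic}: {payload}")

def register_user(email: str):
    # One transaction: the user row and its event commit (or roll back) together.
    with db:
        cur = db.execute("INSERT INTO users (email) VALUES (?)", (email,))
        event = json.dumps({"type": "user_registered", "user_id": cur.lastrowid})
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (event,))

def relay_outbox():
    """The worker: poll unsent events, publish them, then mark them sent."""
    rows = db.execute("SELECT id, payload FROM outbox WHERE sent = 0").fetchall()
    for row_id, payload in rows:
        publish_to_kafka("user-events", payload)
        with db:
            db.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))
```

Because publishing and marking-as-sent are not atomic, the relay delivers at-least-once; consumers should be idempotent.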
6. Reliability & Failure Modes
At scale, failure is guaranteed.
Circuit Breaker Pattern
If the 'Payment Service' is failing 50% of the time, stop calling it.
- Closed: Normal operation.
- Open: Error threshold reached. Requests fail fast without calling the service.
- Half-Open: Let a few requests through to test if the service has recovered (see the sketch below).
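A minimal in-process breaker; the failure threshold and reset timeout are illustrative:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_timeout=30.0):
        self.max_failures = max_failures    # consecutive errors before opening
        self.reset_timeout = reset_timeout  # seconds in Open before Half-Open
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")  # Open state
            # Half-Open: the timeout elapsed, let this request probe the service.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip to Open
            raise
        self.failures = 0       # success: back to Closed
        self.opened_at = None
        return result
```

Usage: wrap every downstream call site, e.g. `breaker.call(payment_client.charge, order)` (names hypothetical).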
Rate Limiting
Prevent abuse and cascading failures.
- Token Bucket Algorithm: Allow bursts of traffic but enforce a long-term average (see the sketch after this list).
- Leaky Bucket: Smooth out traffic processing at a constant rate.
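A token bucket in a few lines; the rate and capacity are illustrative:

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second (long-term average)
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # out of tokens: reject, queue, or return HTTP 429

bucket = TokenBucket(rate=10, capacity=20)  # 10 req/s sustained, bursts up to 20
```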
Conclusion: Trade-offs
System design is the art of trade-offs.
- Consistency vs Availability (CAP Theorem): You can't have both during a network partition. For a Bank, choose Consistency (CP). For a Social Network, choose Availability (AP).
- Latency vs Throughput: Batching requests improves throughput but increases latency.
Designing for 100 million users forces you to think in terms of failure domains, distributed consensus, and eventual consistency. It is the ultimate engineering playground.
About James Wilson
Distinguished Engineer at Netflix. 20 years of experience building internet-scale systems.