Designing for 100 Million Users: A 2025 System Design Guide
Scaling from 1,000 to 100,000 users is hard. Scaling from 10 million to 100 million is a different beast entirely. At this scale, physics becomes your enemy.
1. Global Distribution & Edge Computing
You cannot serve 100M users from a single region (us-east-1): the speed of light alone adds well over 100 ms of round-trip latency for users on the other side of the planet. You need to be where your users are.
The Edge Strategy
- Static Assets: CDN (Cloudflare/Akamai) is non-negotiable.
- Compute: Edge Functions (Vercel/Cloudflare Workers) for personalization and auth.
- Database: Global Read Replicas (Aurora Global Database).
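Putting the pieces together, the routing decision at the edge can be sketched as "send each user to the lowest-latency healthy region." The region names and latency numbers below are illustrative assumptions, not measurements:

```python
# Illustrative latency matrix (ms) from user geography to region.
# These numbers are made up for the sketch; measure your own.
REGION_LATENCY_MS = {
    "us-east-1": {"NA": 20, "EU": 90, "APAC": 180},
    "eu-west-1": {"NA": 90, "EU": 15, "APAC": 160},
    "ap-southeast-1": {"NA": 190, "EU": 160, "APAC": 25},
}

def nearest_region(user_geo: str, healthy: set[str]) -> str:
    """Pick the healthy region with the lowest expected latency."""
    candidates = {
        region: latencies[user_geo]
        for region, latencies in REGION_LATENCY_MS.items()
        if region in healthy
    }
    return min(candidates, key=candidates.get)
```

In practice this logic lives in your DNS/anycast layer (e.g. latency-based routing), but the failover behavior is the same: when a region is unhealthy, traffic spills to the next-closest one.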
2. Database Scaling Strategies
A single Postgres instance will melt.
Vertical Scaling (The Easy Way)
Buy a bigger server. Works until it doesn't.
Horizontal Scaling (Sharding)
Split your data across multiple nodes based on a shard key (e.g., user_id).
-- Shard 1 (Users 0-1M)
SELECT * FROM users WHERE id = 12345;
-- Shard 2 (Users 1M-2M)
SELECT * FROM users WHERE id = 1002345;
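The routing logic behind the queries above is simple range-based sharding; a minimal sketch (the 1M-users-per-shard size is the example's assumption):

```python
USERS_PER_SHARD = 1_000_000

def shard_for(user_id: int) -> int:
    """1-based shard index: shard 1 holds users 0-999,999,
    shard 2 holds 1,000,000-1,999,999, and so on."""
    return user_id // USERS_PER_SHARD + 1
```

Your application (or a proxy like Vitess/Citus) runs this lookup on every query to decide which node to hit.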
Challenges:
- Resharding is painful.
- Cross-shard joins are impossible (or very slow).
- Consistent hashing is usually needed so that adding or removing a node remaps only a small fraction of keys.
3. Caching at Every Layer
The best request is the one that never hits your database.
- Browser Cache: Cache-Control: max-age=31536000 for immutable assets.
- CDN Cache: Edge caching for HTML/JSON.
- Application Cache: Redis/Memcached for hot data.
- Database Buffer Pool: In-memory database pages.
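The application-cache layer usually follows the cache-aside (lazy loading) pattern: check the cache, fall back to the database on a miss, then populate the cache with a TTL. A sketch, with a plain dict standing in for Redis/Memcached (swap in a real client in production; the 300-second TTL is an assumption):

```python
import time

cache: dict = {}       # stand-in for Redis/Memcached
TTL_SECONDS = 300

def get_user(user_id: int, db_fetch) -> dict:
    """Cache-aside read: serve from cache if fresh, else hit the DB."""
    entry = cache.get(user_id)
    if entry and entry["expires"] > time.time():
        return entry["value"]                       # cache hit
    value = db_fetch(user_id)                       # cache miss: hit the DB
    cache[user_id] = {"value": value, "expires": time.time() + TTL_SECONDS}
    return value
```

With a decent hit rate, most reads never reach the database at all, which is the whole point of the layer.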
4. Asynchronous Architecture
Stop doing everything in the request/response cycle.
Synchronous: User -> API -> Email Service -> API -> User (slow, fragile)
Asynchronous: User -> API -> Kafka -> Email Worker (fast, resilient)
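The asynchronous flow can be sketched with a queue standing in for Kafka: the API enqueues a job and returns immediately, and a worker drains the queue independently. Names like `api_signup` and the job shape are illustrative assumptions:

```python
import queue
import threading

jobs: "queue.Queue" = queue.Queue()  # stand-in for a Kafka topic
sent = []                            # record of "delivered" emails

def api_signup(email: str) -> str:
    """Fast path: enqueue the job and return to the user immediately."""
    jobs.put({"type": "welcome_email", "to": email})
    return "202 Accepted"

def email_worker():
    """Slow path: drain the queue off the request cycle."""
    while True:
        job = jobs.get()
        if job is None:              # shutdown sentinel
            break
        sent.append(job["to"])       # pretend to send the email
        jobs.task_done()
```

The user's response time no longer depends on the email service; if the worker crashes, the job waits in the queue instead of failing the request.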
Conclusion
There is no silver bullet. High-scale architecture is about making the right trade-offs: the CAP theorem says that during a network partition you must choose between consistency and availability, and every design decision above leans one way or the other.
About James Wilson
Distinguished Engineer at Netflix. 20 years of experience building internet-scale systems.