1. Requirements Clarification
Functional Requirements
- 1:1 messaging between users
- Group messaging (up to 500 members per group)
- Online/offline presence indicator
- Message delivery receipts: sent, delivered, read
- Message history (persistent storage, accessible after reconnect)
- Media sharing: images, videos, documents
- Push notifications for offline users
Non-Functional Requirements
- Scale: 50 million daily active users, 100 messages per user per day = 5 billion messages/day
- Latency: <100ms message delivery for online users
- Concurrent connections: Up to 10 million simultaneous WebSocket connections
- Durability: Messages must not be lost; at-least-once delivery
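The scale figures above translate into a per-second write rate that drives most later sizing decisions. The following back-of-the-envelope sketch reproduces the arithmetic; the peak-to-average factor of 3 is an assumption for illustration and is not part of the stated requirements.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

daily_active_users = 50_000_000
messages_per_user_per_day = 100
peak_to_average = 3  # assumed peak factor, not from the requirements

messages_per_day = daily_active_users * messages_per_user_per_day
avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
peak_messages_per_sec = avg_messages_per_sec * peak_to_average

print(f"messages/day:  {messages_per_day:,}")
print(f"average msg/s: {avg_messages_per_sec:,.0f}")
print(f"peak msg/s:    {peak_messages_per_sec:,.0f}")
```

On average this works out to roughly 58,000 messages per second, so the storage layer has to sustain tens of thousands of writes per second before any peak is considered.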
2. Real-Time Protocol: WebSockets vs Alternatives
| Protocol | How It Works | Latency | Server Load | Use Case |
|---|---|---|---|---|
| Short Polling | Client polls every N seconds | Up to N seconds | Very high (constant requests) | Not suitable for chat |
| Long Polling | Client holds request open; server replies when message arrives | ~100–500ms | High (one connection per request) | Fallback when WebSocket unavailable |
| Server-Sent Events (SSE) | Server pushes events over HTTP; client-to-server still needs HTTP | ~50ms | Medium | One-way push (e.g. feeds), not ideal for chat |
| WebSocket | Persistent bi-directional TCP connection | ~30–50ms | Low per connection (event-driven) | Best choice for chat |
WebSocket is the clear winner. A single WebSocket connection handles all real-time traffic for a user — messages in, messages out, presence events, typing indicators — on a single persistent TCP connection.
3. High-Level Architecture
Client A (Alice)               Client B (Bob)
       │                              │
       │ WebSocket                    │ WebSocket
       ▼                              ▼
┌──────────────┐               ┌──────────────┐
│ WS Server 1  │               │ WS Server 2  │
│ (Alice conn) │               │ (Bob conn)   │
└──────┬───────┘               └──────┬───────┘
       │                              │
       └────────────┬─────────────────┘
                    │
          ┌─────────▼─────────┐
          │   Redis Pub/Sub   │ ← fan-out messages between WS servers
          └─────────┬─────────┘
                    │
          ┌─────────▼─────────┐
          │  Message Service  │ ← persists messages, handles receipts
          └─────────┬─────────┘
                    │
          ┌─────────▼─────────┐
          │ Cassandra Cluster │ ← message storage (write-heavy, time-range)
          └───────────────────┘
                    │
          ┌─────────▼─────────┐
          │     S3 + CDN      │ ← media files
          └───────────────────┘
4. Message Flow — 1:1 Chat
Understanding the exact sequence of operations for a single message reveals all the design decisions needed.
1. Alice types a message and hits send. Her client sends a WebSocket frame to WS Server 1:
   {"type":"msg","to":"bob","content":"Hi!","client_msg_id":"uuid-123"}
2. WS Server 1 forwards the message to the Message Service (internal HTTP or gRPC call).
3. Message Service persists the message to Cassandra and returns a server-assigned msg_id.
4. Message Service ACKs back to Alice:
   {"type":"ack","client_msg_id":"uuid-123","msg_id":"server-456","status":"SENT"}
   Alice's UI shows a single grey tick.
5. Message Service checks Redis: is Bob currently connected? If yes, it publishes to the Redis channel user:bob.
6. WS Server 2 is subscribed to user:bob. It receives the message and pushes it to Bob's WebSocket.
7. Bob's device receives the message and sends a DELIVERED ACK. Alice's UI shows double grey ticks.
8. Bob opens the conversation and sends a READ event. Alice's UI shows double blue ticks.
9. If Bob is offline at step 5, Message Service calls the push notification service instead.
Interview Tip — client_msg_id for Idempotency
The client generates a UUID before sending. If the network drops after the message is persisted but before the ACK reaches Alice, she might resend. The server checks if client_msg_id already exists and returns the existing msg_id — no duplicate stored.
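The deduplication check can be sketched as follows. This is a minimal in-memory illustration, not the actual Message Service: the MessageStore class, its persist method, and the choice to scope the dedup key by sender are all assumptions made for the example. The ACK shape mirrors the frame shown in the message flow.

```python
import itertools


class MessageStore:
    """Hypothetical in-memory stand-in for the message table."""

    def __init__(self):
        self._messages = {}                # msg_id -> stored message
        self._msg_id_by_client_key = {}    # (sender, client_msg_id) -> msg_id
        self._next_id = itertools.count(1)

    def persist(self, sender, client_msg_id, content):
        # Assumption: the dedup key is scoped per sender so that two
        # clients cannot collide on the same client-generated id.
        key = (sender, client_msg_id)
        msg_id = self._msg_id_by_client_key.get(key)
        if msg_id is None:
            # First time this client_msg_id is seen: store the message.
            msg_id = f"server-{next(self._next_id)}"
            self._messages[msg_id] = {"from": sender, "content": content}
            self._msg_id_by_client_key[key] = msg_id
        # A retry falls through to here and gets the original msg_id back.
        return {
            "type": "ack",
            "client_msg_id": client_msg_id,
            "msg_id": msg_id,
            "status": "SENT",
        }

    def count(self):
        return len(self._messages)


store = MessageStore()
first = store.persist("alice", "uuid-123", "Hi!")
retry = store.persist("alice", "uuid-123", "Hi!")  # resend after a lost ACK
print(first["msg_id"] == retry["msg_id"], store.count())
```

In a real store the lookup and the insert would need to be a single atomic operation (for example a conditional write), otherwise two concurrent retries could both pass the existence check.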
5. Group Chat Fan-Out
Group chat is more complex because one message must reach potentially hundreds of members. There are two strategies:
Fan-Out on Write (Active Delivery)
When a message is sent, immediately deliver it to every online member and create inbox entries for offline members. Used by WhatsApp. Good for small groups (<500 members).
Fan-Out on Read (Passive Storage)
Store the message once. When a member opens the group, fetch messages since last_seen. Used for very large groups (1,000+ members). Less real-time, but more storage-efficient.
Group message from Alice to Group(id=G1, members: Bob, Carol, Dave)
        │
        ▼
Message Service: persist message (group_id=G1, msg_id=M1)
        │
        ▼
Fan-out Worker: load members of G1 = [Bob, Carol, Dave]
        │
        ├── Is Bob online?   YES → publish to Redis user:bob
        ├── Is Carol online? NO  → enqueue push notification
        └── Is Dave online?  YES → publish to Redis user:dave

Redis pub/sub routes to WS servers holding Bob's and Dave's connections.
Carol gets a push notification. When Carol reconnects, she fetches
messages since her last_seen timestamp from Cassandra.
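The fan-out worker in the diagram above can be sketched in a few lines. The function name fan_out and the publish and enqueue_push callbacks are hypothetical; in a real deployment they would wrap a Redis PUBLISH call and a push-notification queue, and the online set would come from the presence system.

```python
def fan_out(message, members, sender, online_users, publish, enqueue_push):
    """Deliver one group message: Pub/Sub for online members, push for the rest."""
    for member in members:
        if member == sender:
            continue  # the sender already has the message
        if member in online_users:
            publish(f"user:{member}", message)
        else:
            enqueue_push(member, message)


# Stand-ins that record calls instead of talking to Redis or a push service.
published = []
pushes = []

fan_out(
    message={"group_id": "G1", "msg_id": "M1", "content": "Hi all"},
    members=["alice", "bob", "carol", "dave"],
    sender="alice",
    online_users={"alice", "bob", "dave"},
    publish=lambda channel, msg: published.append(channel),
    enqueue_push=lambda user, msg: pushes.append(user),
)
print(published, pushes)
```

The loop is linear in group size, which is why this approach is attractive for small groups and becomes expensive as membership grows into the thousands.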
6. Database Schema
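Chat systems are write-heavy and need fast time-range queries ("give me messages in channel X between time T1 and T2"). Cassandra fits this access pattern well: partitioning by channel keeps a conversation's messages together on the same nodes, and clustering by timestamp keeps them sorted on disk, so a range scan reads one contiguous slice.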
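To make the access pattern concrete, here is a toy in-memory model of one possible layout. It is an assumption for illustration, not necessarily the schema this design uses: the partition key is channel_id and rows inside a partition are ordered by (sent_at_ms, msg_id). The MessageTable class and its column names are hypothetical.

```python
import bisect
from collections import defaultdict


class MessageTable:
    """Toy model: partition key = channel_id, clustering key = (sent_at_ms, msg_id)."""

    def __init__(self):
        # channel_id -> list of rows kept sorted by (sent_at_ms, msg_id, ...)
        self._partitions = defaultdict(list)

    def insert(self, channel_id, sent_at_ms, msg_id, sender_id, content):
        row = (sent_at_ms, msg_id, sender_id, content)
        bisect.insort(self._partitions[channel_id], row)

    def range_query(self, channel_id, t1_ms, t2_ms):
        """Return rows with t1_ms <= sent_at_ms <= t2_ms from a single partition."""
        rows = self._partitions.get(channel_id, [])
        # A 1-tuple sorts before any longer tuple sharing the same first element.
        lo = bisect.bisect_left(rows, (t1_ms,))
        hi = bisect.bisect_left(rows, (t2_ms + 1,))
        return rows[lo:hi]


table = MessageTable()
table.insert("c1", 100, "m1", "alice", "first")
table.insert("c1", 300, "m2", "bob", "third")
table.insert("c1", 200, "m3", "alice", "second")
table.insert("c2", 150, "m4", "carol", "other channel")

recent = table.range_query("c1", 150, 300)
print([row[1] for row in recent])
```

A very active channel would make a single partition grow without bound, so production layouts often add a time bucket (for example a day or month) to the partition key. That refinement is left out here to keep the sketch short.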
7. Presence System
The presence system tracks which users are online. It must be fast (read on every incoming message to decide push vs WebSocket delivery) and eventually consistent (a few seconds of staleness is acceptable).
- On WebSocket connect: SETEX presence:user:{id} 30 "ONLINE" in Redis
- Client sends heartbeat every 10 seconds: refresh TTL to 30 seconds
- On WebSocket disconnect: SET presence:user:{id} "OFFLINE" + store last_seen timestamp
- Query: GET presence:user:{id} (O(1), <1ms)
- For friend presence lists: use Redis pipeline to batch-GET presence for all friends in one round trip
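The TTL-plus-heartbeat behaviour in the list above can be imitated without a Redis server. The PresenceStore class below is a hypothetical in-memory model with an injectable clock so that expiry can be demonstrated deterministically. One simplification is labeled in the code: on disconnect it drops the key instead of writing an "OFFLINE" value.

```python
import time


class PresenceStore:
    """In-memory imitation of SETEX/GET on presence:user:{id} keys."""

    def __init__(self, ttl_seconds=30, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = {}  # user_id -> time at which the key expires
        self.last_seen = {}    # user_id -> time of last clean disconnect

    def connect(self, user_id):
        # Equivalent in spirit to: SETEX presence:user:{id} 30 "ONLINE"
        self._expires_at[user_id] = self._clock() + self._ttl

    def heartbeat(self, user_id):
        # Refreshing the TTL is the same write as connecting.
        self.connect(user_id)

    def disconnect(self, user_id):
        # Simplification: remove the key rather than storing "OFFLINE".
        self._expires_at.pop(user_id, None)
        self.last_seen[user_id] = self._clock()

    def is_online(self, user_id):
        expiry = self._expires_at.get(user_id)
        return expiry is not None and self._clock() < expiry


now = [0.0]  # fake clock so the example does not need to sleep
presence = PresenceStore(ttl_seconds=30, clock=lambda: now[0])

presence.connect("alice")
presence.connect("bob")
now[0] = 25.0
presence.heartbeat("bob")               # bob's key now expires at t=55
now[0] = 31.0
alice_online = presence.is_online("alice")  # no heartbeat: expired at t=30
bob_online = presence.is_online("bob")
print(alice_online, bob_online)
```

With a 10-second heartbeat and a 30-second TTL, a client can miss two heartbeats in a row before it is shown as offline, which is where the "few seconds of staleness" comes from.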
8. Message Delivery Receipts
Three-tier receipts (sent / delivered / read) require tracking per-message, per-user status.
Watch Out — Receipt Storage at Scale
5 billion messages/day × 2 receipt events each (delivered and read) = 10 billion receipt writes/day, or about 116,000 writes per second on average. Do not store receipts in a primary relational database such as MySQL. Use Cassandra with a (msg_id, user_id) partition key, or Redis sorted sets for recent messages and Cassandra for historical ones.
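Because receipt events travel over the network, they can arrive late or out of order (for example a DELIVERED ACK processed after a READ event). One way to keep the stored status sane is to treat it as a value that only moves forward. The Status enum and ReceiptStore class below are hypothetical names for a minimal in-memory sketch keyed by (msg_id, user_id).

```python
from enum import IntEnum


class Status(IntEnum):
    SENT = 1
    DELIVERED = 2
    READ = 3


class ReceiptStore:
    """Per-message, per-user receipt status that never moves backwards."""

    def __init__(self):
        self._status = {}  # (msg_id, user_id) -> Status

    def update(self, msg_id, user_id, new_status):
        key = (msg_id, user_id)
        current = self._status.get(key, Status.SENT)
        # Keep the furthest-along status seen so far.
        self._status[key] = max(current, new_status)
        return self._status[key]


receipts = ReceiptStore()
after_read = receipts.update("server-456", "bob", Status.READ)
after_late_ack = receipts.update("server-456", "bob", Status.DELIVERED)
print(after_read.name, after_late_ack.name)

# Average write rate implied by 10 billion receipt writes per day.
receipt_writes_per_sec = 10_000_000_000 / 86_400
print(round(receipt_writes_per_sec))
```

The same rule works as a conditional update in a real database: write the new status only if it is greater than the stored one.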
9. Media Storage
Media files are stored in S3, not in the database. The flow uses pre-signed URLs to avoid proxying large files through your servers.
- Client requests a pre-signed upload URL: POST /api/media/upload-url?type=image&size=2048000
- Server generates a pre-signed S3 PUT URL (expires in 5 minutes) and returns it
- Client uploads directly to S3 using the pre-signed URL
- Client sends a message with type=IMAGE and the S3 object key
- Recipient requests a pre-signed download URL from your server to view the image
- Apply S3 lifecycle rules: move to Glacier after 90 days, delete after 1 year
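The idea behind a pre-signed URL is that the server signs the object key and an expiry time, so storage can verify the request without calling back to your servers. The sketch below illustrates that idea with a plain HMAC from the standard library. It is deliberately not S3's real signing algorithm (AWS Signature Version 4), and the secret, host name, and function names are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key, for illustration only


def sign_upload_url(object_key, expires_in=300, now=None):
    """Return a URL that authorises a PUT of object_key for expires_in seconds."""
    issued_at = now if now is not None else time.time()
    expires_at = int(issued_at) + expires_in
    payload = f"PUT\n{object_key}\n{expires_at}".encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    query = urlencode({"expires": expires_at, "signature": signature})
    return f"https://media.example.com/{object_key}?{query}"


def verify_upload_url(url, now=None):
    """Check the signature and the expiry, as the storage side would."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires_at = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    object_key = parsed.path.lstrip("/")
    payload = f"PUT\n{object_key}\n{expires_at}".encode()
    expected = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    current = now if now is not None else time.time()
    return hmac.compare_digest(signature, expected) and current < expires_at


url = sign_upload_url("uploads/alice/photo.jpg", expires_in=300, now=1000)
valid_in_window = verify_upload_url(url, now=1200)
valid_after_expiry = verify_upload_url(url, now=1301)
valid_if_tampered = verify_upload_url(url.replace("photo", "other"), now=1200)
print(valid_in_window, valid_after_expiry, valid_if_tampered)
```

With real S3 you would not write this yourself. For example, boto3's S3 client exposes generate_presigned_url, which takes the operation name, the bucket and key parameters, and an ExpiresIn value in seconds.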
10. Scaling WebSocket Connections
The challenge with WebSockets is that connections are stateful — you need to route messages to the right server holding a user's connection. Redis Pub/Sub solves the cross-server routing problem.
| Concern | Problem | Solution |
|---|---|---|
| Connection routing | Message for user B must reach the server holding B's connection | Redis Pub/Sub: publish to channel user:{B}, server holding B subscribes |
| Load balancer stickiness | Reconnects must reach the same server for session state | IP hash or cookie-based sticky sessions at L7 LB |
| Server capacity | 10M concurrent users, ~50K conns/server | 200 WebSocket servers; auto-scale based on connection count |
| Server crash recovery | All users on a crashed server lose connection | Clients auto-reconnect with exponential backoff; fetch missed messages since last_seen |
| Redis Pub/Sub scale | A single Redis node becomes a Pub/Sub bottleneck | Redis Cluster, with channels consistently hashed to shards |
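Two rows of the table lend themselves to a short sketch: the server-count arithmetic and the mapping of user channels to Pub/Sub shards. The HashRing class, the shard names, and the choice of 100 virtual nodes per shard are assumptions for illustration. Redis Cluster has its own slot-based key distribution, so this shows the general consistent-hashing technique rather than Redis internals.

```python
import bisect
import hashlib
import math


def servers_needed(concurrent_connections, connections_per_server):
    return math.ceil(concurrent_connections / connections_per_server)


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, shards, vnodes=100):
        # Each shard is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, channel):
        # First ring point clockwise from the channel's hash, wrapping around.
        idx = bisect.bisect_right(self._points, self._hash(channel))
        return self._ring[idx % len(self._ring)][1]


print(servers_needed(10_000_000, 50_000))

channels = [f"user:{n}" for n in range(1000)]
ring3 = HashRing(["redis-1", "redis-2", "redis-3"])
ring4 = HashRing(["redis-1", "redis-2", "redis-3", "redis-4"])
moved = [c for c in channels if ring3.shard_for(c) != ring4.shard_for(c)]
print(len(moved))
```

The point of the ring is visible in the last lines: adding a fourth shard relocates only the channels that now belong to the new shard, roughly a quarter of them, instead of reshuffling nearly everything as a plain hash-modulo scheme would.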
Frequently Asked Questions — Chat System Design
Why use WebSocket instead of HTTP polling?
HTTP polling requires the client to repeatedly ask "any new messages?" every few seconds, wasting bandwidth and adding latency. WebSocket establishes a persistent bi-directional connection over TCP. The server pushes messages as soon as they arrive, with no polling overhead and roughly 50ms delivery latency. Long polling (an HTTP request held open until a message arrives) is a middle ground that works where proxies block WebSockets, but it adds complexity and latency.
How are messages stored?
Messages are stored in a messages table with sender_id, receiver_id (or channel_id), content, and timestamp. For 1:1 chats, write one row per message. For group chats, write one message row and fan out delivery to each member, either by inserting rows into a user_messages table (one per recipient) or by querying members at delivery time. HBase or Cassandra is usually preferred over MySQL for message storage at scale because of high write throughput and efficient time-range queries.
How do delivery and read receipts work?
Use a message_status table with (message_id, user_id, status, updated_at). When the recipient's device receives the message, it sends an ACK to the server, which updates the status to DELIVERED. When the user opens the conversation, the client sends a READ event, which updates the status to READ. The sender's client receives status updates over its WebSocket. WhatsApp shows the same three states: single grey tick = sent, double grey = delivered, double blue = read.
How is online presence tracked?
When a user connects via WebSocket, write (user_id, last_seen = now, status = ONLINE) to Redis with a TTL of 30 seconds. The client sends a heartbeat every 10 seconds to refresh the TTL. If the TTL expires (the user went offline without a proper disconnect), Redis removes the key automatically. On disconnect, update the status to OFFLINE and set last_seen. Query presence via Redis for real-time status. At scale, fan out presence changes to friends via Pub/Sub.
How is media shared?
Never embed media in the messages table. Instead, the client uploads the file directly to S3 using a pre-signed URL. The object key is fixed when the pre-signed URL is generated, so the client already knows it. The client sends a message with type=IMAGE and the S3 object key. The receiver downloads the file via a pre-signed download URL generated by your server. This keeps media out of your message pipeline and lets a CDN in front of S3 handle fast global delivery. Apply lifecycle policies to archive and eventually delete old media (section 9 uses Glacier after 90 days and deletion after 1 year).
How do you scale WebSocket servers?
Each WebSocket server can handle ~10,000–50,000 concurrent connections. For 1M concurrent users, deploy 20–100 WebSocket servers behind a load balancer with sticky sessions (the same user always routes to the same server). Use Redis Pub/Sub to fan out messages across servers: when server A receives a message for user B (connected to server C), it publishes to the Redis channel user:B, and server C (subscribed to that channel) pushes it to user B's WebSocket.