1. Requirements Clarification

Functional Requirements

  • 1:1 messaging between users
  • Group messaging (up to 500 members per group)
  • Online/offline presence indicator
  • Message delivery receipts: sent, delivered, read
  • Message history (persistent storage, accessible after reconnect)
  • Media sharing: images, videos, documents
  • Push notifications for offline users

Non-Functional Requirements

  • Scale: 50 million daily active users, 100 messages per user per day = 5 billion messages/day
  • Latency: <100ms message delivery for online users
  • Concurrent connections: Up to 10 million simultaneous WebSocket connections
  • Durability: Messages must not be lost; at-least-once delivery
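
The scale figures above can be sanity-checked with a quick back-of-envelope calculation. This is only a sketch: the peak factor of 3 is an assumption for illustration, not a number from the requirements.

```python
# Back-of-envelope throughput estimate from the stated requirements.
DAILY_ACTIVE_USERS = 50_000_000
MESSAGES_PER_USER_PER_DAY = 100
SECONDS_PER_DAY = 24 * 60 * 60
PEAK_FACTOR = 3  # assumption: peak traffic is about 3x the daily average

messages_per_day = DAILY_ACTIVE_USERS * MESSAGES_PER_USER_PER_DAY
avg_messages_per_second = messages_per_day / SECONDS_PER_DAY
peak_messages_per_second = avg_messages_per_second * PEAK_FACTOR

print(f"{messages_per_day:,} messages/day")                # 5,000,000,000 messages/day
print(f"{avg_messages_per_second:,.0f} messages/s average")  # 57,870 messages/s average
print(f"{peak_messages_per_second:,.0f} messages/s at the assumed peak")
```

An average near 58K writes per second is the figure that motivates a write-optimized message store later in the design.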

2. Real-Time Protocol: WebSockets vs Alternatives

Protocol | How It Works | Latency | Server Load | Use Case
Short Polling | Client polls every N seconds | Up to N seconds | Very high (constant requests) | Not suitable for chat
Long Polling | Client holds request open; server replies when message arrives | ~100–500ms | High (one connection per request) | Fallback when WebSocket unavailable
Server-Sent Events (SSE) | Server pushes events over HTTP; client-to-server still needs HTTP | ~50ms | Medium | One-way push (e.g. feeds), not ideal for chat
WebSocket | Persistent bi-directional TCP connection | ~30–50ms | Low per connection (event-driven) | Best choice for chat

WebSocket is the clear winner for chat. One persistent TCP connection per user carries all real-time traffic: messages in, messages out, presence events, and typing indicators.
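
To illustrate how one connection can carry several kinds of traffic, here is a minimal sketch of dispatching incoming frames by their "type" field. The "typing" and "presence" frame shapes and all handler names are hypothetical, and a real server would perform I/O instead of returning strings.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]


def handle_msg(frame: dict) -> str:
    return f"deliver to {frame['to']}: {frame['content']}"


def handle_typing(frame: dict) -> str:
    # Hypothetical frame shape: {"type": "typing", "from": ..., "to": ...}
    return f"{frame['from']} is typing to {frame['to']}"


def handle_presence(frame: dict) -> str:
    # Hypothetical frame shape: {"type": "presence", "user": ..., "status": ...}
    return f"{frame['user']} is {frame['status']}"


HANDLERS: Dict[str, Handler] = {
    "msg": handle_msg,
    "typing": handle_typing,
    "presence": handle_presence,
}


def dispatch(raw_frame: str) -> str:
    """Route one text frame from the shared connection to its handler."""
    frame = json.loads(raw_frame)
    handler = HANDLERS.get(frame.get("type"))
    if handler is None:
        return "error: unknown frame type"
    return handler(frame)


print(dispatch('{"type":"msg","to":"bob","content":"Hi!"}'))
print(dispatch('{"type":"typing","from":"alice","to":"bob"}'))
```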

3. High-Level Architecture

  Client A (Alice)               Client B (Bob)
       │                               │
       │ WebSocket                     │ WebSocket
       ▼                               ▼
┌──────────────┐               ┌──────────────┐
│  WS Server 1 │               │  WS Server 2 │
│ (Alice conn) │               │ (Bob conn)   │
└──────┬───────┘               └──────┬───────┘
       │                              │
       └────────────┬─────────────────┘
                    │
          ┌─────────▼─────────┐
          │   Redis Pub/Sub   │  ← fan-out messages between WS servers
          └─────────┬─────────┘
                    │
          ┌─────────▼─────────┐
          │  Message Service  │  ← persists messages, handles receipts
          └─────────┬─────────┘
                    │
          ┌─────────▼─────────┐
          │  Cassandra Cluster│  ← message storage (write-heavy, time-range)
          └───────────────────┘
                    │
          ┌─────────▼─────────┐
          │     S3 + CDN      │  ← media files
          └───────────────────┘

4. Message Flow — 1:1 Chat

Tracing a single message from sender to recipient exposes most of the design decisions the system has to make.

  1. Alice types a message and hits send. Her client sends a WebSocket frame to WS Server 1: {"type":"msg","to":"bob","content":"Hi!","client_msg_id":"uuid-123"}
  2. WS Server 1 forwards the message to the Message Service (internal HTTP or gRPC call)
  3. Message Service persists the message to Cassandra and returns a server-assigned msg_id
  4. Message Service ACKs back to Alice: {"type":"ack","client_msg_id":"uuid-123","msg_id":"server-456","status":"SENT"}. Alice's UI shows a single grey tick.
  5. Message Service checks Redis: is Bob currently connected? If yes, publish to Redis channel user:bob
  6. WS Server 2 is subscribed to user:bob. It receives the message and pushes to Bob's WebSocket.
  7. Bob's device receives the message and sends a DELIVERED ACK. Alice's UI shows double grey ticks.
  8. Bob opens the conversation and sends a READ event. Alice's UI shows double blue ticks.
  9. If Bob is offline at step 5, Message Service calls the push notification service instead.

Interview Tip — client_msg_id for Idempotency

The client generates a UUID before sending. If the network drops after the message is persisted but before the ACK reaches Alice, she might resend. The server checks if client_msg_id already exists and returns the existing msg_id — no duplicate stored.
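
A minimal in-memory sketch of this idempotency check follows. The class name, the dict-based storage (standing in for Cassandra), and the choice to scope the dedupe key to (sender, client_msg_id) are all assumptions made for illustration.

```python
import itertools
from typing import Dict, Tuple


class MessageService:
    """In-memory stand-in for the persistence step (dicts instead of Cassandra)."""

    def __init__(self) -> None:
        self.messages: Dict[str, dict] = {}
        # Assumption: dedupe per sender, so two users may reuse the same UUID.
        self._seen: Dict[Tuple[str, str], str] = {}
        self._next_id = itertools.count(456)

    def send(self, sender: str, frame: dict) -> dict:
        key = (sender, frame["client_msg_id"])
        msg_id = self._seen.get(key)
        if msg_id is None:
            # First time this client_msg_id is seen: persist and remember it.
            msg_id = f"server-{next(self._next_id)}"
            self.messages[msg_id] = {
                "from": sender,
                "to": frame["to"],
                "content": frame["content"],
            }
            self._seen[key] = msg_id
        # A retry falls through to here and gets the original msg_id back.
        return {
            "type": "ack",
            "client_msg_id": frame["client_msg_id"],
            "msg_id": msg_id,
            "status": "SENT",
        }


service = MessageService()
frame = {"type": "msg", "to": "bob", "content": "Hi!", "client_msg_id": "uuid-123"}
first = service.send("alice", frame)
retry = service.send("alice", frame)  # simulated resend after a lost ACK
print(first["msg_id"], retry["msg_id"], len(service.messages))
```

In a real deployment the lookup and insert would need to be atomic (for example, a conditional write), otherwise two concurrent retries could both pass the check.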

5. Group Chat Fan-Out

Group chat is more complex because one message must reach potentially hundreds of members. There are two strategies:

Fan-Out on Write (Active Delivery)

When a message is sent, immediately deliver it to every online member and create inbox entries for offline members. Used by WhatsApp. Good for small groups (up to 500 members, the limit set in the requirements).

Fan-Out on Read (Passive Storage)

Store the message once. When a member opens the group, fetch messages since last_seen. Used for very large groups (1,000+ members). Less real-time, but more storage-efficient.

Group message from Alice to Group(id=G1, members: Bob, Carol, Dave)
    │
    ▼
Message Service: persist message (group_id=G1, msg_id=M1)
    │
    ▼
Fan-out Worker: load members of G1 = [Bob, Carol, Dave]
    │
    ├── Is Bob online?   YES → publish to Redis user:bob
    ├── Is Carol online? NO  → enqueue push notification
    └── Is Dave online?  YES → publish to Redis user:dave

Redis pub/sub routes to WS servers holding Bob's and Dave's connections.
Carol gets a push notification. When Carol reconnects, she fetches
messages since her last_seen timestamp from Cassandra.
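
The fan-out worker's routing decision can be sketched as a pure function. The function name and the returned "plan" structure are hypothetical; a real worker would publish to Redis and enqueue push jobs rather than return lists.

```python
from typing import Dict, Iterable, List, Set


def plan_fan_out(sender: str, members: Iterable[str], online: Set[str]) -> Dict[str, List[str]]:
    """Decide, per member, between a Redis publish and a push notification."""
    plan: Dict[str, List[str]] = {"publish": [], "push": []}
    for member in members:
        if member == sender:
            continue  # assumption: the sender does not receive their own message
        if member in online:
            plan["publish"].append(f"user:{member}")  # Redis channel name
        else:
            plan["push"].append(member)
    return plan


# Group G1 from the diagram: Bob and Dave are online, Carol is offline.
plan = plan_fan_out("alice", ["bob", "carol", "dave"], online={"bob", "dave"})
print(plan)  # {'publish': ['user:bob', 'user:dave'], 'push': ['carol']}
```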

6. Database Schema

Chat systems are write-heavy and require fast time-range queries ("give me messages in channel X between time T1 and T2"). Cassandra's partition model fits this access pattern well: each channel is one partition, and rows inside it are clustered by time.

Cassandra: message storage schema

    -- Partition by channel_id; cluster by message_id (time-based) descending
    -- Allows efficient "fetch last 50 messages in channel X"
    CREATE TABLE messages (
        channel_id UUID,      -- 1:1 chats: sorted(user_a, user_b) as UUID
        message_id TIMEUUID,  -- time-ordered UUID (serves as timestamp too)
        sender_id  UUID,
        content    TEXT,
        type       TEXT,      -- 'text', 'image', 'video', 'file'
        media_key  TEXT,      -- S3 object key, NULL for text messages
        deleted    BOOLEAN,   -- CQL has no column DEFAULT; treat NULL as FALSE
        PRIMARY KEY ((channel_id), message_id)
    ) WITH CLUSTERING ORDER BY (message_id DESC)
      AND gc_grace_seconds = 864000;

    -- Read: SELECT * FROM messages WHERE channel_id=? LIMIT 50;
    -- Pagination: AND message_id < ? (cursor-based)

    -- MySQL: channels and members (relational, low cardinality)
    CREATE TABLE channels (
        id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
        type       ENUM('direct','group') NOT NULL,
        name       VARCHAR(200) NULL,
        created_by BIGINT UNSIGNED NOT NULL,
        created_at DATETIME NOT NULL,
        PRIMARY KEY (id)
    );

    CREATE TABLE channel_members (
        channel_id           BIGINT UNSIGNED NOT NULL,
        user_id              BIGINT UNSIGNED NOT NULL,
        joined_at            DATETIME NOT NULL,
        last_read_message_id VARCHAR(36) NULL,
        PRIMARY KEY (channel_id, user_id),
        KEY idx_user_channels (user_id)
    );
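
The schema comment describes a 1:1 channel_id as "sorted(user_a, user_b) as UUID" without saying how the pair becomes a UUID. One possible approach, shown here as an assumption rather than the author's method, is a name-based UUID (version 5) over the sorted pair. The namespace value and function name are hypothetical.

```python
import uuid

# Hypothetical application namespace; any fixed UUID would do.
CHAT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "chat.example.com")


def direct_channel_id(user_a: str, user_b: str) -> uuid.UUID:
    """Derive the same channel UUID regardless of who starts the chat."""
    first, second = sorted((user_a, user_b))
    # Assumes user IDs never contain ':' (true for UUID-formatted IDs).
    return uuid.uuid5(CHAT_NAMESPACE, f"{first}:{second}")


alice_to_bob = direct_channel_id("alice", "bob")
bob_to_alice = direct_channel_id("bob", "alice")
print(alice_to_bob == bob_to_alice)  # True
```

Because the ID is deterministic, either participant can compute the partition key without a lookup.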

7. Presence System

The presence system tracks which users are online. It must be fast (read on every incoming message to decide push vs WebSocket delivery) and eventually consistent (a few seconds of staleness is acceptable).

  • On WebSocket connect: SETEX presence:user:{id} 30 "ONLINE" in Redis
  • Client sends heartbeat every 10 seconds: refresh TTL to 30 seconds
  • On WebSocket disconnect: SET presence:user:{id} "OFFLINE" + store last_seen timestamp
  • Query: GET presence:user:{id} — O(1), <1ms
  • For friend presence lists: use Redis pipeline to batch-GET presence for all friends in one round trip
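
The heartbeat-and-TTL behaviour above can be sketched without Redis. This in-memory class is only a stand-in for SETEX and GET: the class and method names are hypothetical, and unlike the bullets above it removes the key on disconnect instead of writing an "OFFLINE" value.

```python
import time
from typing import Callable, Dict


class PresenceStore:
    """In-memory sketch of TTL-based presence (stand-in for Redis SETEX/GET)."""

    TTL_SECONDS = 30  # the client heartbeats every 10 seconds to refresh this

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, user_id: str) -> None:
        """Called on connect and on every heartbeat: (re)set the TTL."""
        self._expires_at[user_id] = self._clock() + self.TTL_SECONDS

    def disconnect(self, user_id: str) -> None:
        self._expires_at.pop(user_id, None)
        self.last_seen[user_id] = self._clock()

    def is_online(self, user_id: str) -> bool:
        expires = self._expires_at.get(user_id)
        return expires is not None and self._clock() < expires


store = PresenceStore()
store.heartbeat("bob")
print(store.is_online("bob"), store.is_online("carol"))  # True False
```

The TTL matters for crashes: if a WebSocket server dies without running the disconnect path, its users still drop to offline once three heartbeats have been missed.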

8. Message Delivery Receipts

Three-tier receipts (sent / delivered / read) require tracking per-message, per-user status.

Receipt flow: WebSocket events

    -- SENT: server persisted the message and ACKed to sender
    {"type":"receipt","msg_id":"M1","user_id":"alice","status":"SENT"}

    -- DELIVERED: receiver's device received the WS push or opened notification
    {"type":"receipt","msg_id":"M1","user_id":"bob","status":"DELIVERED"}

    -- READ: receiver opened the conversation containing this message
    {"type":"receipt","msg_id":"M1","user_id":"bob","status":"READ"}

    -- For group chats: aggregate receipts
    -- "Delivered to 5/6 members, Read by 3/6 members"
    -- Track in: message_receipts table (msg_id, user_id, status, updated_at)
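
A minimal sketch of the group aggregation follows, assuming receipts only ever move forward (SENT, then DELIVERED, then READ), so late or duplicate events are ignored. The class name and in-memory storage are hypothetical stand-ins for the message_receipts table.

```python
from typing import Dict, Iterable

RANK = {"SENT": 0, "DELIVERED": 1, "READ": 2}


class ReceiptTracker:
    """Per-message, per-recipient receipt status for one group message."""

    def __init__(self, recipients: Iterable[str]) -> None:
        self._status: Dict[str, str] = {user: "SENT" for user in recipients}

    def record(self, user_id: str, status: str) -> None:
        current = self._status.get(user_id)
        if current is None:
            return  # not a recipient of this message
        if RANK[status] > RANK[current]:
            self._status[user_id] = status  # never downgrade READ to DELIVERED

    def summary(self) -> str:
        total = len(self._status)
        delivered = sum(1 for s in self._status.values() if RANK[s] >= RANK["DELIVERED"])
        read = sum(1 for s in self._status.values() if s == "READ")
        return f"Delivered to {delivered}/{total} members, Read by {read}/{total} members"


tracker = ReceiptTracker(["bob", "carol", "dave", "erin", "frank", "grace"])
for user in ["bob", "carol", "dave", "erin", "frank"]:
    tracker.record(user, "DELIVERED")
for user in ["bob", "carol", "dave"]:
    tracker.record(user, "READ")
print(tracker.summary())  # Delivered to 5/6 members, Read by 3/6 members
```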

Watch Out — Receipt Storage at Scale

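5 billion messages/day × 2 receipt events each (DELIVERED and READ) = 10 billion receipt writes/day, and group messages add one pair per recipient. Do not store receipts in your primary MySQL database. Use Cassandra with a (msg_id, user_id) primary key, partitioned by msg_id so that all receipts for a message can be read together, or use Redis sorted sets for recent messages and Cassandra for historical ones.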

9. Media Storage

Media files are stored in S3, not in the database. The flow uses pre-signed URLs to avoid proxying large files through your servers.

  1. Client requests a pre-signed upload URL: POST /api/media/upload-url?type=image&size=2048000
  2. Server generates a pre-signed S3 PUT URL (expires in 5 minutes) and returns it
  3. Client uploads directly to S3 using the pre-signed URL
  4. Client sends a message with type=IMAGE and the S3 object key
  5. Recipient requests a pre-signed download URL from your server to view the image
  6. Apply S3 lifecycle rules: move to Glacier after 90 days, delete after 1 year
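
The lifecycle rule in step 6 could be written as the following configuration, shown as a Python dictionary in the shape used by S3 lifecycle configurations. The rule ID and the empty prefix filter (apply to every object) are assumptions.

```python
import json

# Sketch of an S3 lifecycle configuration for chat media.
LIFECYCLE_CONFIGURATION = {
    "Rules": [
        {
            "ID": "chat-media-archive-then-expire",  # hypothetical rule name
            "Status": "Enabled",
            "Filter": {"Prefix": ""},  # assumption: rule covers the whole bucket
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            "Expiration": {"Days": 365},  # "delete after 1 year"
        }
    ]
}

print(json.dumps(LIFECYCLE_CONFIGURATION, indent=2))
```

With boto3, a dictionary of this shape would be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration; the bucket name is deployment-specific and left out here.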

10. Scaling WebSocket Connections

The challenge with WebSockets is that connections are stateful — you need to route messages to the right server holding a user's connection. Redis Pub/Sub solves the cross-server routing problem.

Concern | Problem | Solution
Connection routing | Message for user B must reach the server holding B's connection | Redis Pub/Sub: publish to channel user:{B}, server holding B subscribes
Load balancer stickiness | Reconnects must reach the same server for session state | IP hash or cookie-based sticky sessions at L7 LB
Server capacity | 10M concurrent users, ~50K conns/server | 200 WebSocket servers; auto-scale based on connection count
Server crash recovery | All users on a crashed server lose connection | Clients auto-reconnect with exponential backoff; fetch missed messages since last_seen
Redis Pub/Sub scale | Redis single-node bottleneck for Pub/Sub | Redis Cluster + consistent hashing of channels to shards
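
The "consistent hashing of channels to shards" idea can be sketched as a small hash ring. This illustrates the general technique on the client side, not how Redis Cluster itself assigns keys (it uses fixed hash slots). The shard names and the virtual-node count are assumptions.

```python
import bisect
import hashlib
from typing import List, Tuple


class HashRing:
    """Consistent hash ring mapping Pub/Sub channel names to shards."""

    def __init__(self, shards: List[str], vnodes: int = 100) -> None:
        # Each shard gets several points on the ring to even out the load.
        self._ring: List[Tuple[int, str]] = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, channel: str) -> str:
        # First ring point clockwise from the channel's hash, wrapping around.
        index = bisect.bisect_right(self._points, self._hash(channel)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["redis-0", "redis-1", "redis-2"])
print(ring.shard_for("user:bob"))
```

The useful property is that adding a shard only moves the channels that now land on the new shard; all other channel-to-shard assignments stay where they were.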

Frequently Asked Questions — Chat System Design