1. Requirements Clarification

Functional Requirements

  • Users can create posts (text, image, video)
  • Users can follow other users
  • Each user has a news feed showing posts from people they follow
  • Feed is ordered by recency (chronological) or ranked (engagement-based)
  • Infinite scroll with pagination
  • Likes and comments on posts (engagement signals)

Non-Functional Requirements

  • Scale: 500 million daily active users; 5 million posts per day; average user follows 500 accounts
  • Read latency: Feed loads in <200ms
  • Write latency: Post visible in feed within 5 seconds (eventual consistency acceptable)
  • Storage: Store posts for 10 years; keep active feed (last 1,000 items) in Redis per user

2. The Core Trade-off: Fan-Out Strategy

This is the central architectural decision in news feed design. How do you get Alice's new post into the feed of her 2,000 followers?

StrategyWrite CostRead CostLatencyBest For
Fan-out on Write (Push)High — write to N followers' feed listsVery low — read pre-built listRead: <5msRegular users (<10K followers)
Fan-out on Read (Pull)Zero — just store the postHigh — merge N followees' postsRead: proportional to followee countCelebrities (10M+ followers)
HybridWrite for regular users onlyMerge: pre-built + celebrity postsRead: <100msProduction systems (recommended)

3. High-Level Architecture (Hybrid Model)

  User posts  ──▶  Post Service  ──▶  posts table (MySQL)
                        │
                        ▼
                  Fan-out Service
                  (check follower count)
                        │
           ┌────────────┴────────────┐
           │                         │
    ≤ 500K followers          > 500K followers
    (regular user)            (celebrity)
           │                         │
           ▼                         ▼
    Push to each follower's    Skip fan-out
    Redis sorted set           post stored, no push
    feed:{user_id}
           │
           ▼
  User reads feed
    ├── Fetch pre-built feed from Redis sorted set  (regular followees)
    └── Fetch recent posts from followed celebrities (pull from posts table)
            │
            ▼
    Merge + Rank + Return top 20 posts

4. Feed Storage with Redis Sorted Sets

Redis sorted sets are the perfect data structure for a news feed. The member is the post_id, and the score is the post timestamp (Unix milliseconds). This provides O(log N) inserts and O(log N + K) range reads.

Redis — feed operations # Fan-out: when Alice (user 101) posts (post 9999, timestamp 1718000000000): # For each of Alice's followers [Bob=201, Carol=202, ...]: ZADD feed:201 1718000000000 "9999" ZADD feed:202 1718000000000 "9999" # ... for all N followers (async via Kafka workers) # Cap feed at 1000 items to prevent memory growth: ZREMRANGEBYRANK feed:201 0 -1001 # remove oldest beyond 1000 # Read first page of feed for Bob (user 201): ZREVRANGE feed:201 0 19 # 20 most recent post IDs # Cursor-based next page (cursor = last seen score): ZREVRANGEBYSCORE feed:201 (cursor_score -inf LIMIT 0 20 # Feed size check: ZCARD feed:201 # number of items in Bob's feed

5. The Celebrity Problem in Detail

Imagine Cristiano Ronaldo posts on Instagram. He has 600M followers. Fan-out on write would mean 600M Redis ZADD operations — that takes hours, costs a fortune, and would overwhelm any queue. The solution is to exclude him entirely from the fan-out pipeline.

At read time, the feed service does a lightweight merge:

  1. Fetch the user's pre-built feed from Redis (contains posts from regular followees)
  2. Identify followed celebrities (accounts with follower_count > 500K) from a cached celebrity_follows set
  3. For each celebrity, fetch their last 20 posts from a per-celebrity Redis list (small, only their own posts)
  4. Merge all lists, sort by timestamp (or rank score), return top 20

Interview Tip — The 500K Threshold is Configurable

In interviews, acknowledge that the celebrity threshold (500K followers) is a tunable parameter that balances write amplification against read merge cost. A user following 50 celebrities adds 50 extra Redis fetches at read time — manageable. A user following 1,000 celebrities would need a different approach (batch fetch with pipeline).

6. Database Schema

SQL — posts, follows, feed_items CREATE TABLE posts ( id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT, user_id BIGINT UNSIGNED NOT NULL, content TEXT NULL, media_url VARCHAR(500) NULL, type ENUM('text','image','video') NOT NULL DEFAULT 'text', like_count INT UNSIGNED NOT NULL DEFAULT 0, comment_count INT UNSIGNED NOT NULL DEFAULT 0, created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, is_deleted TINYINT(1) NOT NULL DEFAULT 0, PRIMARY KEY (id), KEY idx_user_created (user_id, created_at DESC) ); CREATE TABLE follows ( follower_id BIGINT UNSIGNED NOT NULL, followee_id BIGINT UNSIGNED NOT NULL, created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, PRIMARY KEY (follower_id, followee_id), KEY idx_followee_id (followee_id) -- "who follows me?" query ); -- feed_items: only needed if not using Redis for feed storage -- Represents the pre-computed fan-out (one row per follower per post) CREATE TABLE feed_items ( user_id BIGINT UNSIGNED NOT NULL, post_id BIGINT UNSIGNED NOT NULL, score BIGINT UNSIGNED NOT NULL, -- rank score or timestamp created_at DATETIME NOT NULL, PRIMARY KEY (user_id, post_id), KEY idx_user_score (user_id, score DESC) );

7. Ranking Signals

Pure chronological feeds are being replaced by ranked feeds at every major platform. Ranking requires computing a score for each candidate post. The score combines multiple signals:

  • Affinity: How often does the viewer interact with this author? (likes, comments, profile views in last 30 days)
  • Edge weight: Type of interaction — a comment signals more interest than a like; a share signals more than a comment
  • Time decay: Score multiplied by e^(-λ × hours_since_post) — posts lose value over time
  • Content type boost: Videos may get a 1.5× boost; local content gets a geographic boost
  • Negative feedback: "Hide" and "Not interested" decrease the author's affinity score for this viewer
Simplified EdgeRank-style score formula score = affinity_weight × edge_weight[interaction_type] × time_decay(hours_since_post) × content_type_multiplier # Example: # affinity_weight = 0.8 (user likes this author's content often) # edge_weight[COMMENT] = 3.0 # time_decay(2h) = e^(-0.1 × 2) = 0.82 # content_type_multiplier[VIDEO] = 1.5 # score = 0.8 × 3.0 × 0.82 × 1.5 = 2.95

8. Feed Pagination with Cursors

Offset-based pagination breaks when new posts arrive. Cursor-based pagination is the correct approach for feeds.

  • Cursor: The score (timestamp or rank score) of the last post on the current page
  • Next page query: Fetch posts with score strictly less than the cursor
  • Encoding: Base64-encode the cursor to make it opaque to clients
  • API: GET /feed?cursor=eyJzY29yZSI6MTcxOH0=&limit=20
  • Response: Include next_cursor if more items exist; omit if end of feed

Watch Out — Feed Cache Invalidation

When a post is deleted (privacy change, spam removal), it may still exist in Redis sorted sets for millions of users. Do not try to remove it from all sets — that is O(N followers) writes. Instead, filter deleted posts at read time: fetch 20 post IDs from Redis, look up their status in the posts table (batch GET), and skip any with is_deleted=1. Serve more from Redis if you need to fill the page after filtering.

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

  • The workflow or formula is tested directly in the tool and compared against independent reference examples.
  • Examples are kept practical so readers can verify the result without hidden assumptions.
  • Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — News Feed System Design