1. Requirements Clarification
Functional Requirements
- Users can create posts (text, image, video)
- Users can follow other users
- Each user has a news feed showing posts from people they follow
- Feed is ordered by recency (chronological) or ranked (engagement-based)
- Infinite scroll with pagination
- Likes and comments on posts (engagement signals)
Non-Functional Requirements
- Scale: 500 million daily active users; 5 million posts per day; average user follows 500 accounts
- Read latency: Feed loads in <200ms
- Write latency: Post visible in feed within 5 seconds (eventual consistency acceptable)
- Storage: Store posts for 10 years; keep active feed (last 1,000 items) in Redis per user
2. The Core Trade-off: Fan-Out Strategy
This is the central architectural decision in news feed design. How do you get Alice's new post into the feed of her 2,000 followers?
| Strategy | Write Cost | Read Cost | Latency | Best For |
|---|---|---|---|---|
| Fan-out on Write (Push) | High — write to N followers' feed lists | Very low — read pre-built list | Read: <5ms | Regular users (<10K followers) |
| Fan-out on Read (Pull) | Zero — just store the post | High — merge N followees' posts | Read: proportional to followee count | Celebrities (10M+ followers) |
| Hybrid | Write for regular users only | Merge: pre-built + celebrity posts | Read: <100ms | Production systems (recommended) |
3. High-Level Architecture (Hybrid Model)
User posts ──▶ Post Service ──▶ posts table (MySQL)
│
▼
Fan-out Service
(check follower count)
│
┌────────────┴────────────┐
│ │
≤ 500K followers > 500K followers
(regular user) (celebrity)
│ │
▼ ▼
Push to each follower's Skip fan-out
Redis sorted set post stored, no push
feed:{user_id}
│
▼
User reads feed
├── Fetch pre-built feed from Redis sorted set (regular followees)
└── Fetch recent posts from followed celebrities (pull from posts table)
│
▼
Merge + Rank + Return top 20 posts
4. Feed Storage with Redis Sorted Sets
Redis sorted sets are the perfect data structure for a news feed. The member is the post_id, and the score is the post timestamp (Unix milliseconds). This provides O(log N) inserts and O(log N + K) range reads.
5. The Celebrity Problem in Detail
Imagine Cristiano Ronaldo posts on Instagram. He has 600M followers. Fan-out on write would mean 600M Redis ZADD operations — that takes hours, costs a fortune, and would overwhelm any queue. The solution is to exclude him entirely from the fan-out pipeline.
At read time, the feed service does a lightweight merge:
- Fetch the user's pre-built feed from Redis (contains posts from regular followees)
- Identify followed celebrities (accounts with follower_count > 500K) from a cached celebrity_follows set
- For each celebrity, fetch their last 20 posts from a per-celebrity Redis list (small, only their own posts)
- Merge all lists, sort by timestamp (or rank score), return top 20
Interview Tip — The 500K Threshold is Configurable
In interviews, acknowledge that the celebrity threshold (500K followers) is a tunable parameter that balances write amplification against read merge cost. A user following 50 celebrities adds 50 extra Redis fetches at read time — manageable. A user following 1,000 celebrities would need a different approach (batch fetch with pipeline).
6. Database Schema
7. Ranking Signals
Pure chronological feeds are being replaced by ranked feeds at every major platform. Ranking requires computing a score for each candidate post. The score combines multiple signals:
- Affinity: How often does the viewer interact with this author? (likes, comments, profile views in last 30 days)
- Edge weight: Type of interaction — a comment signals more interest than a like; a share signals more than a comment
- Time decay: Score multiplied by
e^(-λ × hours_since_post)— posts lose value over time - Content type boost: Videos may get a 1.5× boost; local content gets a geographic boost
- Negative feedback: "Hide" and "Not interested" decrease the author's affinity score for this viewer
8. Feed Pagination with Cursors
Offset-based pagination breaks when new posts arrive. Cursor-based pagination is the correct approach for feeds.
- Cursor: The score (timestamp or rank score) of the last post on the current page
- Next page query: Fetch posts with score strictly less than the cursor
- Encoding: Base64-encode the cursor to make it opaque to clients
- API:
GET /feed?cursor=eyJzY29yZSI6MTcxOH0=&limit=20 - Response: Include
next_cursorif more items exist; omit if end of feed
Watch Out — Feed Cache Invalidation
When a post is deleted (privacy change, spam removal), it may still exist in Redis sorted sets for millions of users. Do not try to remove it from all sets — that is O(N followers) writes. Instead, filter deleted posts at read time: fetch 20 post IDs from Redis, look up their status in the posts table (batch GET), and skip any with is_deleted=1. Serve more from Redis if you need to fill the page after filtering.
How We Research and Update This Guide
We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.
- The workflow or formula is tested directly in the tool and compared against independent reference examples.
- Examples are kept practical so readers can verify the result without hidden assumptions.
- Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.
Frequently Asked Questions — News Feed System Design
Fan-out on write: when a user posts, immediately push the post to all followers' feed lists (pre-computed feeds). Reads are instant — just fetch your pre-computed list. Write cost is high for users with many followers. Fan-out on read: when a user opens their feed, query all the people they follow and merge their recent posts. Write cost is zero, but read latency is high for users following many accounts. Most production systems use a hybrid: fan-out on write for regular users, fan-out on read for celebrities.
A celebrity with 10M followers cannot fan-out on write — that means 10M Redis writes per post. The celebrity problem is solved with a hybrid approach: celebrities (accounts with >N followers, typically >500K) are excluded from fan-out on write. Instead, at feed read time, the system fetches the celebrity's recent posts separately and merges them into the user's pre-computed feed. The threshold N is a tunable parameter — lower = fewer celebrities, more fan-out writes.
Offset pagination (page=2&size=20) is fragile — if a new post is inserted, your page 2 will skip items. Cursor pagination uses the last seen item's ID or timestamp as the cursor. For a feed stored in a Redis sorted set, ZREVRANGEBYSCORE feed:user:123 (cursor_score-1) -inf LIMIT 0 20 fetches the next 20 items before the cursor. The cursor is opaque to the client (base64 encoded timestamp). New posts don't affect past cursors.
Use a Redis sorted set per user: key = feed:{user_id}, member = post_id, score = post timestamp (Unix ms). ZADD feed:123 1718000000 "post:456". To read the feed: ZREVRANGE feed:123 0 19 (20 most recent). To paginate: ZREVRANGEBYSCORE feed:123 cursor_score -inf LIMIT 0 20. Cap the feed at a maximum size (e.g. 1,000 posts) using ZREMRANGEBYRANK on every write to avoid unbounded memory growth.
Pure chronological ordering was replaced by ML-ranked feeds at Facebook (EdgeRank → Graph Ranking), Instagram, and Twitter. Key signals include: affinity score (how often you interact with the author), edge weight (likes > comments > shares > views), post freshness (exponential time decay), media type boosting (video > image > text), negative feedback (hide/block), and diversification (limit posts from same author in sequence). In interviews, chronological + basic boosting is acceptable; mention ranking as a future enhancement.
Core tables: posts (id, user_id, content, media_url, created_at), follows (follower_id, followee_id, created_at), and feed_items (user_id, post_id, score, created_at) for the pre-computed feed. The feed_items table is only needed if you store the fan-out persistently in MySQL rather than Redis. For most architectures, Redis sorted sets replace the feed_items table entirely. Keep posts in MySQL/PostgreSQL; use Cassandra if write volume exceeds ~50K posts/second.