1. Requirements Clarification
Before jumping into architecture, nail down requirements. Interviewers want to see you drive this conversation.
Functional Requirements
- Given a long URL, generate a unique short URL (e.g. https://fb.in/aB3x9Z)
- Redirect users from short URL to original long URL
- Support custom aliases (e.g. fb.in/my-product-launch)
- URL expiry: optional TTL per link
- Click analytics: count clicks, geographic breakdown, referrer
- User accounts: link ownership and management
Non-Functional Requirements
- Scale: 100 million URLs created per day, 10:1 read-to-write ratio = 1 billion redirects/day
- Redirect latency: <10ms at P99
- High availability: 99.99% uptime (52 minutes downtime/year)
- Storage: URLs kept for 5 years; ~500 bytes per record → 100M × 365 × 5 × 500B ≈ 91 TB
- Eventual consistency acceptable for analytics; strong consistency required for URL creation (no duplicate short codes)
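The scale figures above can be reproduced with a short back-of-envelope calculation. This is a minimal sketch using only the numbers from the list; the variable names are illustrative, and the rates are daily averages rather than peak load.

```python
# Back-of-envelope capacity estimate from the stated requirements.
URLS_PER_DAY = 100_000_000    # URLs created per day
READ_WRITE_RATIO = 10         # redirects per created URL
RETENTION_YEARS = 5
BYTES_PER_RECORD = 500        # approximate size of one stored record
SECONDS_PER_DAY = 86_400

redirects_per_day = URLS_PER_DAY * READ_WRITE_RATIO
write_rps = URLS_PER_DAY / SECONDS_PER_DAY
read_rps = redirects_per_day / SECONDS_PER_DAY
total_records = URLS_PER_DAY * 365 * RETENTION_YEARS
storage_tb = total_records * BYTES_PER_RECORD / 1e12  # decimal terabytes

print(f"redirects/day:      {redirects_per_day:,}")
print(f"average writes/sec: {write_rps:,.0f}")
print(f"average reads/sec:  {read_rps:,.0f}")
print(f"records after {RETENTION_YEARS} years: {total_records:,}")
print(f"storage:            {storage_tb:.2f} TB")
```

The result is about 11,574 redirects per second, about 1,157 writes per second, and 91.25 TB of storage over five years. Real traffic is bursty, so peak rates will sit above these averages.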
2. High-Level Architecture
    Client
       │
       ▼
┌─────────────┐     ┌─────────────────────────────────────────┐
│  CDN Edge   │────▶│           Load Balancer (L7)            │
│ (CloudFront)│     └────────────────┬────────────────────────┘
└─────────────┘                      │
                          ┌──────────┴──────────┐
                          │                     │
                   ┌──────▼──────┐     ┌────────▼───────┐
                   │  Redirect   │     │ Write Service  │
                   │  Service    │     │ (URL Creation) │
                   │ (read-only) │     └────────┬───────┘
                   └──────┬──────┘              │
                          │             ┌───────▼──────┐
                   ┌──────▼──────┐      │ ID Generator │
                   │ Redis Cache │      │ (Snowflake)  │
                   │ (hot URLs)  │      └───────┬──────┘
                   └──────┬──────┘              │
                          │                     │
                   ┌──────▼─────────────────────▼──────┐
                   │        MySQL / PostgreSQL         │
                   │     (Primary + Read Replicas)     │
                   └───────────────────────────────────┘
                          │
                   ┌──────▼──────┐
                   │    Kafka    │ ← async click events
                   └──────┬──────┘
                          │
                   ┌──────▼──────┐
                   │  Analytics  │
                   │  Pipeline   │
                   └─────────────┘
3. ID Generation Strategies
The short code is the heart of the system. Getting ID generation right determines scalability and correctness.
| Strategy | Example Output | Collision Risk | Distributed-Safe | Recommendation |
|---|---|---|---|---|
| Auto-increment DB ID + Base62 | aB3x9Z | None | Single writer only | Good for small scale |
| Snowflake ID + Base62 | 3Kp7mN | None | Yes (multiple writers) | Best for large scale |
| MD5 hash, first 6 chars | a3f9bc | High at scale (only 16^6 ≈ 16.7M codes) | Yes | Avoid: needs collision handling |
| UUID truncated | 3f8a-bc | Very low | Yes | Longer codes, harder to type |
| Random 6-char base62 | Xq8Z2m | Grows with scale | Yes | Acceptable with collision check |
Base62 Encoding Explained
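Base62 uses digits 0–9, uppercase A–Z, and lowercase a–z, 62 characters in total, so it produces URL-safe codes without special characters. To encode an integer ID, repeatedly divide it by 62 and map each remainder to a character; the remainders, read in reverse order, form the short code.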
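A minimal sketch of base62 encoding and decoding follows. The alphabet order (digits, then uppercase, then lowercase) follows the description above; the function names are illustrative, and any fixed ordering works as long as the encoder and decoder agree.

```python
import string

# Alphabet order: digits, then uppercase, then lowercase (62 characters).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)  # 62

def base62_encode(n: int) -> str:
    """Encode a non-negative integer as a base62 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n > 0:
        n, rem = divmod(n, BASE)    # peel off the least significant digit
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))  # most significant digit first

def base62_decode(code: str) -> int:
    """Inverse of base62_encode."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(base62_encode(62))          # "10"
print(base62_decode("aB3x9Z"))    # the integer behind the example code
```

Every ID below 62^6 encodes to at most six characters, and decoding an encoded value always returns the original integer.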
Interview Tip — Why Not MD5?
An MD5 digest is 32 hex characters; the MD5 of "hello", for example, begins "5d41…". Keeping only the first 6 hex characters leaves 16^6 ≈ 16.7M possible codes, and by the birthday paradox a collision becomes likely after only about 4,000 to 5,000 URLs. Base62 of a unique integer ID has zero collisions by design.
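The birthday estimate can be checked numerically. The sketch below uses the standard approximation 1 - exp(-n(n-1) / 2N) for the chance that n random codes drawn from N possibilities contain at least one duplicate; the function name is illustrative.

```python
import math

def collision_probability(n_items: int, space: int) -> float:
    """Approximate probability that at least two of n_items random
    values drawn from `space` possibilities are equal (birthday bound)."""
    return 1.0 - math.exp(-n_items * (n_items - 1) / (2 * space))

HEX6_SPACE = 16 ** 6  # 16,777,216 possible 6-character hex prefixes

for n in (1_000, 4_096, 10_000):
    print(f"{n:>6} URLs -> {collision_probability(n, HEX6_SPACE):.1%}")
```

With a 6-character hex prefix the chance of at least one collision is roughly 3% at 1,000 URLs, about 39% at 4,096, and about 95% at 10,000.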
4. Database Schema
The schema is intentionally simple. The urls table is the single source of truth. We optimise read performance through indexing and caching rather than schema complexity.
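As a rough illustration, the urls table might look like the sketch below. It uses Python's built-in sqlite3 module so it can run anywhere. The short_code, long_url, user_id, and expires_at columns come from this guide; the id and created_at columns, the column types, and the index name are assumptions, and a production MySQL or PostgreSQL schema would use that engine's own types.

```python
import sqlite3

SCHEMA = """
CREATE TABLE urls (
    id          INTEGER PRIMARY KEY,      -- numeric ID from the ID generator (assumed column)
    short_code  TEXT    NOT NULL UNIQUE,  -- base62 code or custom alias
    long_url    TEXT    NOT NULL,
    user_id     INTEGER,                  -- NULL for anonymous links
    created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- assumed column
    expires_at  TEXT                      -- NULL means the link never expires
);
-- Supports a cleanup job that scans for expired rows.
CREATE INDEX idx_urls_expires_at ON urls (expires_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO urls (id, short_code, long_url) VALUES (?, ?, ?)",
    (1, "aB3x9Z", "https://example.com/some/long/path"),
)
row = conn.execute(
    "SELECT long_url FROM urls WHERE short_code = ?", ("aB3x9Z",)
).fetchone()
print(row[0])
```

The UNIQUE constraint on short_code also creates the index that the redirect lookup relies on, so no separate index is needed for reads.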
5. Redirect Flow — Cache-First
The redirect path must be as fast as possible. Every millisecond adds up when you serve 11,500 redirects per second. The cache-first pattern keeps database load near zero for popular URLs.
User clicks short URL
│
▼
CDN Edge Cache? ──YES──▶ 301 Redirect (cached at edge, ~2ms)
│NO
▼
Redirect Server
│
▼
Redis Cache? ──YES──▶ 302 Redirect + async Kafka event (~5ms)
│NO
▼
Database Read ──FOUND──▶ Populate Redis + Redirect (~15ms)
│NOT FOUND
▼
Return 404
Redis key structure: url:{short_code} → JSON with long_url, expires_at, user_id. TTL on Redis key matches URL expiry, or defaults to 24 hours for non-expiring URLs (with lazy refresh on access). A single Redis node handles 100,000+ ops/second — far above our 11,500 redirects/second requirement.
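The lookup order can be sketched as follows. To keep the example runnable without external services, plain dictionaries stand in for Redis and the urls table; those stand-ins, the function names, and the sample record are assumptions. A real implementation would call a Redis client and set a TTL when it populates the cache.

```python
import json
from typing import Optional, Tuple

# In-memory stand-ins (assumptions): a dict for Redis, a dict for the urls table.
cache = {}
database = {
    "aB3x9Z": {"long_url": "https://example.com/page", "expires_at": None, "user_id": 42},
}

def lookup(short_code: str) -> Optional[dict]:
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    key = f"url:{short_code}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)        # cache hit
    record = database.get(short_code)    # cache miss: read from the database
    if record is None:
        return None                      # unknown code
    cache[key] = json.dumps(record)      # populate the cache for later requests
    return record

def redirect(short_code: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location header value)."""
    record = lookup(short_code)
    if record is None:
        return 404, None
    return 302, record["long_url"]

print(redirect("aB3x9Z"))   # first call reads the database and fills the cache
print(redirect("aB3x9Z"))   # second call is served from the cache
print(redirect("missing"))
```

Only codes that are actually requested end up in the cache, which is what keeps the hot set small relative to the full table.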
6. Write Path — URL Creation
URL creation is less frequent than reads (10:1 ratio) but requires strong consistency — two concurrent requests must not get the same short code.
- Client sends POST /api/shorten with long_url, optional custom_alias, optional expires_at
- Write Service validates the URL (regex + optional reachability check)
- If custom alias: check DB for availability, return 409 if taken
- Otherwise: request next ID from ID Generator (Snowflake service or DB auto-increment)
- Encode ID to base62 to get short_code
- Insert row into urls table (UNIQUE constraint on short_code prevents races)
- Return short URL to client
Distributed ID Generation with Snowflake
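Twitter's Snowflake format packs a 41-bit millisecond timestamp, a 10-bit machine ID, and a 12-bit sequence into a 64-bit integer (63 bits of payload plus an unused sign bit). Each machine can generate 4,096 unique IDs per millisecond, and the IDs sort by creation time. At 64 bits they are half the size of a 128-bit UUID, so their base62 codes are shorter (up to 11 characters). Many teams use a dedicated ID service or database sequence instead of full Snowflake for simplicity.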
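A minimal Snowflake-style generator might look like the sketch below. The bit layout follows the description above; the class name, the custom epoch value, and the handling of a backwards clock are assumptions made for illustration, and a production generator needs a more careful clock-skew policy.

```python
import threading
import time

class SnowflakeGenerator:
    """Sketch: 41-bit ms timestamp | 10-bit machine ID | 12-bit sequence."""

    EPOCH_MS = 1_600_000_000_000   # arbitrary custom epoch (assumption)
    MACHINE_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1   # 4095

    def __init__(self, machine_id: int):
        if not 0 <= machine_id < (1 << self.MACHINE_BITS):
            raise ValueError("machine_id must fit in 10 bits")
        self.machine_id = machine_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                now = self.last_ms   # clock went backwards: reuse last timestamp (simplification)
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    # 4,096 IDs already issued this millisecond: wait for the next one
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - self.EPOCH_MS
            return (
                (timestamp << (self.MACHINE_BITS + self.SEQUENCE_BITS))
                | (self.machine_id << self.SEQUENCE_BITS)
                | self.sequence
            )

gen = SnowflakeGenerator(machine_id=7)
print(gen.next_id())
```

Because the timestamp occupies the high bits, IDs from one generator are strictly increasing, and IDs from different machines never collide as long as each machine ID is unique.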
7. Analytics Pipeline
Analytics must never slow down the redirect. The pattern is fire-and-forget asynchronous logging.
Redirect Server
│
├──▶ 302 Redirect (synchronous, <5ms)
│
└──▶ Kafka Producer (async, non-blocking)
       topic: click-events
       key: short_code (partitioned for ordering)
       value: { short_code, timestamp, ip, user_agent, referrer }
                    │
                    ▼
         Flink / Spark Streaming
(enriches IP → country, aggregates counts)
                    │
        ┌───────────┴───────────┐
        ▼                       ▼
   ClickHouse            Redis Counters
(analytics queries)    (real-time counts)
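The fire-and-forget idea can be illustrated without a Kafka cluster. In the sketch below a bounded in-process queue stands in for the producer's send buffer and a background thread stands in for delivery to the click-events topic; both stand-ins and all function names are assumptions. The key property is that the redirect handler never waits on analytics: if the buffer is full, the event is dropped.

```python
import queue
import threading
import time

events = queue.Queue(maxsize=10_000)   # stand-in for the producer's send buffer
delivered = []                         # stand-in for the click-events topic

def log_click(short_code, ip, user_agent, referrer) -> bool:
    """Enqueue a click event without blocking; drop it if the buffer is full."""
    event = {
        "short_code": short_code,
        "timestamp": time.time(),
        "ip": ip,
        "user_agent": user_agent,
        "referrer": referrer,
    }
    try:
        events.put_nowait(event)
        return True
    except queue.Full:
        return False   # losing an analytics event is preferable to a slow redirect

def publisher():
    """Background worker: a real one would send each event to Kafka."""
    while True:
        event = events.get()
        delivered.append(event)
        events.task_done()

threading.Thread(target=publisher, daemon=True).start()

def handle_redirect(short_code, long_url, ip, user_agent, referrer):
    log_click(short_code, ip, user_agent, referrer)   # returns immediately
    return 302, long_url

print(handle_redirect("aB3x9Z", "https://example.com/page", "203.0.113.9", "curl/8.0", ""))
events.join()   # only for the demo: wait until the worker has drained the queue
print(len(delivered))
```

Dropping events under backpressure trades a small amount of analytics accuracy for a redirect path whose latency does not depend on the analytics pipeline.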
8. Scaling Considerations
At 1 billion redirects per day (~11,500 rps), a single server is a bottleneck. Here is how to scale each layer:
| Layer | Bottleneck | Solution | Scale Target |
|---|---|---|---|
| Redirect Service | CPU / network | Horizontal scale behind LB | 50+ stateless instances |
| Redis Cache | Memory | Redis Cluster, 6 shards | 600GB total cache |
| Database reads | IOPS | Read replicas (5-10×) | 99% cache hit rate → low DB load |
| Database writes | Write throughput | Primary with async replication | ~1,150 writes/sec (10:1 ratio) |
| CDN | Global latency | 301 redirect cached at edge | ~80% of traffic handled at CDN |
| Analytics | Write throughput | Kafka partitioned by short_code | Millions of events/sec |
9. URL Expiry
URLs with an expires_at timestamp must return a 410 Gone response after expiry. Implement this at two levels:
- At redirect time: Check expires_at in the cached URL object. If expired, return 410 and delete from cache.
- Background cleanup job: Nightly cron deletes expired rows from the DB (DELETE FROM urls WHERE expires_at < NOW() AND expires_at IS NOT NULL LIMIT 10000). DELETE with LIMIT is MySQL syntax; PostgreSQL needs a subquery to select each batch. Use batched deletes to avoid locking the table.
Watch Out — Cache Serving Expired URLs
If a URL is cached in Redis but has expired in the DB, the redirect service might still serve it. Fix by storing expires_at in the Redis value and checking it on each redirect. Set the Redis TTL to expires_at - now() so Redis automatically evicts it at the right time.
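Both checks can be sketched as two small functions. The record shape (a dict with long_url and expires_at) and the function names are assumptions for illustration; the 24-hour default for non-expiring URLs matches the Redis notes in the redirect section.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

DEFAULT_TTL_SECONDS = 24 * 60 * 60   # default cache TTL for non-expiring URLs

def cache_ttl_seconds(expires_at: Optional[datetime], now: datetime) -> int:
    """TTL to set on the cache key so it is evicted when the URL expires."""
    if expires_at is None:
        return DEFAULT_TTL_SECONDS
    return max(0, int((expires_at - now).total_seconds()))

def resolve(record: dict, now: datetime) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, Location) for a record read from cache or database."""
    expires_at = record.get("expires_at")
    if expires_at is not None and expires_at <= now:
        return 410, None              # Gone: the link has expired
    return 302, record["long_url"]

now = datetime.now(timezone.utc)
live = {"long_url": "https://example.com/sale", "expires_at": now + timedelta(minutes=5)}
dead = {"long_url": "https://example.com/old", "expires_at": now - timedelta(minutes=5)}

print(resolve(live, now), cache_ttl_seconds(live["expires_at"], now))
print(resolve(dead, now), cache_ttl_seconds(dead["expires_at"], now))
```

A caller would skip caching entirely when the computed TTL is 0, since the record is already expired.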
10. Trade-offs and Design Decisions
| Decision | Chose | Alternative | Reason |
|---|---|---|---|
| Redirect type | 302 (analytics mode) | 301 (performance mode) | Analytics requirement overrides cache benefit |
| ID generation | Snowflake + Base62 | Hash-based | Zero collisions, sortable, compact codes |
| Primary DB | MySQL (relational) | Cassandra (NoSQL) | Strong consistency for write path; read replicas handle scale |
| Analytics storage | ClickHouse (columnar) | MySQL | Columnar storage can be orders of magnitude faster for aggregation queries |
| Cache strategy | Cache-aside | Write-through | We only cache URLs actually accessed (Zipf distribution) |
Frequently Asked Questions — URL Shortener System Design
How do you generate unique short codes?
Use base62 encoding of an auto-increment counter. Base62 uses the characters A-Z, a-z, and 0-9. A 6-character base62 string gives 62^6 ≈ 56.8 billion unique codes. That is plenty for most systems, but at the 100 million URLs per day assumed in this guide the space is used up in about 568 days, so a 7-character code (62^7 ≈ 3.5 trillion) is needed to cover five years. Because every counter value is distinct, this approach guarantees no collisions and produces short, URL-safe codes without special characters.
What is the difference between a 301 and a 302 redirect?
301 (Moved Permanently) tells browsers and CDNs that they may cache the redirect indefinitely. The browser will not hit your redirect service again, which is faster for users and lowers server load, but you lose click analytics. 302 (Found / Moved Temporarily) is not cached by default, so every click goes through your redirect server. That enables per-click analytics, A/B testing, and expiry enforcement, at the cost of one extra network hop.
How do custom aliases work?
Custom aliases (e.g. bit.ly/my-brand) are stored in the same urls table with a user-supplied short_code. Before saving, query the DB to check that the alias is not already taken. Keep auto-generated codes and user aliases from overlapping, either by reserving a namespace (e.g. require aliases to contain a character outside the base62 alphabet, such as a hyphen) or by relying on a uniqueness check across both.
How do you avoid short code collisions?
Use an auto-increment integer ID (from the database or a distributed ID generator like Snowflake) and encode it in base62. Since each ID is unique by definition, there are zero collisions. MD5/SHA-based approaches that take the first 6 characters of a hash do have collision risk and require collision-resolution loops, so avoid them for high-volume systems.
How do you scale the redirect path?
Cache hot URLs in Redis (a 1GB Redis cache can hold ~10 million short codes). Serve redirects from edge CDN nodes (CloudFront, Cloudflare) for sub-10ms latency globally. Use read replicas for the database. The redirect path is read-only and scales horizontally: add stateless redirect servers behind a load balancer.
How do you collect click analytics without slowing down redirects?
Log click events asynchronously to Kafka and never block the redirect on analytics writes. A stream processor (Flink, Spark Streaming) consumes Kafka events and aggregates counts, geographic data, and referrer information. Store aggregates in a time-series database (InfluxDB, TimescaleDB) or pre-aggregate in Redis counters. This keeps the redirect path at <10ms while providing rich analytics.