How to Design Search Autocomplete — System Design Interview [2026]

Q: How does a Trie data structure work for autocomplete?

A Trie (prefix tree) is a tree where each node represents one character. The path from root to a node spells out a prefix. Searching for suggestions for prefix "sea" traverses: root → s → e → a, then collects all words in the subtree below the "a" node. Each node stores the top-K suggestions for its prefix (pre-computed) to avoid traversing the entire subtree on every query. This gives O(P + K) lookup time where P = prefix length and K = number of suggestions returned.

Q: How do you score and rank autocomplete suggestions?

Frequency scoring is the baseline: rank by how often each term was searched. Score = log(query_count) * recency_boost * personalization_boost. Recency boost decays older queries: boost = e^(-λ × days_since_last_search). Personalization: promote terms the user has searched before. Each Trie node stores only the top-K (e.g. top-5) suggestions by score, pruned at build time. During online updates, use a sliding window counter (last 7 days) rather than all-time count to keep suggestions fresh.

Q: What is the latency requirement for search autocomplete?

Users expect suggestions to appear in under 100ms after each keystroke — ideally under 50ms. This constrains every layer: the Trie lookup must be in-memory (not a database query), the result must be cached aggressively, and the network round-trip must be minimized. In practice, teams add a 300ms debounce on the frontend (don't send a request on every keystroke — wait 300ms of inactivity) which gives the backend more slack. Even so, the server-side target is typically <20ms.

Q: How do you shard a Trie across multiple servers?

Shard by the first two characters of the prefix. All queries starting with "se" go to shard 1, "sh" to shard 2, etc. With 26 × 26 = 676 possible two-character prefixes and, say, 10 shards, each shard handles ~67 prefixes. A routing layer maps prefix → shard using a static hash table (not consistent hashing — prefix mapping is deterministic). When rebuilding the Trie (daily batch job), rebuild only the affected shards for changed prefixes.

Q: How do you filter profanity and spam from autocomplete?

Maintain a blocklist of prohibited terms (profanity, hate speech, competitor names). Filter at two points: (1) at Trie build time — exclude blocked terms from the frequency counters so they never enter the Trie; (2) at query time — after retrieving top-K suggestions, apply a fast regex or hash-set check against the blocklist. The blocklist is small enough to keep in memory. For spam queries (artificially inflated by bots), apply rate limiting and anomaly detection on the query logging pipeline before counts feed into the Trie.

Q: When should you use Elasticsearch instead of a custom Trie?

Elasticsearch's completion suggester is a production-ready alternative to a custom Trie. Use Elasticsearch when: you need fuzzy matching (handle typos), multi-language support, or complex filtering (e.g. autocomplete for products within a category). A custom Trie gives lower latency (~1ms vs ~5ms for ES) and full control over scoring, but requires significant engineering effort to build and operate. For most companies, Elasticsearch is the right default. Build a custom Trie only when you have proven scale requirements and dedicated search engineering resources.

1. Requirements Clarification

Functional Requirements

Return top-5 search suggestions for any given prefix
Suggestions ranked by query frequency (most-searched first)
Support incremental updates as new queries arrive
Filter profanity and banned terms from suggestions
Support multiple languages and character sets

Non-Functional Requirements

Latency: <100ms end-to-end; <20ms server-side processing
Scale: 10 billion searches per day on Google-scale; even mid-tier apps handle 100M queries/day
Availability: 99.99% uptime — autocomplete failure degrades UX but is not a P0
Data freshness: Suggestions updated within 1 hour of trending changes

2. Trie Data Structure

A Trie (prefix tree) is a tree data structure where each edge represents one character. Traversing from root to a node spells out a prefix. All words sharing a common prefix share the same path through the tree.

Words: "search", "searchable", "seal", "season"

         root
          │
          s
          │
          e
         / \
        a    a
        │    │
        r    l  (seal)
        │    │
        c    s  (seals)
        │
        h  (search)
        │
        a
        │
        b
        │
        l
        │
        e  (searchable)

Each node also stores: top-5 suggestions for this prefix
Node "sea": [{seal:5000}, {search:3000}, {season:2000}, {seabird:800}, {seafood:700}]
Node "sear": [{search:3000}, {searchable:1200}, {searcher:400}]

Storing top-K suggestions at each node is the key optimization. Without it, every query would need to traverse the entire subtree to collect and sort all matching words — O(number of words with this prefix). With pre-computed top-K at each node, lookup is O(prefix_length).

3. Frequency Scoring and Updates

Suggestions are ranked by query frequency. Two update strategies exist:

Strategy	How It Works	Latency to Update	Complexity	Recommended For
Batch Rebuild	Aggregate query logs daily; rebuild full Trie from scratch	Up to 24 hours	Low	Most production systems
Real-Time Update	Update Trie node counts on every search query	<1 second	High (locking, consistency)	Trending/news apps
Micro-Batch	Aggregate last 10 minutes of logs; update Trie every 10 min	10 minutes	Medium	Balance of freshness and complexity

For most systems, batch rebuild (daily) is sufficient. The Trie is rebuilt offline from aggregate query logs (stored in a data warehouse or Hadoop) and then swapped in atomically. The live Trie continues serving traffic while the new one is being built.

Trie node — data structure class TrieNode: def __init__(self): self.children = {} # char → TrieNode self.is_end = False # marks complete word self.top_k = [] # pre-computed: [(score, word), ...] # top_k is maintained as a max-heap of size K class Trie: def __init__(self, k=5): self.root = TrieNode() self.k = k def insert(self, word: str, score: int): node = self.root for char in word.lower(): if char not in node.children: node.children[char] = TrieNode() node = node.children[char] # Update top_k at each prefix node self._update_top_k(node, word, score) node.is_end = True def search(self, prefix: str) -> list: node = self.root for char in prefix.lower(): if char not in node.children: return [] node = node.children[char] return [word for score, word in sorted(node.top_k, reverse=True)]

4. Distributed Trie — Sharding Strategy

A single-server Trie for billions of queries won't scale. Shard by the first two characters of the prefix so all queries with the same two-character prefix go to the same server.

Client: user types "sea..."
    │
    ▼
Routing Layer
    maps first 2 chars "se" → Shard 3
    │
    ▼
Shard 3  (handles: sa, sb, ..., se, sf, ..., sz)
    contains the full Trie subtree for all words starting with 's'
    │
    ▼
Returns top-5 for "sea": [seasonal, search, seattle, seabird, seal]

With 10 shards and 676 two-character prefixes (a-z × a-z), each shard handles ~67 prefixes. Some shards will be hotter than others (words starting with "th", "in", "co" are more frequent in English) — use weighted sharding or move hot prefixes to dedicated servers.

Interview Tip — Shard by First Character First

Start simple: shard by first character (26 shards for a-z). Only add two-character sharding if you prove a single letter's data exceeds one server's memory. Premature over-engineering is a red flag in interviews — start with the simplest design that meets requirements, then scale.

5. Caching Strategy

Autocomplete queries follow a Zipf distribution — a small number of prefixes account for the vast majority of traffic. Cache aggressively at multiple layers.

Browser cache: Cache last 100 queries per user in localStorage. Instant suggestions for repeated queries without any network call.
CDN cache: Cache responses for common prefixes at CDN edge nodes (TTL 1 hour). Works for non-personalized suggestions.
Redis cache: Cache top-K suggestions per prefix in Redis. TTL = 10 minutes. Key: autocomplete:prefix:{prefix}. A 1GB Redis cache holds ~500K unique prefix results.
In-process cache: Each Trie server keeps a local LRU cache of the last 10K queries. Eliminates Trie traversal for hot prefixes.

6. Query Pipeline — End to End

Query pipeline — data flow 1. User types "sea" → browser debounces 300ms 2. GET /autocomplete?q=sea 3. API Gateway checks CDN cache → HIT → return cached (0ms server) or 3. API Gateway routes to Trie Shard 3 (based on "se") 4. Shard 3 checks Redis cache: GET autocomplete:prefix:sea → HIT → return or 4. Shard 3 performs Trie lookup: traverse s→e→a, read top_k from node 5. Apply blocklist filter: remove any banned terms from top_k 6. Cache result in Redis: SET autocomplete:prefix:sea [...] EX 600 7. Return JSON: {"suggestions": ["seasonal","search","seattle","seabird","seal"]} → total server latency: ~3ms for cache hit, ~8ms for Trie lookup

7. Trie vs Elasticsearch for Autocomplete

Feature	Custom Trie	Elasticsearch Completion Suggester
Latency	~1–5ms	~5–15ms
Fuzzy matching (typos)	No (exact prefix only)	Yes (fuzziness parameter)
Filtering by category	Complex custom logic	Built-in with context suggesters
Engineering effort	High (build, shard, monitor)	Low (managed service)
Memory usage	Efficient (only top-K per node)	Higher (FST-based structure)
Real-time updates	Requires custom pipeline	Near real-time index updates
Best for	Google-scale, custom scoring	Most production applications

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

The workflow or formula is tested directly in the tool and compared against independent reference examples.
Examples are kept practical so readers can verify the result without hidden assumptions.
Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — Search Autocomplete System Design

How does a Trie data structure work for autocomplete?

How do you score and rank autocomplete suggestions?

What is the latency requirement for search autocomplete?

How do you shard a Trie across multiple servers?

How do you filter profanity and spam from autocomplete?

When should you use Elasticsearch instead of a custom Trie?