1. Requirements Clarification

Functional Requirements

  • Return top-5 search suggestions for any given prefix
  • Suggestions ranked by query frequency (most-searched first)
  • Support incremental updates as new queries arrive
  • Filter profanity and banned terms from suggestions
  • Support multiple languages and character sets

Non-Functional Requirements

  • Latency: <100ms end-to-end; <20ms server-side processing
  • Scale: 10 billion searches per day on Google-scale; even mid-tier apps handle 100M queries/day
  • Availability: 99.99% uptime — autocomplete failure degrades UX but is not a P0
  • Data freshness: Suggestions updated within 1 hour of trending changes

2. Trie Data Structure

A Trie (prefix tree) is a tree data structure where each edge represents one character. Traversing from root to a node spells out a prefix. All words sharing a common prefix share the same path through the tree.

Words: "search", "searchable", "seal", "season"

         root
          │
          s
          │
          e
         / \
        a    a
        │    │
        r    l  (seal)
        │    │
        c    s  (seals)
        │
        h  (search)
        │
        a
        │
        b
        │
        l
        │
        e  (searchable)

Each node also stores: top-5 suggestions for this prefix
Node "sea": [{seal:5000}, {search:3000}, {season:2000}, {seabird:800}, {seafood:700}]
Node "sear": [{search:3000}, {searchable:1200}, {searcher:400}]

Storing top-K suggestions at each node is the key optimization. Without it, every query would need to traverse the entire subtree to collect and sort all matching words — O(number of words with this prefix). With pre-computed top-K at each node, lookup is O(prefix_length).

3. Frequency Scoring and Updates

Suggestions are ranked by query frequency. Two update strategies exist:

StrategyHow It WorksLatency to UpdateComplexityRecommended For
Batch RebuildAggregate query logs daily; rebuild full Trie from scratchUp to 24 hoursLowMost production systems
Real-Time UpdateUpdate Trie node counts on every search query<1 secondHigh (locking, consistency)Trending/news apps
Micro-BatchAggregate last 10 minutes of logs; update Trie every 10 min10 minutesMediumBalance of freshness and complexity

For most systems, batch rebuild (daily) is sufficient. The Trie is rebuilt offline from aggregate query logs (stored in a data warehouse or Hadoop) and then swapped in atomically. The live Trie continues serving traffic while the new one is being built.

Trie node — data structure class TrieNode: def __init__(self): self.children = {} # char → TrieNode self.is_end = False # marks complete word self.top_k = [] # pre-computed: [(score, word), ...] # top_k is maintained as a max-heap of size K class Trie: def __init__(self, k=5): self.root = TrieNode() self.k = k def insert(self, word: str, score: int): node = self.root for char in word.lower(): if char not in node.children: node.children[char] = TrieNode() node = node.children[char] # Update top_k at each prefix node self._update_top_k(node, word, score) node.is_end = True def search(self, prefix: str) -> list: node = self.root for char in prefix.lower(): if char not in node.children: return [] node = node.children[char] return [word for score, word in sorted(node.top_k, reverse=True)]

4. Distributed Trie — Sharding Strategy

A single-server Trie for billions of queries won't scale. Shard by the first two characters of the prefix so all queries with the same two-character prefix go to the same server.

Client: user types "sea..."
    │
    ▼
Routing Layer
    maps first 2 chars "se" → Shard 3
    │
    ▼
Shard 3  (handles: sa, sb, ..., se, sf, ..., sz)
    contains the full Trie subtree for all words starting with 's'
    │
    ▼
Returns top-5 for "sea": [seasonal, search, seattle, seabird, seal]

With 10 shards and 676 two-character prefixes (a-z × a-z), each shard handles ~67 prefixes. Some shards will be hotter than others (words starting with "th", "in", "co" are more frequent in English) — use weighted sharding or move hot prefixes to dedicated servers.

Interview Tip — Shard by First Character First

Start simple: shard by first character (26 shards for a-z). Only add two-character sharding if you prove a single letter's data exceeds one server's memory. Premature over-engineering is a red flag in interviews — start with the simplest design that meets requirements, then scale.

5. Caching Strategy

Autocomplete queries follow a Zipf distribution — a small number of prefixes account for the vast majority of traffic. Cache aggressively at multiple layers.

  • Browser cache: Cache last 100 queries per user in localStorage. Instant suggestions for repeated queries without any network call.
  • CDN cache: Cache responses for common prefixes at CDN edge nodes (TTL 1 hour). Works for non-personalized suggestions.
  • Redis cache: Cache top-K suggestions per prefix in Redis. TTL = 10 minutes. Key: autocomplete:prefix:{prefix}. A 1GB Redis cache holds ~500K unique prefix results.
  • In-process cache: Each Trie server keeps a local LRU cache of the last 10K queries. Eliminates Trie traversal for hot prefixes.

6. Query Pipeline — End to End

Query pipeline — data flow 1. User types "sea" → browser debounces 300ms 2. GET /autocomplete?q=sea 3. API Gateway checks CDN cache → HIT → return cached (0ms server) or 3. API Gateway routes to Trie Shard 3 (based on "se") 4. Shard 3 checks Redis cache: GET autocomplete:prefix:sea → HIT → return or 4. Shard 3 performs Trie lookup: traverse s→e→a, read top_k from node 5. Apply blocklist filter: remove any banned terms from top_k 6. Cache result in Redis: SET autocomplete:prefix:sea [...] EX 600 7. Return JSON: {"suggestions": ["seasonal","search","seattle","seabird","seal"]} → total server latency: ~3ms for cache hit, ~8ms for Trie lookup

7. Trie vs Elasticsearch for Autocomplete

FeatureCustom TrieElasticsearch Completion Suggester
Latency~1–5ms~5–15ms
Fuzzy matching (typos)No (exact prefix only)Yes (fuzziness parameter)
Filtering by categoryComplex custom logicBuilt-in with context suggesters
Engineering effortHigh (build, shard, monitor)Low (managed service)
Memory usageEfficient (only top-K per node)Higher (FST-based structure)
Real-time updatesRequires custom pipelineNear real-time index updates
Best forGoogle-scale, custom scoringMost production applications

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

  • The workflow or formula is tested directly in the tool and compared against independent reference examples.
  • Examples are kept practical so readers can verify the result without hidden assumptions.
  • Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — Search Autocomplete System Design