What is Consistent Hashing — System Design Explained [2026]

Q: What problem does consistent hashing solve?

Traditional modulo hashing assigns a key to a node via node = hash(key) % N. When you add or remove a node (N changes), almost all keys remap to a different node — effectively a full cache flush. In a distributed cache with 1 billion keys, adding one server remaps ~800 million keys, causing a massive miss rate spike and database overload. Consistent hashing solves this by mapping both keys and nodes onto a ring. Adding or removing a node only remaps approximately 1/N keys — the ones between the changed node and its predecessor.

Q: What are virtual nodes and why are they necessary?

Without virtual nodes, consistent hashing creates uneven load distribution. If you have 3 nodes and their hashes land at positions 100, 101, and 102 on the ring (close together), node 1 (at position 100) ends up owning almost the entire ring. Virtual nodes (vnodes) solve this by assigning each physical node 100–150 positions on the ring (not just one). The positions are spread evenly, so each physical node handles approximately 1/N of the key space regardless of where the raw hash falls. Redis Cluster uses 16,384 hash slots as a form of virtual node partitioning.

Q: How many keys get remapped when a node is added or removed?

With consistent hashing and N nodes, adding or removing one node remaps approximately 1/N of all keys. With 10 nodes: ~10% of keys remap. With 100 nodes: ~1% of keys remap. Compare to modulo hashing where adding one node remaps (N-1)/N ≈ 99% of all keys. This is why consistent hashing is essential for distributed caches — a rolling deployment that adds one server at a time causes minimal cache miss spikes rather than a full cache flush.

Q: How does Cassandra use consistent hashing?

Cassandra uses consistent hashing to determine which node(s) store each row. Each node is assigned a token (position on the ring) at setup. A row's partition key is hashed (Murmur3) to a position on the ring, and the row is stored on the node with the nearest clockwise token (the primary replica). For replication factor RF=3, the row is also stored on the next 2 clockwise nodes. When you add a node to a Cassandra cluster, only the token range that the new node "takes over" is streamed from the adjacent node — all other data is undisturbed.

Q: What hash function is used in consistent hashing?

The hash function must map keys and node IDs uniformly across the ring space (typically 0 to 2^32 or 0 to 2^64). MD5 was historically common (used in original Ketama algorithm for Memcached). SHA-1 is also used. Modern systems prefer Murmur3 or xxHash — both are faster than cryptographic hashes and have good distribution properties. The hash does NOT need to be cryptographically secure (no security requirement) — just uniform and fast. MurmurHash3 is 10–20x faster than MD5 for the same distribution quality.

Q: How does consistent hashing differ from rendezvous hashing?

Rendezvous hashing (Highest Random Weight / HRW) is an alternative to consistent hashing. For each key, compute a hash for every (key, node) pair and assign the key to the node with the highest hash. Both approaches remap ~1/N keys on node change. Rendezvous hashing is simpler (no ring data structure) and has perfect load balance by definition. The downside: O(N) lookup time vs O(log N) for consistent hashing with a sorted ring. Rendezvous hashing is preferred when N is small (e.g. <50 nodes); consistent hashing scales better to large N.

1. The Problem with Modulo Hashing

Modulo hashing is the naive approach: assign key to node using node_index = hash(key) % N. It works perfectly when N is constant. The problem arises when you add or remove a server.

Modulo hashing: node = hash(key) % N

Initial state: 3 nodes (N=3)
  hash("user:1")  = 18  →  18 % 3 = 0  → Node 0
  hash("user:2")  = 25  →  25 % 3 = 1  → Node 1
  hash("user:3")  = 33  →  33 % 3 = 0  → Node 0

Add 1 node: now 4 nodes (N=4)
  hash("user:1")  = 18  →  18 % 4 = 2  → Node 2  ← CHANGED
  hash("user:2")  = 25  →  25 % 4 = 1  → Node 1  (same)
  hash("user:3")  = 33  →  33 % 4 = 1  → Node 1  ← CHANGED

Going from 3→4 nodes: ~75% of keys map to a different node.
In a cache with 1B keys, that's 750M cache misses hitting the DB simultaneously.

This is not a theoretical concern. In production, adding one cache server to reduce load can paradoxically cause a database meltdown as 75% of the cache becomes instantly invalid.

2. Consistent Hashing: The Ring

Consistent hashing imagines the hash space as a circular ring from 0 to 2^32 (or 2^64). Both nodes and keys are mapped onto this ring using the same hash function. A key is "owned" by the first node clockwise from its position on the ring.

Consistent Hash Ring (simplified, 0 to 100)

              0
           /     \
  Node C @10      90
         |         |
         |  Ring   |
         |         |
        40       Node B @60
           \     /
              Node A @50

Key "user:1" hashes to position 35
→ First clockwise node: Node A @50 (owns 10–50)
→ Node A is responsible for "user:1"

Key "user:2" hashes to position 75
→ First clockwise node: Node C @10 (wraps around ring)
→ Node C owns 60–10 (wrapping past 0)

Adding a Node

When you add Node D at position 30:

Node D now owns positions 10–30 (previously owned by Node A)
Only keys in the range 10–30 need to move from Node A to Node D
All other keys are unaffected
Keys remapped: ~1/N of total = 1/4 = 25% (vs 75% with modulo)

Removing a Node

When Node A at position 50 fails:

Node B (next clockwise at 60) inherits Node A's key range (30–60)
Only keys in range 30–60 need to be re-fetched (cache misses) or re-streamed (databases)
Keys 60–10 and 10–30 are completely unaffected

3. Virtual Nodes (Vnodes)

Without virtual nodes, load distribution is uneven. If Node A hashes to position 1 and Node B hashes to position 2 (adjacent), Node B ends up owning 99.9% of the ring. Virtual nodes solve this by assigning each physical server multiple positions on the ring.

Without virtual nodes (3 physical nodes):
  Node A → position 50
  Node B → position 51 (nearly adjacent — owns almost nothing)
  Node C → position 100 (owns 51–100 = 49%, Node A owns 100–50 = 50%)
  → Extremely uneven load

With virtual nodes (3 physical nodes × 3 vnodes each):
  Node A → positions 15, 55, 90
  Node B → positions 25, 65, 5
  Node C → positions 40, 75, 100
  → Ring positions (sorted): 5(B), 15(A), 25(B), 40(C), 55(A), 65(B), 75(C), 90(A), 100(C)
  → Each node owns ~33% of the ring (3 ranges each)
  → Even with unlucky physical hash positions, vnodes ensure balance

Production values: Redis Cluster uses 16,384 slots, Cassandra uses 256 vnodes by default

4. Implementation — Ring Lookup

Python — consistent hash ring implementation import hashlib import bisect class ConsistentHashRing: def __init__(self, nodes: list, vnodes: int = 150): self.vnodes = vnodes self.ring = {} # position → node_id self.sorted_keys = [] # sorted list of positions for node in nodes: self.add_node(node) def _hash(self, key: str) -> int: return int(hashlib.md5(key.encode()).hexdigest(), 16) def add_node(self, node: str): for i in range(self.vnodes): vnode_key = f"{node}:vnode:{i}" position = self._hash(vnode_key) self.ring[position] = node bisect.insort(self.sorted_keys, position) def remove_node(self, node: str): for i in range(self.vnodes): vnode_key = f"{node}:vnode:{i}" position = self._hash(vnode_key) del self.ring[position] self.sorted_keys.remove(position) def get_node(self, key: str) -> str: if not self.ring: return None position = self._hash(key) # Binary search: find first position >= key's hash idx = bisect.bisect_right(self.sorted_keys, position) # Wrap around: if past the end, go back to start of ring if idx == len(self.sorted_keys): idx = 0 return self.ring[self.sorted_keys[idx]] # Usage: ring = ConsistentHashRing(['cache-1', 'cache-2', 'cache-3']) ring.get_node("user:12345") # → "cache-2" ring.get_node("product:999") # → "cache-1" # Add a node — only ~1/4 of keys remap ring.add_node("cache-4") ring.get_node("user:12345") # might stay "cache-2" or move to "cache-4"

5. Real-World Usage

System	Hashing Implementation	Hash Function	Virtual Nodes
Redis Cluster	16,384 hash slots; CRC16(key) mod 16384	CRC16	16,384 slots act as vnodes
Apache Cassandra	Token ring; each node owns a token range	Murmur3	256 vnodes per node (configurable)
Amazon DynamoDB	Partition key hashed to internal shard	Internal (undisclosed)	Yes (transparent, managed)
Memcached (libketama)	Classic consistent hash ring	MD5	160 vnodes per node
Apache Riak	Ring with 64–1024 partitions	SHA-1	Each partition is a vnode

Redis Cluster's Approach: Hash Slots

Redis Cluster takes an explicit vnode approach: 16,384 slots are pre-defined, and each slot is assigned to a node. This makes slot migration explicit and deterministic. When you resize a cluster, you manually (or via tooling) assign a range of slots from one node to another — only those slots' keys are migrated. This is conceptually identical to vnodes on a ring, but with a fixed, known ring size rather than a dynamically computed one.

6. Consistent Hashing vs Modulo Hashing — Summary

Property	Modulo Hashing (N % nodes)	Consistent Hashing
Keys remapped on node add/remove	~(N-1)/N ≈ 75–99%	~1/N ≈ 1–10%
Load balance	Perfect (deterministic)	Good with vnodes; uneven without
Lookup time	O(1)	O(log V) — V = virtual nodes (binary search)
Node add/remove impact	Full cache flush	Minimal disruption
Implementation complexity	Trivial	Moderate
Best for	Fixed-size, static clusters	Dynamic clusters (cloud, auto-scale)

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

The workflow or formula is tested directly in the tool and compared against independent reference examples.
Examples are kept practical so readers can verify the result without hidden assumptions.
Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — Consistent Hashing

What problem does consistent hashing solve?

What are virtual nodes and why are they necessary?

How many keys get remapped when a node is added or removed?

How does Cassandra use consistent hashing?

What hash function is used in consistent hashing?

How does consistent hashing differ from rendezvous hashing?