1. The Problem with Modulo Hashing

Modulo hashing is the naive approach: assign key to node using node_index = hash(key) % N. It works perfectly when N is constant. The problem arises when you add or remove a server.

Modulo hashing: node = hash(key) % N

Initial state: 3 nodes (N=3)
  hash("user:1")  = 18  →  18 % 3 = 0  → Node 0
  hash("user:2")  = 25  →  25 % 3 = 1  → Node 1
  hash("user:3")  = 33  →  33 % 3 = 0  → Node 0

Add 1 node: now 4 nodes (N=4)
  hash("user:1")  = 18  →  18 % 4 = 2  → Node 2  ← CHANGED
  hash("user:2")  = 25  →  25 % 4 = 1  → Node 1  (same)
  hash("user:3")  = 33  →  33 % 4 = 1  → Node 1  ← CHANGED

Going from 3→4 nodes: ~75% of keys map to a different node.
In a cache with 1B keys, that's 750M cache misses hitting the DB simultaneously.

This is not a theoretical concern. In production, adding one cache server to reduce load can paradoxically cause a database meltdown as 75% of the cache becomes instantly invalid.

2. Consistent Hashing: The Ring

Consistent hashing imagines the hash space as a circular ring from 0 to 2^32 (or 2^64). Both nodes and keys are mapped onto this ring using the same hash function. A key is "owned" by the first node clockwise from its position on the ring.

Consistent Hash Ring (simplified, 0 to 100)

              0
           /     \
  Node C @10      90
         |         |
         |  Ring   |
         |         |
        40       Node B @60
           \     /
              Node A @50

Key "user:1" hashes to position 35
→ First clockwise node: Node A @50 (owns 10–50)
→ Node A is responsible for "user:1"

Key "user:2" hashes to position 75
→ First clockwise node: Node C @10 (wraps around ring)
→ Node C owns 60–10 (wrapping past 0)

Adding a Node

When you add Node D at position 30:

  • Node D now owns positions 10–30 (previously owned by Node A)
  • Only keys in the range 10–30 need to move from Node A to Node D
  • All other keys are unaffected
  • Keys remapped: ~1/N of total = 1/4 = 25% (vs 75% with modulo)

Removing a Node

When Node A at position 50 fails:

  • Node B (next clockwise at 60) inherits Node A's key range (30–60)
  • Only keys in range 30–60 need to be re-fetched (cache misses) or re-streamed (databases)
  • Keys 60–10 and 10–30 are completely unaffected

3. Virtual Nodes (Vnodes)

Without virtual nodes, load distribution is uneven. If Node A hashes to position 1 and Node B hashes to position 2 (adjacent), Node B ends up owning 99.9% of the ring. Virtual nodes solve this by assigning each physical server multiple positions on the ring.

Without virtual nodes (3 physical nodes):
  Node A → position 50
  Node B → position 51 (nearly adjacent — owns almost nothing)
  Node C → position 100 (owns 51–100 = 49%, Node A owns 100–50 = 50%)
  → Extremely uneven load

With virtual nodes (3 physical nodes × 3 vnodes each):
  Node A → positions 15, 55, 90
  Node B → positions 25, 65, 5
  Node C → positions 40, 75, 100
  → Ring positions (sorted): 5(B), 15(A), 25(B), 40(C), 55(A), 65(B), 75(C), 90(A), 100(C)
  → Each node owns ~33% of the ring (3 ranges each)
  → Even with unlucky physical hash positions, vnodes ensure balance

Production values: Redis Cluster uses 16,384 slots, Cassandra uses 256 vnodes by default

4. Implementation — Ring Lookup

Python — consistent hash ring implementation import hashlib import bisect class ConsistentHashRing: def __init__(self, nodes: list, vnodes: int = 150): self.vnodes = vnodes self.ring = {} # position → node_id self.sorted_keys = [] # sorted list of positions for node in nodes: self.add_node(node) def _hash(self, key: str) -> int: return int(hashlib.md5(key.encode()).hexdigest(), 16) def add_node(self, node: str): for i in range(self.vnodes): vnode_key = f"{node}:vnode:{i}" position = self._hash(vnode_key) self.ring[position] = node bisect.insort(self.sorted_keys, position) def remove_node(self, node: str): for i in range(self.vnodes): vnode_key = f"{node}:vnode:{i}" position = self._hash(vnode_key) del self.ring[position] self.sorted_keys.remove(position) def get_node(self, key: str) -> str: if not self.ring: return None position = self._hash(key) # Binary search: find first position >= key's hash idx = bisect.bisect_right(self.sorted_keys, position) # Wrap around: if past the end, go back to start of ring if idx == len(self.sorted_keys): idx = 0 return self.ring[self.sorted_keys[idx]] # Usage: ring = ConsistentHashRing(['cache-1', 'cache-2', 'cache-3']) ring.get_node("user:12345") # → "cache-2" ring.get_node("product:999") # → "cache-1" # Add a node — only ~1/4 of keys remap ring.add_node("cache-4") ring.get_node("user:12345") # might stay "cache-2" or move to "cache-4"

5. Real-World Usage

SystemHashing ImplementationHash FunctionVirtual Nodes
Redis Cluster16,384 hash slots; CRC16(key) mod 16384CRC1616,384 slots act as vnodes
Apache CassandraToken ring; each node owns a token rangeMurmur3256 vnodes per node (configurable)
Amazon DynamoDBPartition key hashed to internal shardInternal (undisclosed)Yes (transparent, managed)
Memcached (libketama)Classic consistent hash ringMD5160 vnodes per node
Apache RiakRing with 64–1024 partitionsSHA-1Each partition is a vnode

Redis Cluster's Approach: Hash Slots

Redis Cluster takes an explicit vnode approach: 16,384 slots are pre-defined, and each slot is assigned to a node. This makes slot migration explicit and deterministic. When you resize a cluster, you manually (or via tooling) assign a range of slots from one node to another — only those slots' keys are migrated. This is conceptually identical to vnodes on a ring, but with a fixed, known ring size rather than a dynamically computed one.

6. Consistent Hashing vs Modulo Hashing — Summary

PropertyModulo Hashing (N % nodes)Consistent Hashing
Keys remapped on node add/remove~(N-1)/N ≈ 75–99%~1/N ≈ 1–10%
Load balancePerfect (deterministic)Good with vnodes; uneven without
Lookup timeO(1)O(log V) — V = virtual nodes (binary search)
Node add/remove impactFull cache flushMinimal disruption
Implementation complexityTrivialModerate
Best forFixed-size, static clustersDynamic clusters (cloud, auto-scale)

How We Research and Update This Guide

We test the underlying formula or workflow, compare outputs with reliable references, and revise examples whenever the page content changes.

  • The workflow or formula is tested directly in the tool and compared against independent reference examples.
  • Examples are kept practical so readers can verify the result without hidden assumptions.
  • Pages are revised whenever the interface, calculation flow, or surrounding guidance materially changes.

Frequently Asked Questions — Consistent Hashing