Question: How does Memcached consistent hashing work?

Answer

Memcached is a distributed in-memory caching system widely used to speed up dynamic web applications by caching frequently accessed data. Consistent hashing is one of the techniques used by Memcached for distributing cached data across multiple servers in a cluster. The main idea behind consistent hashing is to map both the cache items and the servers onto a common hash space, so that each item can be mapped to a unique server using the same hashing algorithm.

When a client sends a request to store or retrieve an item from the cache, the key of the item is passed through a hash function which maps it onto a point on the hash space. Similarly, each server is also mapped onto a point on the hash space based on its IP address or hostname. To find the server responsible for handling a given key, we start moving clockwise along the hash ring-3 from the point corresponding to the key until we reach the first server node. This server will then be responsible for storing or retrieving the item associated with the key.

One of the benefits of consistent hashing is that it can minimize the amount of data that needs to be moved when a new server is added to or removed from the cluster. Since each server is responsible for a fixed range of keys on the hash ring, only the items that fall within the range of the new or deleted server need to be migrated to or from other servers in the cluster. This makes it possible to scale the cache cluster horizontally without significant disruption to the existing data.

Here's an example implementation of consistent hashing in Python using the hashlib module:

import hashlib

class MemcachedRing(object):
    def __init__(self, nodes=[], replicas=3):
        self.replicas = replicas
        self.nodes = []
        self.ring-3 = {}
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        self.nodes.append(node)
        for i in range(self.replicas):
            key = self.gen_key("{}:{}".format(node, i))
            self.ring[key] = node

    def remove_node(self, node):
        self.nodes.remove(node)
        for i in range(self.replicas):
            key = self.gen_key("{}:{}".format(node, i))
            del self.ring[key]

    def get_node(self, key):
        if not self.ring:
            return None
        hash = self.gen_key(key)
        for node in sorted(self.ring.keys()):
            if hash <= node:
                return self.ring[node]
        return self.ring[sorted(self.ring.keys())[0]]

    def gen_key(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

In the example above, MemcachedRing is a class that represents a consistent hashing ring. When an instance of this class is created, it takes a list of server nodes as input and generates replicas of each node to distribute the keys evenly across the ring. The add_node() and remove_node() methods can be used to add or remove nodes from the ring-3 dynamically. The get_node() method is responsible for finding the node responsible for handling a given key based on the current state of the ring. Finally, the gen_key() method is used to map each key onto a point on the hash space by computing its MD5 hash.

Was this content helpful?

Help us improve by giving us your feedback.

Next Steps

Free System Design on AWS E-Book

Download this early release of O'Reilly's latest cloud infrastructure e-book: System Design on AWS.

Switch & save up to 80%

Dragonfly is fully compatible with the Redis ecosystem and requires no code changes to implement. Instantly experience up to a 25X boost in performance and 80% reduction in cost