I talked in my previous post about Redis eviction policies. In this post, I would like to describe the design behind Dragonfly cache.
If you have not heard about Dragonfly - please check it out. It uses - what I hope - novel and interesting ideas backed up by the research from recent years  and . It's meant to fix many problems that exist with Redis today. I have been working on Dragonfly for the last 7 months and it has been one of the more interesting and challenging projects I've ever done!
Anyway, back to cache design. We'll start with a short overview of what LRU cache is and its shortcomings in general, and then analyze Redis implementation specifically.
The least recently used (LRU) cache policy evicts items, as the name implies, that were least recently used. It works this way because a cache algorithm strives to optimize the hit ratio, or the probability that its items will be accessed in the future.
If a cache is full, it needs to vacate items to make room for new additions. The cache frees space for new additions by removing the least valuable items, assuming that the least recently used item is also the least valuable.
The assumption is reasonable, but unfortunately, this algorithm behaves poorly if the assumption above does not hold. Consider, for example, an access pattern with Long tail distribution. Here, the y-axis represents the normalized access frequency of the items in cache and the x-axis represents the items ordered from highest frequency to lowest.
In this case, lots of newly added "yellow" items with low access frequency could push out rare but valuable "green" items responsible for the vast majority of hits. As a result, the LRU policy may sweep its contents together with valuable "green" entities due to traffic fluctuations.
LRU is a simple algorithm that can be implemented efficiently.
Indeed, it maintains all the items in a double-linked list. When an item is accessed, LRU moves it to the head of the list. To vacate LRU items, it pops from the tail of the list. See the diagram above. All operations are done in
O(1), and the memory overhead per item is 2 pointers, i.e. 16 bytes on 64bit architecture.
Redis implements a few eviction policy heuristics. Some of them are described as "approximated LRU". Why approximated? Because Redis does not maintain an exact global order among its items like in classic LRU. Instead, it stores the last access timestamp in each entry.
When it needs to evict an item, Redis performs random sampling of the entire keyspace and selects K candidates. Then it choosed the item with the least recently used timestamp among those K candidates and vacates it. This way, Redis saves 16 bytes per entry needed for ordering items in one global order. This heuristic is a very rough approximation of an LRU.Redis maintainers have recently discussed the possibility of adding an additional heuristic to Redis by implementing a classic LRU policy with global order but eventually decided against it.
Dragonfly implements cache that:
O(1) run-time overhead.
It's a novel approach for cache design has not been suggested before in academic research.
Dragonfly Cache (dash cache) is based on another famous cache policy from 1994 paper - "2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm".
2Q addresses the issues with LRU by introducing two independent buffers. Instead of considering just recency as a factor, 2Q also considers access frequency for each item. It first admits recent items into a so-called probationary buffer (see below). This buffer holds only a little part of the cache space, say less than 10%. All newly added items compete with each other inside this buffer.
Only if a probationary item was accessed at least once, will it be proven as worthy, and upgraded to the protected buffer. By doing so, it evicts the least recently used item from the protected buffer back to probationary buffer. You can read this post for more details.
2Q improves LRU by acknowledging that just because a new item was added to the cache - does not mean it's useful. 2Q requires an item to have been accessed at least once before to be considered as a high quality item. In this way, 2Q cache has been proven to be more robust and to achieve a higher hit rate than LRU policy.
For our purposes today, we need to know the following facts about Dashtable in Dragonfly implementation:
The point at which a segment becomes full is a convenient time to add any type of eviction policy to Dashtable, because this is when Dashtable grows. Moreover, in order to prevent growth of Dashtable, we can only evict items from a full segment. This constitutes a very precise eviction framework that operates in
O(1) run-time complexity.
Dragonfly expands on the ideas above. A naive solution would be to divide hashtable entries into two buffers: a probationary buffer with FIFO ordering, and the protected buffer employing LRU linked-list. That would work but it would require using additional metadata and waste precious memory.
Instead, Dragonfly leverages the unique design of Dashtable and uses its weak ordering characteristics for its advantage.
To explain how 2Q works with Dashtable, we need to explain how we define probationary and protected buffers there, how we promote a probationary item into protected buffer and how we evict items from the cache.
We overlaid the following semantics on the original Dashtable:
(0), and the last slot on the right has the lowest rank
i, it's swapped out with an item at slot
0 stays in the same place.
Basically, Dash-Cache eviction policy is comprised of an "eviction" step described by (2) and the positive reinforcement step described by (3).
That's it. No additional metadata is needed. High quality items will come to reside at high rank slots in their home buckets very quickly, while newly added items compete with each other within stash/probationary buckets. In our implementation each bucket has 14 slots, meaning that each probationary item can be shifted 14 times before being evicted from the cache, unless it proves its usefulness and is promoted. Each segment has 56 regular buckets and 4 stash buckets; therefore we have
6.7% of total space allocated for a probationary buffer. It's enough to catch high quality items before they are evicted.
I hope you enjoyed reading how Dragonfly cache design utilizes seemingly unrelated concepts together to its advantage.