Memory-Aware Rebalancing Is Now Automatic in Dragonfly Cloud

At scale, distributed clusters don't stay balanced. They start that way, but over time — as access patterns shift, data shapes change, and workloads grow unevenly — memory distribution across shards drifts. The cluster looks healthy in aggregate. But one shard is sitting at 90% utilization while others are at 40%. And that hot shard is where your performance problems start.

This is what Instacart was dealing with on their Dragonfly Cloud cluster: individual shards were hitting 100% memory utilization while others were significantly underutilized. The options were bad: over-provision the entire cluster to give the hot shard room, or accept the risk of evictions and degraded performance. Neither addresses the actual problem, which is that memory is not distributed evenly across shards.

Today we are shipping shard memory balancing for Dragonfly Cloud: automatic redistribution of hash slots across shards based on actual per-slot memory measurement, with configurable windows to control when rebalancing runs.

Why shard imbalance happens

In a distributed in-memory store, keys are grouped into hash slots (Redis cluster mode uses 16,384 of them), and those slots are assigned to shards. The slot assignment is deterministic, but the memory footprint of each slot is not. Some slots hold large values. Some hold keys that never expire while neighboring slots turn over constantly. Access patterns concentrate writes on specific keyspaces. Over time, the memory weight of slots diverges significantly across shards — and your cluster's utilization graph starts to look like a skyline rather than a flat line.
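To make the "deterministic assignment" point concrete, here is a minimal sketch of the key-to-slot rule that Redis Cluster documents: CRC16 (XMODEM variant) of the key, modulo 16,384, with hash tags in braces restricting which part of the key is hashed. It assumes Dragonfly's cluster mode follows the same rule; the function name `key_slot` is ours, not part of any client library.

```python
import binascii

NUM_SLOTS = 16384  # slot count used by Redis cluster mode


def key_slot(key: bytes) -> int:
    """Map a key to its hash slot: CRC16/XMODEM of the key, mod 16384."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # A non-empty {tag} means only the tag is hashed, so related
        # keys can be forced into the same slot.
        if end != -1 and end > start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with an initial value of 0 is the XMODEM CRC16.
    return binascii.crc_hqx(key, 0) % NUM_SLOTS


print(key_slot(b"foo"))                   # 12182
print(key_slot(b"{user1000}.following"))  # same slot as the key below
print(key_slot(b"{user1000}.followers"))
```

The mapping says nothing about how large the value behind each key is, which is why two shards holding the same number of slots can end up with very different memory footprints.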

For small clusters, this is a manageable inconvenience. For large, memory-bound clusters — the ones where you have sized infrastructure intentionally and every gigabyte matters — it becomes a real operational problem. The hot shard is your bottleneck. You either hit its ceiling and start seeing evictions, or you resize the whole cluster to accommodate what is fundamentally a distribution problem.

What shard memory balancing does

Most approaches to rebalancing — including ElastiCache's — operate at the slot-count level: they try to spread the 16,384 slots evenly by number across shards. That doesn't help when the problem is memory weight, not slot count. A shard with 2,000 small slots can be fine while a shard with 1,800 large slots is at 100% utilization.

Dragonfly Cloud takes a different approach. We track memory utilization at the individual slot level — we know exactly how much memory lives in each of the 16,384 slots, not just the per-shard aggregate. When shards diverge meaningfully in memory utilization, the balancer identifies the heaviest slots on the overloaded shards and moves those slot ranges to shards with available headroom. The result is rebalancing that responds to actual memory distribution rather than a count of slots.
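The idea can be illustrated with a small greedy planner. This is a sketch under our own assumptions, not Dragonfly Cloud's actual algorithm: the `Shard` class, the `plan_moves` function, the 10% divergence threshold, and the move cap are all hypothetical. It repeatedly takes the fullest shard and moves its heaviest slot that still fits to the emptiest shard, stopping once the utilization gap is small enough.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    capacity: int                                         # memory limit (any unit)
    slots: dict[int, int] = field(default_factory=dict)   # slot id -> memory used

    @property
    def used(self) -> int:
        return sum(self.slots.values())

    @property
    def utilization(self) -> float:
        return self.used / self.capacity


def plan_moves(shards, threshold=0.10, max_moves=1000):
    """Return a list of (slot, source, target) moves; mutates `shards`.

    Hypothetical greedy heuristic: move the heaviest slot from the fullest
    shard to the emptiest one, as long as the move leaves the target below
    the source's current utilization.
    """
    moves = []
    while len(moves) < max_moves:
        hot = max(shards, key=lambda s: s.utilization)
        cold = min(shards, key=lambda s: s.utilization)
        if hot.utilization - cold.utilization <= threshold:
            break  # shards are close enough; nothing to do
        candidates = [
            (size, slot) for slot, size in hot.slots.items()
            if size > 0 and (cold.used + size) / cold.capacity < hot.utilization
        ]
        if not candidates:
            break  # no single slot move would improve the balance
        size, slot = max(candidates)
        cold.slots[slot] = hot.slots.pop(slot)
        moves.append((slot, hot.name, cold.name))
    return moves


# One shard at 90%, the other at 40%, as in the scenario described earlier.
a = Shard("shard-a", capacity=100, slots={0: 50, 1: 30, 2: 10})
b = Shard("shard-b", capacity=100, slots={3: 20, 4: 20})
print(plan_moves([a, b]))  # [(1, 'shard-a', 'shard-b')]
print(a.used, b.used)      # 60 70
```

Note that the planner never looks at how many slots each shard holds. It only needs the per-slot memory figures, which is exactly the data a count-based rebalancer does not have.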

Operators can configure rebalancing windows — specific days and hours during which rebalancing is permitted to run. For latency-sensitive workloads, this means you can restrict rebalancing to off-peak hours without having to monitor or trigger it manually. The cluster queues up any needed rebalancing and executes it when the window opens.
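As a rough illustration of how a day-and-hour window might gate the balancer, the sketch below checks whether a given moment falls inside any configured window. The `Window` shape and the `rebalancing_allowed` function are hypothetical and do not reflect Dragonfly Cloud's configuration format. It assumes windows do not wrap past midnight and that `now` is already expressed in the time zone the windows were defined in.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Window:
    weekdays: frozenset  # 0 = Monday ... 6 = Sunday, as in datetime.weekday()
    start: time          # inclusive
    end: time            # exclusive; assumed later than start (no midnight wrap)


def rebalancing_allowed(now: datetime, windows) -> bool:
    """True if `now` falls inside at least one permitted window."""
    return any(
        now.weekday() in w.weekdays and w.start <= now.time() < w.end
        for w in windows
    )


# Example: allow rebalancing on weekends between 02:00 and 05:00.
off_peak = [Window(frozenset({5, 6}), time(2, 0), time(5, 0))]

print(rebalancing_allowed(datetime(2024, 1, 6, 3, 0), off_peak))  # True  (Saturday 03:00)
print(rebalancing_allowed(datetime(2024, 1, 8, 3, 0), off_peak))  # False (Monday 03:00)
```

A balancer built this way would keep its pending moves queued while the check returns False and start executing them once it returns True.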

For comparison: ElastiCache redistributes hash slots evenly by count, per AWS's own documentation, without accounting for the memory weight of individual slots. Valkey offers no automatic rebalancing at all — it requires manual intervention via valkey-cli --cluster rebalance.

Why this matters in practice

Per-slot memory tracking is what makes memory-aware rebalancing possible in the first place. Without knowing how much memory sits in each slot, you cannot make informed decisions about which slots to move — you are just shuffling slots around hoping the memory follows. Dragonfly's granular measurement means the balancer moves the right slots, not just slots.

The downstream impact is straightforward. You stop over-provisioning to cover for hotspots. You use the memory you are paying for. You stop watching one shard's utilization creep toward its ceiling while capacity sits idle elsewhere. And when traffic spikes or data shapes shift, the cluster self-corrects within the next rebalancing window rather than requiring an operator response.

For teams running large, memory-bound Dragonfly Cloud clusters, this means better cost efficiency, more predictable performance, and one less operational problem to manage manually.

How to get started

Shard memory balancing is available now for Dragonfly Cloud customers on large clusters. Existing customers can reach out to the Dragonfly team to have it turned on, and new customers can request a demo.
