Read Replicas Done Right
Learn why read replicas are often unnecessary with Dragonfly’s architecture, when they still make sense, and how they’re now enabled by default for Dragonfly Swarm.
July 30, 2025

TL;DR
- Read replicas are a relic of an outdated architecture.
- With Dragonfly, you don’t need them, since we scale up and out.
- Used wisely, they can be a powerful lever for efficiency, availability, and scale—if you understand when and how to apply them.
Read Replicas: The Good, The Bad, and The Ugly
High-throughput applications often face the challenge of scaling data stores to handle massive volumes of read requests while maintaining performance.
To address this, many organizations deploy read replicas: secondary nodes that maintain a near-real-time copy of the data from a primary node and handle read-only traffic. This helps distribute read load, reduce latency, and improve user experience.
Imagine a feature store application handling millions of simultaneous requests for feature values per second. Directing all these read operations to a single primary node could lead to bottlenecks, slowing down the application and frustrating end users. Read replicas are designed to mitigate this by offloading these read operations from the primary node.
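To make the pattern concrete, here is a minimal read/write split sketch in Python using redis-py. The endpoints and key layout are hypothetical, and in practice routing is usually handled by your client library or a proxy.

```python
import redis

# Hypothetical endpoints: one primary for writes, one replica for reads.
primary = redis.Redis(host="feature-store-primary.internal", port=6379)
replica = redis.Redis(host="feature-store-replica.internal", port=6379)

def write_feature(entity_id: str, feature: str, value: str) -> None:
    # All writes go to the primary; the replica receives them asynchronously.
    primary.hset(f"features:{entity_id}", feature, value)

def read_feature(entity_id: str, feature: str) -> str | None:
    # High-volume feature lookups are served by the replica,
    # keeping read load off the primary.
    raw = replica.hget(f"features:{entity_id}", feature)
    return raw.decode() if raw is not None else None
```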
While the concept is straightforward—add replicas when the primary is under load and route reads accordingly—the reality is more complex. Replication introduces trade-offs: eventual consistency, increased operational complexity, and higher infrastructure costs. Before deploying read replicas, it’s important to weigh these factors carefully.
Why Using Read Replicas Is a Bad Idea in Many Cases
Using read replicas is a compromise, not a solution. It was born as a workaround for architectural limitations, specifically, the single-threaded nature of Redis and Valkey. When vertical scaling isn’t an option, even modest workloads can force you to scale out, introducing unnecessary complexity and overhead.
Here’s why read replicas can be bad for you:
- Eventual Consistency: Redis replication is asynchronous. This means replicas are always behind the primary, sometimes by milliseconds, sometimes more. If you’re writing and immediately reading from a replica, you’re gambling on consistency. In distributed systems terms, you’re opting into stale reads by default (see the sketch after this list). If your app needs up-to-date data, don’t use a read replica.
- Fighting Fire with Gasoline: The whole point of adding a replica is to offload a stressed master. But replication adds overhead to every single write: serializing and streaming updates to all replicas. So if your workload is write-heavy, guess what? You’re not offloading much at all. You’re just moving pieces around while the pressure stays the same.
- Wasteful by Design: You’re duplicating the entire dataset in memory again and again just to squeeze out a few more reads. That’s a whole lot of cloud bills for a whole lot of redundancy.
- Operational Spaghetti: Failovers, consistency issues, read routing, zombie replicas… it’s a whole genre of headaches that can increase maintenance overhead and reduce system reliability.
- False Sense of Redundancy: If your replicas are under-provisioned (e.g., with enough memory for data but less CPU than the primary to save costs), a primary instance failure can leave the rest of the instances struggling under peak traffic, potentially causing a full system outage. What looks like a safety net can quickly become a bottleneck and risk the stability of your entire infrastructure.
If Redis had a more scalable architecture from the start, the need for read replicas as a stopgap solution would not exist.
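To illustrate the stale-read risk from the first point above, here is a small Python sketch (with hypothetical endpoints) showing how a write to the primary may not yet be visible on a replica:

```python
import redis

primary = redis.Redis(host="primary.internal", port=6379)
replica = redis.Redis(host="replica.internal", port=6379)

# Write to the primary...
primary.set("user:42:plan", "premium")

# ...and immediately read from the replica. Because replication is
# asynchronous, this may still return the previous value (or None)
# until the update has been streamed to the replica.
print(replica.get("user:42:plan"))
```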
Why You Don’t Need Read Replicas in Dragonfly
Dragonfly was built to solve the scaling problems that force teams to reach for read replicas in the first place.
Unlike Redis, which relies on a single-threaded event loop, Dragonfly runs a multi-threaded engine that fully utilizes modern multicore CPUs. This means you can scale throughput by simply adding more cores. And in most real-world scenarios, that alone is enough to avoid needing replicas entirely.
We’ve seen customers handle millions of operations per second on a single node, with consistent read and write latencies in the low microseconds. No sharding. No lag. No stale reads. And no memory wasted on redundant copies of data.
Dragonfly scales vertically first. And when you reach the limits of a single machine, you can move seamlessly to Dragonfly Swarm, our native distributed multi-shard architecture.
So while Dragonfly makes read replicas unnecessary, we still support them for the rare cases where they’re truly the best design choice.
Read Replicas: Sometimes You Need Them
Deploying read replicas isn’t our default approach to scaling reads. And with Dragonfly’s architecture, they usually aren’t needed. But we support them for cases where they’re the right tool. When used deliberately, read replicas can be a useful part of a well-architected system. Here are a few examples:
Hardware Shouldn’t Sit Idle
We see many customers deploying replicas for high availability (HA) reasons. And in those cases, the replica just sits there, quietly holding a copy of memory, waiting for the master to trip. It’s powered on, eating budget, and spending most of its life with idle CPUs.
What if it could do something instead of just waiting? Well, it can.
When it makes sense—when strong consistency isn’t a hard requirement—your replica can handle real read traffic, making use of those precious CPU cycles and RAM you’re already paying for. No waste.
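As one possible way to put that standby to work, the sketch below uses redis-py’s Sentinel support to direct writes to the primary and staleness-tolerant reads to the replica. It assumes a Sentinel-managed setup with a service named "mymaster"; your deployment may discover primary and replica endpoints differently, so adapt the discovery step accordingly.

```python
from redis.sentinel import Sentinel

# Hypothetical Sentinel endpoint and service name.
sentinel = Sentinel([("sentinel-1.internal", 26379)], socket_timeout=0.5)

primary = sentinel.master_for("mymaster")   # writes go to the primary
replica = sentinel.slave_for("mymaster")    # reads that tolerate slight staleness

primary.set("session:abc", "active")
print(replica.get("session:abc"))
```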
Smarter Efficiency Across Availability Zones
Here’s another real-world optimization, known as zone affinity: if you’ve got a replica in a different availability zone (AZ) for HA, you can route local reads to it, which reduces cross-AZ traffic for cost savings and improves client latency by keeping reads local.
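Here is a rough client-side sketch of zone affinity, with hypothetical AZ names and endpoints; how you discover the local AZ and the replica addresses depends entirely on your platform.

```python
import os
import redis

# Hypothetical mapping of availability zones to replica endpoints.
REPLICAS_BY_AZ = {
    "us-east-1a": "replica-1a.internal",
    "us-east-1b": "replica-1b.internal",
}
PRIMARY_HOST = "primary.internal"

# The application learns its own AZ from the environment
# (e.g., injected from instance metadata at startup).
local_az = os.environ.get("AZ", "us-east-1a")

# Prefer the replica in the same AZ for reads; fall back to the primary.
read_host = REPLICAS_BY_AZ.get(local_az, PRIMARY_HOST)
reads = redis.Redis(host=read_host, port=6379)
writes = redis.Redis(host=PRIMARY_HOST, port=6379)
```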
Again, it’s not about scaling. It’s about making smart use of the infrastructure you already have.
Lots of Client Connections
Sometimes it’s not about raw throughput—it’s about connection count. If you have tens of thousands of clients connecting to the same node, you’ll experience high tail latency and eventually exhaust the node. Even highly optimized infrastructure like NGINX and Envoy starts to show strain when connection counts get really high. In those cases, splitting traffic across a few read replicas can reduce pressure on the primary, improve stability, and make life easier for your networking stack.
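One simple way to spread that fan-in, sketched below with hypothetical endpoints: pin each client to an endpoint with a stable hash so connections are shared across the primary and its replicas instead of piling onto a single node.

```python
import zlib
import redis

# Hypothetical endpoints: the primary plus two read replicas.
READ_ENDPOINTS = ["primary.internal", "replica-1.internal", "replica-2.internal"]

def read_client(client_id: str) -> redis.Redis:
    # A stable hash of the client identity picks one endpoint, so tens of
    # thousands of connections are distributed evenly across the fleet.
    idx = zlib.crc32(client_id.encode()) % len(READ_ENDPOINTS)
    return redis.Redis(host=READ_ENDPOINTS[idx], port=6379)
```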
When Vertical Scaling Hits the Ceiling
We love vertical scaling. Dragonfly was built for it. But let’s be real: even cloud hardware has its limits. You’re not going to push 10 million ops/sec through a single node forever. At some point, the biggest machine you can rent still won’t be big enough.
At that point, you’ll need to shard your data, add read replicas, or do both to keep scaling.
Conclusion
In most cases, relying on read replicas is a patch, not a solution. Vertical scaling and multi-shard deployments (hello, Swarm!) are almost always better answers. But yes, there are edge cases where a read replica is genuinely useful, and sometimes even necessary. When paired with HA, it can become an efficiency multiplier.
So don’t follow the crowd. Think before you deploy read replicas. Or better yet, talk to us! We’re happy to help you make the right call.
Read replicas are now enabled by default in Dragonfly Swarm. You can learn how to configure the clients and utilize read replicas by following our documentation. Use them. Abuse them. And as always, tell us what you love, what you hate, and what you broke.
If you would like to set up a custom POC to see if read replicas are a good fit for your workload and to benchmark Dragonfly’s performance vs. your current solution, get in touch here.