Dragonfly Swarm 2TB Cluster Hits 10M+ RPS Easily, Nears 20M with Pipelining
We benchmarked a 2TB Dragonfly Swarm data store, achieving over 10M RPS and nearing 20M with pipelining. See the full performance breakdown and cost savings.
September 30, 2025

The In-Memory Data Beast: Terabyte-Scale Dragonfly Cloud Data Stores
The era of massive in-memory workloads is no longer on the horizon. It’s already here. From caching to lightning-fast ML feature serving and AI model inference, applications now need far more context than in the past to power personalized experiences, which requires fetching significantly larger amounts of data at unprecedented speed.
Dragonfly’s secret weapon has always been its unmatched performance and scalability, both vertically and horizontally, thanks to its groundbreaking multi-threaded architecture (check out our previous benchmarks #1, #2, and #3). In Dragonfly Cloud, a standalone instance can scale vertically to handle up to 600GB of data and over 1 million requests per second (RPS), dramatically simplifying operations compared to a clustered Redis or Valkey deployment that would be required to support similar workloads.
For the most demanding applications that need to go beyond a standalone instance, Dragonfly Swarm provides a seamless path to horizontal scaling. To demonstrate this, we pushed beyond the limits of a single instance, provisioning a 2TB Dragonfly Swarm cluster and putting it to the test. The results confirmed the capability: the cluster easily handled over 10 million RPS, and with reasonable pipelining, performance soared close to 20 million RPS.
| | Throughput (RPS) | P50 Latency (ms) | P99.9 Latency (ms) |
| --- | --- | --- | --- |
| Individual Commands | 10,164,385.24 | 1.20 (per 1 command) | 7.94 (per 1 command) |
| Pipelining (Batch of 5) | 19,459,528.87 | 3.11 (per 5 commands) | 12.37 (per 5 commands) |
This isn’t a comparison with other solutions. It’s a glimpse into the future of high-performance data infrastructure. While the P99.9 latencies might appear high at first glance, it’s crucial to remember this metric represents the extreme tail end of performance under a massive load of over 10 million RPS. And in the case of pipelining, the 12.37ms is the latency for a full batch of 5 commands, not just one. Stay with me for more details in this post.
Beyond a Single Server
When you need to provision a data store larger than 600GB in Dragonfly Cloud, you select the "Dragonfly Swarm Multi-Shard" option. This data store type allows you to scale memory capacity all the way up to 15TB directly from the console. This is our solution for horizontal scaling, but it’s built on a key design philosophy that sets it apart: we scale up before we scale out.

Instead of immediately forcing you into the complexity of horizontal sharding, Dragonfly first leverages the power of a modern multi-core server. In fact, we frequently see users replace entire Redis Cluster deployments with a single, more powerful Dragonfly instance. This is why we only recommend a Dragonfly Swarm when you truly exceed the generous limits of a single node. This "scale up first, then scale out if needed" approach minimizes operational complexity and data movement, delivering more performance with fewer moving parts than a traditional, prematurely sharded cluster.
Best of all, you get this scalability without compromise or application rewrites, thanks to full Redis and Valkey API compatibility. Use the same commands and client libraries, but now with near-unlimited headroom.
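To make that concrete, here is a minimal sketch of connecting to a Dragonfly Swarm data store with redis-py, a standard Redis cluster client. The hostname below is a placeholder; Dragonfly Cloud provides the actual endpoint for your data store.

```python
# A minimal sketch: talking to Dragonfly Swarm with a standard Redis cluster
# client (redis-py). The hostname is a placeholder endpoint.
from redis.cluster import RedisCluster

# The client discovers the cluster topology from the seed node, exactly as it
# would against a Redis Cluster deployment; no application changes required.
rc = RedisCluster(host="hostname.dragonflydb.cloud", port=6379, decode_responses=True)

rc.set("user:42:session", "active", ex=500)  # same commands, same semantics
print(rc.get("user:42:session"))
```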
Benchmarking: Putting Dragonfly Swarm to the Test
Now for the moment of truth! We set up a rigorous test to push our 2TB Dragonfly Swarm cluster and measure the real-world performance developers can expect.
Benchmark Setup: Client and Server Configuration
To fairly test the system, we ensured the clients were not the limiting factor. Here’s what the setup looked like:
- Server (Dragonfly Swarm): A 2TB Dragonfly Swarm data store on the Enhanced compute tier, provisioned in minutes through the Dragonfly Cloud console. For this test, we disabled TLS and persistence to measure pure in-memory performance. The cluster, by default, consisted of 10 shards (nodes).
- Clients (Load Generators): We launched 4 powerful AWS `c7gn.8xlarge` instances in the same region and availability zone as our Dragonfly data store, connecting via minimal-latency VPC peering. This client fleet ensured we could generate enough load to push and measure the server. For benchmark results, we calculated the combined throughput and average latencies across all 4 client instances.
Benchmark Tool & Configurations
We used `memtier_benchmark`, a convenient tool for benchmarking key-value data stores, with the following configuration:
```bash
$> memtier_benchmark --server=hostname.dragonflydb.cloud --port=6379 --cluster-mode \
   --ratio=1:1 --hide-histogram --threads=40 --clients=5 --requests=500000 \
   --distinct-client-seed --data-size=256 --expiry-range=500-500
```
Let’s break down the key parameters:

- `--cluster-mode`: This is crucial. It tells `memtier_benchmark` to discover the cluster’s topology and distribute the load correctly.
- `--threads=40`: Number of threads on the client side. We ran extensive tests with thread counts of `[8, 16, 24, ... 64]` and found that throughput peaked at 40 threads.
- `--clients=5`: Number of clients per thread on the client side. Notably, in cluster mode this means 5 connections per thread to each shard/node of Dragonfly Swarm.
- `--ratio=1:1`: The ratio of `SET:GET` commands. We used a balanced mix of 50% `SET` and 50% `GET` commands.
- `--data-size=256`: Data size in bytes. We used 256-byte values, a common and realistic payload size.
Note that since the `--clients` parameter behaves differently in standalone and cluster modes, let’s also look at the connections spawned during a benchmark run. At `--threads=40`, the total connection count is:
40 threads/instance × 5 connections/thread/shard × 10 shards × 4 `c7gn.8xlarge` instances = 8,000 total connections
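For intuition, the workload these flags produce looks roughly like the sketch below, written in Python with redis-py rather than memtier’s internals. The endpoint and key naming are illustrative placeholders, not the benchmark tool itself.

```python
# A rough sketch of the workload shape: a 1:1 SET:GET mix of 256-byte values
# with a 500-second expiry, spread across random keys. This approximates what
# memtier_benchmark generates; it is not the benchmark tool itself.
import os
import random

from redis.cluster import RedisCluster

rc = RedisCluster(host="hostname.dragonflydb.cloud", port=6379)  # placeholder endpoint

value = os.urandom(256)                    # --data-size=256
for _ in range(1_000):
    key = f"memtier-{random.randrange(10_000_000)}"
    if random.random() < 0.5:              # --ratio=1:1
        rc.set(key, value, ex=500)         # --expiry-range=500-500
    else:
        rc.get(key)
```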
Individual Commands: 10 Million RPS and Rock-Solid Latency
The numbers speak for themselves. With 40 threads per client instance, the cluster delivered staggering performance:
- Throughput → 10,164,385.24 RPS
- Average Latency → 1.51 ms
- P50 Latency → 1.20 ms
- P99 Latency → 5.74 ms
- P99.9 Latency → 7.94 ms

Benchmark Results | Dragonfly Cloud Swarm 2TB Enhanced | No Pipelining
Achieving over 10 million RPS is remarkable indeed. Even more critically, latency remained exceptionally low and stable, with a P50 of just 1.20 ms (1.51 ms on average). This demonstrates that Dragonfly Swarm isn’t just performant in terms of throughput but also consistently responsive under extreme load, proving its enterprise-grade stability.
Pushing Further: The Power of Practical Pipelining
In real-world applications, it’s often possible to logically group commands together. To simulate this common optimization, we re-ran the benchmark with a pipeline batch size of five (i.e., `--pipeline=5`), a reasonable assumption for scenarios where a few operations can be batched to reduce network overhead. The results were even more impressive. By reducing the number of network round trips, pipelining allowed the cluster to achieve even higher throughput.
- Throughput → 19,459,528.87 RPS
- Average Latency → 3.64 ms
- P50 Latency → 3.11 ms
- P99 Latency → 9.65 ms
- P99.9 Latency → 12.37 ms

Benchmark Results | Dragonfly Cloud Swarm 2TB Enhanced | Pipelining Batch Size = 5
This highlights the massive efficiency gains possible with reasonable pipelining and batching, which is a common technique used in practice.
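As a rough illustration of the client-side pattern, here is how a batch of five commands might be pipelined with redis-py; the endpoint and keys are placeholders, and this is a sketch of the technique rather than our benchmark harness.

```python
# A minimal sketch of pipelining: five commands queued client-side, then
# flushed together, cutting network round trips. Endpoint and keys are
# placeholders for illustration.
from redis.cluster import RedisCluster

rc = RedisCluster(host="hostname.dragonflydb.cloud", port=6379)

pipe = rc.pipeline()
for i in range(5):
    pipe.set(f"feature:{i}", b"\x00" * 256, ex=500)
results = pipe.execute()  # one batched flush instead of five round trips
```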
More Than Just Performance
Raw performance is only one part of the story. What truly matters in production is how that performance translates into real-world reliability, cost, and simplicity.
A Note on Real-World Configuration
While our benchmark pushed for maximum throughput, it’s important to consider practical configurations. As we increased the number of client connections (governed by the `--threads` and `--clients` options of `memtier_benchmark`) beyond the optimal point, we observed a slight decrease in overall throughput. This is an expected trade-off, as connections (along with features like TLS encryption and persistence) do consume resources. Dragonfly supports a high number of connections, but like any high-performance system, optimal results come from wisely configuring client libraries and connection pools to match your specific workload patterns.
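For example, most client libraries let you cap pool size. A minimal sketch with redis-py might look like the following; the cap of 32 is an illustrative assumption, not a recommendation, and the exact knob name varies by client library and version.

```python
# A minimal sketch of bounding client connections with redis-py. In cluster
# mode, max_connections caps the connection pool used for each node; 32 is an
# illustrative value to tune against your own workload.
from redis.cluster import RedisCluster

rc = RedisCluster(
    host="hostname.dragonflydb.cloud",  # placeholder endpoint
    port=6379,
    max_connections=32,
)
```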
Operational Simplicity: It Just Worked
The most striking part of this entire exercise was how little operational heavy lifting was required. Provisioning the 2TB Dragonfly Swarm data store took less than five minutes through the Dragonfly Cloud console. For an experienced DevOps or platform engineer, reproducing our exact benchmark environment (including the four AWS `c7gn.8xlarge` instances and VPC peering) would be a straightforward task achievable within an hour or two. Dragonfly Cloud handled all the complexity of clustering, health checks, and failover, allowing us to focus entirely on the test.
Significantly Higher Per-Dollar Value
When you model the cost of achieving similar throughput and capacity on other managed services, the value of Dragonfly Cloud becomes undeniable. Our pricing is primarily based on provisioned memory capacity, delivering a much higher performance-per-dollar ratio. More importantly, the cost savings are compounded by the operational burden we remove. You are paying for the data store and offloading the entire operational overhead of clustering, monitoring, and version upgrades onto our team.
These significant savings weren’t just theoretical. And I’ll admit, I was curious too. So, I attempted to provision a comparable 2TB in-memory data store targeting 10M+ RPS on a competing platform. The numbers I saw were genuinely surprising. Or, given what we’ve seen, maybe they weren’t surprising at all?
| Platform | Specification | Monthly Price |
| --- | --- | --- |
| Dragonfly Cloud | Swarm (Multi-Shard); 2TB; Enhanced; Non-HA | $22,000 |
| Redis Cloud | Pro; 2000GB; 10,000,000 ops/sec; Non-HA | $115,632 |
Note: Pricing is based on the AWS `us-east-1` region (N. Virginia) as of September 2025. This comparison becomes even more favorable for Dragonfly when you consider its memory efficiency, which often reduces required capacity by ~20% for the same dataset.
Built for the Most Demanding Workloads
The 2TB cluster we tested is just the beginning. Dragonfly Cloud currently offers data stores up to 15TB directly through the console. But what if your needs are even bigger? For the most demanding in-memory workloads, our team can create custom Dragonfly Swarm data stores exceeding 15TB.
For organizations requiring the ultimate in performance, security, and dedicated support, we offer Dragonfly Cloud Enterprise, which includes features like Bring Your Own Cloud (BYOC) for maximum control and compliance. This benchmark demonstrates the core engine power of Dragonfly Swarm, while our Enterprise tier provides the full, managed chassis around the supercar.
Conclusion: The New Benchmark for Scale
Pushing a terabyte-scale Dragonfly Swarm cluster felt less like a stress test and more like a demonstration of what modern data infrastructure should be: extremely powerful, remarkably simple, and truly cost-effective. Achieving over 10 million RPS on a fully managed service proves that the bottlenecks of legacy in-memory systems are a problem of the past.
Don’t just take our word for it. Spin up your Dragonfly Cloud data stores today (well, maybe start with something a little smaller than 2TB) and see how it transforms your next-generation application’s performance.