
February 26, 2026

Vector Search Just Got Faster: Dragonfly v1.37 Delivers Up to 7x Throughput Gains and 65x Lower Latency



Vector search has crossed the threshold from emerging capability to production requirement. Whether you're building RAG pipelines, recommendation engines, semantic search, or fraud detection systems, the ability to perform fast, efficient similarity searches over high-dimensional data is no longer a nice-to-have — it's table stakes.

But as vector workloads move from prototypes to production, teams are running headfirst into the same wall: performance at scale. Indexing millions of vectors is one thing. Querying them with low latency, high throughput, and reasonable memory overhead — simultaneously — is another challenge entirely.

With Dragonfly v1.37, we set out to solve exactly that. And rather than just telling you we made it faster, we want to show you.

What changed under the hood

The headline improvement in v1.37 is an architectural redesign of how Dragonfly handles Hierarchical Navigable Small World (HNSW) vector indexes.

In previous versions (v1.35 and earlier), Dragonfly maintained a separate HNSW index per shard. When a search query came in, it executed K-nearest-neighbor searches across multiple independent graphs in parallel, then merged the results. This approach had a clear upside — searching multiple graphs and merging results yielded excellent precision. But it came at a steep cost: high latency, limited throughput, and significant memory overhead from duplicated index structures.

In v1.37, we moved to a single global HNSW index. Each query now searches one unified graph. Vector data isn't copied into the index — instead, the index holds references to the original memory locations of hash objects, which dramatically cuts memory consumption. The result is a fundamentally different performance profile: far higher throughput, far lower latency, and nearly half the memory footprint.
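The per-shard versus global designs can be contrasted with a toy sketch (not Dragonfly's actual code). Here brute-force distance stands in for an HNSW graph search, and an "index" is just a list of (id, value) pairs over one-dimensional vectors; the point is the shape of the two search paths, not the data structure:

```python
import heapq

def knn(index, query, k):
    """Return the k nearest (squared_distance, vector_id) pairs.
    Brute force stands in for an HNSW graph search in this sketch."""
    dists = [((value - query) ** 2, vid) for vid, value in index]
    return heapq.nsmallest(k, dists)

def sharded_search(shards, query, k):
    """v1.35-style: run a top-k search on every per-shard index,
    then merge the partial results into a final top-k."""
    partial = [hit for shard in shards for hit in knn(shard, query, k)]
    return heapq.nsmallest(k, partial)

def global_search(index, query, k):
    """v1.37-style: a single search over one unified index, no merge step."""
    return knn(index, query, k)
```

With exact search the two paths return identical results; with a real (approximate) HNSW graph per shard, the merge step is where the precision and latency effects described in this post come from.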

The benchmarks

To quantify the impact, we ran a head-to-head benchmark comparing Dragonfly v1.35, Dragonfly v1.37, and Valkey (with valkey-search) under identical conditions. Here's the setup:

Dataset: gist-960-euclidean — 1 million vectors, 960 dimensions, using L2 (Euclidean) distance. Vectors stored as HASH objects.

Index configuration: HNSW with M=32, EF_CONSTRUCTION=128, uploaded across 16 parallel connections.

Search configuration: 128 parallel query connections, with EF_RUNTIME swept across 64, 128, 256, and 512. Each configuration was run three times and averaged.

Environment: AWS m7g.4xlarge instances, with the benchmark tool running on a separate machine.

We used a modified version of vector-db-benchmark with pipelining enabled during the upload phase.
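For reference, the index described above maps onto a RediSearch-style FT.CREATE call. A minimal sketch that assembles the command arguments (the index name `idx`, key prefix `vec:`, and field name `embedding` are assumptions; the dataset fixes DIM 960 and the L2 distance metric):

```python
def create_index_cmd(name="idx", dim=960, m=32, ef_construction=128):
    """Assemble FT.CREATE arguments for an HNSW vector index over HASH keys.
    The "10" after HNSW is the count of attribute arguments that follow."""
    return [
        "FT.CREATE", name, "ON", "HASH", "PREFIX", "1", "vec:",
        "SCHEMA", "embedding", "VECTOR", "HNSW", "10",
        "TYPE", "FLOAT32", "DIM", str(dim),
        "DISTANCE_METRIC", "L2",
        "M", str(m), "EF_CONSTRUCTION", str(ef_construction),
    ]
```

The argument list can be passed to any Redis-compatible client's low-level command interface (e.g. `execute_command` in redis-py).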

Throughput and latency: a generational leap

The throughput and latency numbers tell the most compelling story. At every EF_RUNTIME setting, Dragonfly v1.37 delivers dramatically higher queries per second while simultaneously slashing tail latency.

High-throughput configuration (EF_RUNTIME = 64):

| Metric      | Dragonfly v1.35 | Dragonfly v1.37 | Valkey   |
|-------------|-----------------|-----------------|----------|
| QPS         | 446             | 3,136           | 2,104    |
| P95 Latency | 698.50 ms       | 10.74 ms        | 47.85 ms |
| Precision   | 0.97            | 0.75            | 0.75     |

Dragonfly v1.37 achieves 7x the throughput of v1.35 and 1.5x the throughput of Valkey, while delivering P95 latency that is 65x lower than v1.35 and 4.5x lower than Valkey.

High-precision configuration (EF_RUNTIME = 512):

| Metric      | Dragonfly v1.35 | Dragonfly v1.37 | Valkey   |
|-------------|-----------------|-----------------|----------|
| QPS         | 150             | 1,590           | 1,393    |
| P95 Latency | 3,845.5 ms      | 82.23 ms        | 99.30 ms |
| Precision   | 0.9975          | 0.95            | 0.95     |

Even when tuned for precision, v1.37 sustains 10.6x more throughput than v1.35 and 47x lower P95 latency. It also edges out Valkey, delivering 14% higher QPS with 17% lower tail latency.

Memory: nearly half the footprint

The global index architecture doesn't just help performance — it fundamentally changes the memory story.

| Metric     | Dragonfly v1.35 | Dragonfly v1.37 | Valkey  |
|------------|-----------------|-----------------|---------|
| RSS Memory | 8.17 GB         | 4.57 GB         | 8.65 GB |
| Build Time | 170 s           | 173 s           | 168 s   |

Dragonfly v1.37 uses 44% less memory than v1.35 and 47% less than Valkey, while maintaining virtually identical index build times. By referencing vector data in place rather than copying it into per-shard index structures, v1.37 eliminates a major source of memory overhead. For teams running large vector indexes in production, this translates directly to lower infrastructure costs.

Precision: predictable and tunable beats unstable and fragile

You'll notice that v1.35 reports higher raw precision numbers in this benchmark, reaching 0.9975 at EF_RUNTIME 512 versus 0.95 for v1.37. But that number deserves context, because it masks a deeper problem.

Graph indexes like HNSW are fundamentally a poor fit for Dragonfly's original shared-nothing, thread-per-shard architecture. When each shard maintains its own independent graph, the same hyperparameters produce different precision behavior depending on how many threads are running. Add more threads and you don't just change throughput. You change the quality characteristics of your search results. In practice, this meant that v1.35's precision was unstable: it would shift as you scaled thread counts, and performance would actually degrade as more threads were added. That's not a trade-off you can engineer around. It's a fundamental mismatch between the index structure and the execution model.

The global index in v1.37 eliminates this instability entirely. Precision is now deterministic for a given set of hyperparameters, regardless of how many threads Dragonfly is running. And it remains fully tunable. You control quality through the knobs that were designed for exactly this purpose: EF_RUNTIME (the size of the dynamic candidate list during search) and M (the maximum number of outgoing connections per node in the graph). Increasing EF_RUNTIME from 64 to 512 moves precision from 0.75 to 0.95 in a smooth, predictable curve, and you can dial in the exact throughput-precision balance your workload requires.
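In query terms, the knob is the EF_RUNTIME attribute of the KNN clause. A small sketch that builds the query string for each setting used in the benchmark (the field name `embedding`, parameter name `vec`, and result alias `dist` are assumptions):

```python
def knn_query(k=10, field="embedding", ef_runtime=128):
    """Build a RediSearch-dialect KNN query with a per-query EF override."""
    return f"*=>[KNN {k} @{field} $vec EF_RUNTIME {ef_runtime} AS dist]"

# The EF_RUNTIME sweep from the benchmark: higher values trade QPS
# for precision along a smooth, predictable curve.
queries = {ef: knn_query(ef_runtime=ef) for ef in (64, 128, 256, 512)}
```

Each query string would be issued via FT.SEARCH with the binary query vector bound to `$vec` through the PARAMS clause.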

The result is a system that is dramatically more cost-effective, vastly faster, and gives you stable, predictable precision that you can reason about and tune. No more precision that shifts underneath you as your deployment scales.


Where Dragonfly v1.37 stands versus Valkey

Valkey (with valkey-search) serves as a useful reference point, since it also uses a global HNSW index and achieves the same precision profile as Dragonfly v1.37. The difference is raw performance: Dragonfly consistently delivers higher throughput and lower latency across every EF_RUNTIME configuration. At EF_RUNTIME 64, Dragonfly v1.37 pushes 49% more QPS than Valkey with less than a quarter of the P95 latency. It does all of this while using roughly half the memory.

What else shipped in v1.37

Vector search is the headline improvement of this release, but v1.37 includes several other changes worth noting:

  • Significant memory reduction for JSON documents (#6511) — memory savings extend beyond vector workloads to general-purpose JSON storage.
  • 40% less memory for vector search with hash map documents (#6436, #6437) — the reference-based approach we described above, now reflected in real-world hash storage.
  • Full support for the SORT command, including BY and GET options.
  • Several bug fixes around streams (#6506, #6492, #6532).
  • Added support for DIGEST and DELEX commands (#6328).

For the full list of changes, check out the v1.37.0 release notes on GitHub.

Try it today

Dragonfly v1.37 is available now. If you're running vector search workloads — or planning to — this release represents a step change in what's possible with an in-memory data store.

Download Dragonfly v1.37 from GitHub, spin up a vector index, and see the difference for yourself.

