Dragonfly

Scaling Geospatial Queries: Dragonfly’s Inverted GEO Index

Dragonfly natively supports fast geospatial indexing and radius searches. Learn how its R-Tree implementation enables efficient querying of location data at scale.

November 12, 2025

Scaling Geospatial Queries: Dragonfly’s Inverted GEO Index | Cover Image

Introduction

Previously, we introduced Dragonfly Search’s inverted indexes for quickly finding documents by keyword, vector, or range fields. We also explored geospatial indexes and how GEO commands can efficiently query geographic data. In this post, we examine how Dragonfly Search is extended to natively support the indexing and querying of geographical data.

What is an R-Tree?

When dealing with geospatial or multi-dimensional data, efficient indexing is crucial for performing fast queries. Traditional data structures like arrays, hash maps, and binary trees are typically not well-suited for geospatial data, particularly when you need to perform range queries (e.g., finding all points within a specific radius) or nearest-neighbor searches. These structures lack the necessary support for efficiently pruning large portions of search space.

This is where spatial data structures like the R-Tree come into play. Designed specifically for multi-dimensional data, R-Trees organize data in a hierarchical manner, optimizing queries such as range searches and k-nearest-neighbor searches in high-dimensional spaces.

B+ Tree is a widely used data structure used for indexing. Both B+ Trees and R-Trees are balanced tree structures, but they are optimized for different types of data and queries.

  • B+ Tree: internal nodes store individual keys and the leaf nodes store values. It is great for ordered data but doesn’t handle multi-dimensional data well.
  • R-Tree: Instead of individual values, nodes store bounding rectangles (or other shapes) that enclose sets of data. These rectangles represent the spatial extent of the child objects, allowing for efficient spatial queries.

In geospatial queries, this ability to prune search space is crucial. For example, when you query for all points within a certain distance from a given location, an R-Tree quickly eliminates large areas of data that fall outside the query radius, resulting in faster response times.

R-Tree is a versatile data structure, but its performance can vary significantly depending on the splitting strategy used to build the tree. The splitting strategy determines how data is partitioned across the nodes of the tree, which impacts both query speed and insertion/deletion performance.

Here are two of the most commonly used R-Tree splitting strategies:

  • Linear Split is one of the simplest and most widely used methods. When nodes in the tree become full, the linear split algorithm divides the set of data points into two groups along the longest dimension, minimizing the overlap between bounding boxes as much as possible.
  • Quadratic Split is more complex than the linear approach. It tries to minimize the total area of the bounding boxes by choosing splits that minimize the area overlap between them. The algorithm considers all pairs of objects and selects the one that minimizes overlap.

For more details about the R-Tree data structure, we highly recommend you take a look at this comprehensive and easy-to-follow blog post by Bartosz Sypytkowski, R-Tree: Algorithm for Efficient Indexing of Spatial Data.

Implementing Geographic Index with Boost Geometry

For our geospatial search implementation, we decided to use the Boost Geometry library. This choice was based on its maturity and optimization for geospatial queries. We represent our data as 2D points using geographic traits (latitude, longitude) and pair each point with a unique document ID. Our R-Tree is configured to use a linear split strategy with a maximum of 16 elements per node.

Here’s how we define our data structures in C++:

With this setup, basic operations like insertion and deletion are simple—just pass the relevant index entry to the API. However, implementing an efficient radius-based search was more challenging.

While Boost Geometry provides efficient APIs for spatial indexing, it doesn’t natively support radius search queries out of the box. To implement this functionality, we devised a custom approach:

  1. Constructing a Temporary Polygon: We first do so around the query point by calculating 4 points at a distance equal to the radius in each of the cardinal directions.
  2. Bounding Box for Search Space: We then create a minimum bounding box (MBB) around this temporary polygon, which defines our search area.
  3. Haversine Distance Calculation: For each point within the minimum bounding box, we calculate the Haversine distance (which accounts for the curvature of the Earth) and check if the point lies within the specified radius.
Visualization of a Geospatial Search Query

Visualization of a Geospatial Search Query

In the illustration above, the query finds all points in a radius around Mountain View. The temporary polygon around the point has red edges, while the minimum bounding box has gray edges.

Geospatial Search in Action

Now that we’ve implemented geospatial indexing and radius-based search, let’s see it in action in Dragonfly. We start by adding some geospatial data to Dragonfly using the hash data type. Then, we create a search index on these records with TEXT and GEO types for the desired fields, as shown below:

dragonfly$> HSET cities:1 name "Mountain View" location "-122.08, 37.386"
#=> (integer) 2

dragonfly$> HSET cities:2 name "Palo Alto" location "-122.143, 37.444"
#=> (integer) 2

dragonfly$> HSET cities:3 name "San Jose" location "-121.886, 37.338"
#=> (integer) 2

dragonfly$> HSET cities:4 name "San Francisco" location "-122.419, 37.774"
#=> (integer) 2

dragonfly$> FT.CREATE cities_idx ON HASH PREFIX 1 cities: SCHEMA name TEXT location GEO
#=> OK

By convention, geographical data points are formatted as "longitude, latitude" strings. With everything set up, we can now query for all locations within a 20-mile radius of Mountain View.

dragonfly$> FT.SEARCH cities_idx "@location:[-122.08 37.386 20 mi]"
#=> 1) (integer) 3
#=> 2) "cities:1"
#=> 3) 1) "name"
#=>    2) "Mountain View"
#=>    3) "location"
#=>    4) "-122.08, 37.386"
#=> 4) "cities:2"
#=> 5) 1) "name"
#=>    2) "Palo Alto"
#=>    3) "location"
#=>    4) "-122.143, 37.444"
#=> 6) "cities:3"
#=> 7) 1) "name"
#=>    2) "San Jose"
#=>    3) "location"
#=>    4) "-121.886, 37.338"

This returns results that include locations such as Palo Alto and San Jose. You can also combine other search fields in a query, for example:

dragonfly$> FT.SEARCH cities_idx "@name:Francisco @location:[-122.08 37.386 50 mi]"
#=> 1) (integer) 1
#=> 2) "cities:4"
#=> 3) 1) "name"
#=>    2) "San Francisco"
#=>    3) "location"
#=>    4) "-122.419, 37.774"

This query instead only returns San Francisco, since it matches the name and is within the specified radius.

Performance Matters, As Always

We benchmarked Dragonfly’s geographical index against RediSearch[1] using the geo-bench tool[2]. The tests ran with datasets of 1M, 5M, and 10M documents, using various thread counts. The test configuration and results are shown below.

Client & Server Instances

  • Two AWS c8g.2xlarge instances, one for geo_bench and one for Redis/Dragonfly.
  • Redis and Dragonfly were tested separately.

Server Configurations

Using the configurations below, we tested with NUM_OF_THREADS with 4 and 8.

# Redis
$> ./redis-server --loadmodule /PATH/TO/RediSearch/bin/linux-aarch64-release/search-community/redisearch.so \
        --save "" --appendonly no \
        --io-threads NUM_OF_THREADS \
        --search-workers NUM_OF_THREADS
# Dragonfly
$> ./dragonfly --logtostderr --dbfilename= \
               --proactor_threads=NUM_OF_THREADS
Dragonfly vs. Redis | Geo Search Query Throughput on AWS c8g.2xlarge

Dragonfly vs. Redis | Geo Search Query Throughput with NUM_OF_THREADS=8

The detailed benchmark results[3] can be found in the references section below, and they highlight several key advantages of Dragonfly for geospatial search:

  • Performance: Dragonfly consistently achieves higher requests per second than Redis across all tested dataset sizes (1M, 5M, and 10M documents). For example, with NUM_OF_THREADS=8 for both engines, Dragonfly processes 3-4x more requests per second.
  • Scalability: Dragonfly scales efficiently with an increasing number of threads, maintaining robust performance under heavier loads. In contrast, Redis exhibits diminishing returns past a certain concurrency point.
  • Geospatial Search: Dragonfly utilizes R-Tree indexing for fast and accurate radius-based queries. A planned future improvement will add the ability to return results sorted by distance.

Final Thoughts

While RediSearch remains a popular choice for geospatial data indexing, Dragonfly emerges as a more performant and scalable alternative in the context of high-volume geospatial search workloads. It leverages techniques like R-Trees and multi-threaded processing to offer more performant and more efficient results. As geospatial data volumes continue to increase, the speed and scalability advantages of Dragonfly make it the preferred solution for mission-critical applications.


References

[1] RediSearch is a Redis module that provides full-text search, secondary indexing, and querying. It is currently bundled with the Redis server by default.

[2] Commands for benchmarking using geo-bench.

Loading documents, where N is the number of documents that are going to be indexed:

$> ./geo-bench load -i documents.json --db redisearch-hash -n N

Running query on 10,000 points:

$> ./geo-bench query -i documents.json --db redisearch-hash -n 10000

[3] Detailed benchmark results.

  • Results with NUM_OF_THREADS=4

LOAD 

1M

5M

10M

Redis (req/s)

76,922

81,966

81,300

Dragonfly (req/s)

76,922

83,332

83,333

QUERY

1M

5M

10M

Redis (req/s)

909

500

227

Dragonfly (req/s)

2,000

1111

500

Redis - LATENCY AVG

51.075ms

96.681ms

218.525ms

Dragonfly - LATENCY AVG

20.632ms

43.627ms

95.741ms

Redis - LATENCY P99

123.135ms

236.031ms

658.943ms

Dragonfly - LATENCY P99

97.599ms

202.751ms

557.055ms

  • Results with NUM_OF_THREADS=8

LOAD 

1M

5M

10M

Redis (req/s)

76,919

80,644

81,301

Dragonfly (req/s)

76,917

83,332

82,644

QUERY

1M

5M

10M

Redis (req/s)

1000

526

238

Dragonfly (req/s)

3,332

2,000

909

Redis - LATENCY AVG

46.249ms

89.351ms

207.980ms

Dragonfly - LATENCY AVG

10.592ms

22.815ms

50.932ms

Redis - LATENCY P99

121.151ms

220.159ms

646.143ms

Dragonfly - LATENCY P99

45.215ms

104.255ms

280.063ms

Dragonfly Wings

Stay up to date on all things Dragonfly

Join our community for unparalleled support and insights

Join

Switch & save up to 80%

Dragonfly is fully compatible with the Redis ecosystem and requires no code changes to implement. Instantly experience up to a 25X boost in performance and 80% reduction in cost