Scaling Geospatial Queries: Dragonfly’s Inverted GEO Index
Dragonfly natively supports fast geospatial indexing and radius searches. Learn how its R-Tree implementation enables efficient querying of location data at scale.
November 12, 2025

Introduction
Previously, we introduced Dragonfly Search’s inverted indexes for quickly finding documents by keyword, vector, or range fields. We also explored geospatial indexes and how GEO commands can efficiently query geographic data. In this post, we examine how Dragonfly Search is extended to natively support the indexing and querying of geographical data.
What is an R-Tree?
When dealing with geospatial or multi-dimensional data, efficient indexing is crucial for performing fast queries. Traditional data structures like arrays, hash maps, and binary trees are typically not well-suited for geospatial data, particularly when you need to perform range queries (e.g., finding all points within a specific radius) or nearest-neighbor searches. These structures lack the necessary support for efficiently pruning large portions of search space.
This is where spatial data structures like the R-Tree come into play. Designed specifically for multi-dimensional data, R-Trees organize data in a hierarchical manner, optimizing queries such as range searches and k-nearest-neighbor searches in high-dimensional spaces.
B+ Tree is a widely used data structure used for indexing. Both B+ Trees and R-Trees are balanced tree structures, but they are optimized for different types of data and queries.
- B+ Tree: internal nodes store individual keys and the leaf nodes store values. It is great for ordered data but doesn’t handle multi-dimensional data well.
- R-Tree: Instead of individual values, nodes store bounding rectangles (or other shapes) that enclose sets of data. These rectangles represent the spatial extent of the child objects, allowing for efficient spatial queries.
In geospatial queries, this ability to prune search space is crucial. For example, when you query for all points within a certain distance from a given location, an R-Tree quickly eliminates large areas of data that fall outside the query radius, resulting in faster response times.
R-Tree is a versatile data structure, but its performance can vary significantly depending on the splitting strategy used to build the tree. The splitting strategy determines how data is partitioned across the nodes of the tree, which impacts both query speed and insertion/deletion performance.
Here are two of the most commonly used R-Tree splitting strategies:
- Linear Split is one of the simplest and most widely used methods. When nodes in the tree become full, the linear split algorithm divides the set of data points into two groups along the longest dimension, minimizing the overlap between bounding boxes as much as possible.
- Quadratic Split is more complex than the linear approach. It tries to minimize the total area of the bounding boxes by choosing splits that minimize the area overlap between them. The algorithm considers all pairs of objects and selects the one that minimizes overlap.
For more details about the R-Tree data structure, we highly recommend you take a look at this comprehensive and easy-to-follow blog post by Bartosz Sypytkowski, R-Tree: Algorithm for Efficient Indexing of Spatial Data.
Implementing Geographic Index with Boost Geometry
For our geospatial search implementation, we decided to use the Boost Geometry library. This choice was based on its maturity and optimization for geospatial queries. We represent our data as 2D points using geographic traits (latitude, longitude) and pair each point with a unique document ID. Our R-Tree is configured to use a linear split strategy with a maximum of 16 elements per node.
Here’s how we define our data structures in C++:
With this setup, basic operations like insertion and deletion are simple—just pass the relevant index entry to the API. However, implementing an efficient radius-based search was more challenging.
While Boost Geometry provides efficient APIs for spatial indexing, it doesn’t natively support radius search queries out of the box. To implement this functionality, we devised a custom approach:
- Constructing a Temporary Polygon: We first do so around the query point by calculating 4 points at a distance equal to the radius in each of the cardinal directions.
- Bounding Box for Search Space: We then create a minimum bounding box (MBB) around this temporary polygon, which defines our search area.
- Haversine Distance Calculation: For each point within the minimum bounding box, we calculate the Haversine distance (which accounts for the curvature of the Earth) and check if the point lies within the specified radius.

Visualization of a Geospatial Search Query
In the illustration above, the query finds all points in a radius around Mountain View. The temporary polygon around the point has red edges, while the minimum bounding box has gray edges.
Geospatial Search in Action
Now that we’ve implemented geospatial indexing and radius-based search, let’s see it in action in Dragonfly. We start by adding some geospatial data to Dragonfly using the hash data type. Then, we create a search index on these records with TEXT and GEO types for the desired fields, as shown below:
dragonfly$> HSET cities:1 name "Mountain View" location "-122.08, 37.386"
#=> (integer) 2
dragonfly$> HSET cities:2 name "Palo Alto" location "-122.143, 37.444"
#=> (integer) 2
dragonfly$> HSET cities:3 name "San Jose" location "-121.886, 37.338"
#=> (integer) 2
dragonfly$> HSET cities:4 name "San Francisco" location "-122.419, 37.774"
#=> (integer) 2
dragonfly$> FT.CREATE cities_idx ON HASH PREFIX 1 cities: SCHEMA name TEXT location GEO
#=> OKBy convention, geographical data points are formatted as "longitude, latitude" strings. With everything set up, we can now query for all locations within a 20-mile radius of Mountain View.
dragonfly$> FT.SEARCH cities_idx "@location:[-122.08 37.386 20 mi]"
#=> 1) (integer) 3
#=> 2) "cities:1"
#=> 3) 1) "name"
#=> 2) "Mountain View"
#=> 3) "location"
#=> 4) "-122.08, 37.386"
#=> 4) "cities:2"
#=> 5) 1) "name"
#=> 2) "Palo Alto"
#=> 3) "location"
#=> 4) "-122.143, 37.444"
#=> 6) "cities:3"
#=> 7) 1) "name"
#=> 2) "San Jose"
#=> 3) "location"
#=> 4) "-121.886, 37.338"This returns results that include locations such as Palo Alto and San Jose. You can also combine other search fields in a query, for example:
dragonfly$> FT.SEARCH cities_idx "@name:Francisco @location:[-122.08 37.386 50 mi]"
#=> 1) (integer) 1
#=> 2) "cities:4"
#=> 3) 1) "name"
#=> 2) "San Francisco"
#=> 3) "location"
#=> 4) "-122.419, 37.774"This query instead only returns San Francisco, since it matches the name and is within the specified radius.
Performance Matters, As Always
We benchmarked Dragonfly’s geographical index against RediSearch[1] using the geo-bench tool[2]. The tests ran with datasets of 1M, 5M, and 10M documents, using various thread counts. The test configuration and results are shown below.
Client & Server Instances
- Two AWS
c8g.2xlargeinstances, one forgeo_benchand one for Redis/Dragonfly. - Redis and Dragonfly were tested separately.
Server Configurations
Using the configurations below, we tested with NUM_OF_THREADS with 4 and 8.
# Redis
$> ./redis-server --loadmodule /PATH/TO/RediSearch/bin/linux-aarch64-release/search-community/redisearch.so \
--save "" --appendonly no \
--io-threads NUM_OF_THREADS \
--search-workers NUM_OF_THREADS# Dragonfly
$> ./dragonfly --logtostderr --dbfilename= \
--proactor_threads=NUM_OF_THREADS
Dragonfly vs. Redis | Geo Search Query Throughput with NUM_OF_THREADS=8
The detailed benchmark results[3] can be found in the references section below, and they highlight several key advantages of Dragonfly for geospatial search:
- Performance: Dragonfly consistently achieves higher requests per second than Redis across all tested dataset sizes (1M, 5M, and 10M documents). For example, with
NUM_OF_THREADS=8for both engines, Dragonfly processes 3-4x more requests per second. - Scalability: Dragonfly scales efficiently with an increasing number of threads, maintaining robust performance under heavier loads. In contrast, Redis exhibits diminishing returns past a certain concurrency point.
- Geospatial Search: Dragonfly utilizes R-Tree indexing for fast and accurate radius-based queries. A planned future improvement will add the ability to return results sorted by distance.
Final Thoughts
While RediSearch remains a popular choice for geospatial data indexing, Dragonfly emerges as a more performant and scalable alternative in the context of high-volume geospatial search workloads. It leverages techniques like R-Trees and multi-threaded processing to offer more performant and more efficient results. As geospatial data volumes continue to increase, the speed and scalability advantages of Dragonfly make it the preferred solution for mission-critical applications.
References
[1] RediSearch is a Redis module that provides full-text search, secondary indexing, and querying. It is currently bundled with the Redis server by default.
[2] Commands for benchmarking using geo-bench.
Loading documents, where N is the number of documents that are going to be indexed:
$> ./geo-bench load -i documents.json --db redisearch-hash -n NRunning query on 10,000 points:
$> ./geo-bench query -i documents.json --db redisearch-hash -n 10000[3] Detailed benchmark results.
- Results with
NUM_OF_THREADS=4
LOAD | 1M | 5M | 10M |
Redis (req/s) | 76,922 | 81,966 | 81,300 |
Dragonfly (req/s) | 76,922 | 83,332 | 83,333 |
QUERY | 1M | 5M | 10M |
Redis (req/s) | 909 | 500 | 227 |
Dragonfly (req/s) | 2,000 | 1111 | 500 |
Redis - LATENCY AVG | 51.075ms | 96.681ms | 218.525ms |
Dragonfly - LATENCY AVG | 20.632ms | 43.627ms | 95.741ms |
Redis - LATENCY P99 | 123.135ms | 236.031ms | 658.943ms |
Dragonfly - LATENCY P99 | 97.599ms | 202.751ms | 557.055ms |
- Results with
NUM_OF_THREADS=8
LOAD | 1M | 5M | 10M |
Redis (req/s) | 76,919 | 80,644 | 81,301 |
Dragonfly (req/s) | 76,917 | 83,332 | 82,644 |
QUERY | 1M | 5M | 10M |
Redis (req/s) | 1000 | 526 | 238 |
Dragonfly (req/s) | 3,332 | 2,000 | 909 |
Redis - LATENCY AVG | 46.249ms | 89.351ms | 207.980ms |
Dragonfly - LATENCY AVG | 10.592ms | 22.815ms | 50.932ms |
Redis - LATENCY P99 | 121.151ms | 220.159ms | 646.143ms |
Dragonfly - LATENCY P99 | 45.215ms | 104.255ms | 280.063ms |


