Dragonfly as a Multi-Purpose Data Store for AI Applications

Discover how Dragonfly serves as a versatile backend for LlamaIndex AI applications, delivering higher throughput, lower latency, and simpler architecture.

January 6, 2026


LlamaIndex is a popular framework for building agentic AI applications that tap domain-specific knowledge beyond an LLM's general training. You take your own data (files, documents, etc.), turn it into something an LLM can understand, and then query it using natural language. While we’ll use LlamaIndex as our main example, one of the underlying challenges it addresses is universal across many frameworks and applications: managing diverse data for AI.

Depending on the kind of AI application you’re building, you’ll need to keep track of many different kinds of data over time. For example:

  • Chat history so conversations feel continuous
  • Documents that represent the source data you ingested
  • Indexes and metadata that let you reload, update, and reason about that data efficiently
  • Vectors for similarity search and retrieval-augmented generation (RAG)

Because these needs are different, you might end up running multiple databases: one for chat memory, a key-value store for state and metadata, and a separate vector database for embeddings, to name a few. LlamaIndex reflects this reality in its design. Instead of assuming a single storage backend, it exposes different storage interfaces: chat stores, document stores, index stores, key-value stores, and vector stores. And for many of these, Redis is already a supported backend.

In this blog, we’ll show how Dragonfly can serve many of these storage needs, and do so markedly better than Redis or Valkey. Better yet, since Dragonfly is Redis-compatible, you can use it wherever LlamaIndex expects Redis, without changing your application code. We’ll walk through each storage type, explain what it’s used for, and show how a Dragonfly data store fits into a LlamaIndex-based AI application.

Why Dragonfly Works So Well with LlamaIndex

100% Redis Protocol Compatibility

Dragonfly implements the Redis protocol, which means that from the point of view of LlamaIndex, there is no difference. Dragonfly works out of the box. All the Redis-backed stores in LlamaIndex, including chat store, document store, index store, key-value store, and vector store, are built on top of standard Redis clients, so there’s no Dragonfly-specific plugin or fork needed.

You just point LlamaIndex at a Dragonfly endpoint instead of a Redis one. This is important because it keeps your application code stable. You can switch the backend without rewriting your storage logic or touching LlamaIndex internals.
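
A quick way to convince yourself of this is to point a standard Redis client at a Dragonfly endpoint. Here is a minimal sketch, where "dragonfly-host" is a placeholder for wherever your instance runs:

from redis import Redis

# The same client library that LlamaIndex's Redis-backed stores use,
# pointed at a Dragonfly endpoint instead of a Redis one.
client = Redis.from_url("redis://dragonfly-host:6379")
client.ping()  # Dragonfly speaks the Redis protocol, so this just works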

Lower Memory Usage & Extreme Throughput

Protocol compatibility is what makes switching from Redis/Valkey frictionless, but the reason you’d want to switch in the first place is much better performance and more reliable operations. AI applications put a lot of pressure on their storage layer. You’re dealing with:

  • Frequent small reads and writes (chat messages)
  • Large values (documents and embeddings)
  • Many concurrent requests during ingestion and querying

Dragonfly is engineered from the ground up to handle the most demanding, high-throughput in-memory workloads, exactly the kind generated by AI-powered, wide-context applications. Real-world benchmarks against Redis and Valkey show significant gains in both throughput and latency.

These performance gains come from Dragonfly’s multi-threaded, shared-nothing architecture. Unlike Redis, which processes commands on a single core, Dragonfly can use all available CPU cores for parallel command handling. This allows it to fully utilize modern machines and keep performance stable as concurrency increases, which is especially important as your application scales. At the same time, the data structures used in Dragonfly are highly optimized to reduce memory usage across a variety of workloads.

Another key differentiator comes when you think about scaling. With Redis, scaling often means running many smaller instances and sharding workloads across them. Because Redis is single-threaded, it can’t effectively take advantage of large cloud instances with many cores, even though those machines often come with better and more predictable network performance.

Dragonfly removes this limitation. You can scale vertically by running on larger instances and actually using all the cores, or choose to scale horizontally by adding more nodes when needed. This flexibility lets you choose the scaling model that fits your workload and budget, instead of being forced into one approach. We covered the real-world implications of this in one of our recent articles if you’d like to learn more about it.

Now let’s take a look at the different data stores LlamaIndex offers and where Dragonfly comes into play.

Chat Store

A lot of AI applications today are conversational. Whether you’re building a chatbot, an agent, or a tool-using assistant, you need a way to store chat history so the conversation can continue across multiple turns.

In LlamaIndex, this is handled by a Chat Store. Its job is to store chat messages per user or per session, append new messages as they arrive, retrieve the full conversation when the LLM needs context, and delete messages when a session ends or expires. The defining characteristic of chat history is order: unlike many other data types, where timestamps merely provide context, the sequence of messages is fundamental to a conversation. That means the underlying store needs to support ordered inserts, reads, and deletes efficiently, all keyed by a conversation identifier.

This is where Dragonfly fits naturally. Chat history maps cleanly to list-like data structures, where messages are appended in order and read back as a sequence. Dragonfly supports the same Redis list semantics that LlamaIndex expects while handling high write rates well, which is important for streaming chats and multi-user systems.

Using Dragonfly with LlamaIndex doesn’t require any special integration. You simply run Dragonfly and point the URL in your code to your Dragonfly instance. Here’s what the code for this looks like:

from llama_index.storage.chat_store.redis import RedisChatStore
from llama_index.core.memory import ChatMemoryBuffer

chat_store = RedisChatStore(redis_url="redis://localhost:6379", ttl=300)

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=chat_store,
    chat_store_key="user1",
)
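
You can also work with the chat store directly: the same conversation key supports ordered appends, reads, and deletes. Here is a small sketch using the store created above (the message contents are just placeholders):

from llama_index.core.llms import ChatMessage

# Append messages in order under the conversation key.
chat_store.add_message("user1", ChatMessage(role="user", content="Hello!"))
chat_store.add_message("user1", ChatMessage(role="assistant", content="Hi! How can I help?"))

# Read the full, ordered history back; trim or clear it when the session ends.
messages = chat_store.get_messages("user1")
chat_store.delete_last_message("user1")
chat_store.delete_messages("user1")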

Underneath, LlamaIndex uses familiar commands from the list family, such as RPOP, to manage chat message records:

# The `delete_last_message` method of the `RedisChatStore` class uses the RPOP command.
class RedisChatStore(BaseChatStore):
    # ...

    def delete_last_message(self, key: str) -> Optional[ChatMessage]:
        """Delete last message for a key."""
        return self._redis_client.rpop(key)

    # ...

To try this setup yourself, you can run Dragonfly locally during development, and use Dragonfly Cloud or a self-hosted deployment in production.

Document Store

When you ingest data into LlamaIndex, your original documents are split into smaller chunks called node objects. These nodes are what get embedded, indexed, and retrieved later during queries. The Document Store is where LlamaIndex saves these nodes. It acts as the source of truth for your content, allowing you to reload indexes, rerun queries, or update embeddings without having to re-ingest the original files every time. This becomes important as your datasets grow or when you’re iterating on index and retrieval logic. Using Dragonfly here is straightforward as well, as shown in the code below:

from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.core.node_parser import SentenceSplitter

parser = SentenceSplitter()
# `documents` is a list of Document objects loaded earlier in your pipeline.
nodes = parser.get_nodes_from_documents(documents)

docstore = RedisDocumentStore.from_host_and_port(
    host="127.0.0.1", port="6379", namespace="llama_index"
)
docstore.add_documents(nodes)

This connects to a Dragonfly instance and stores your nodes under a namespace, typically organized as {namespace}/docs. When your application restarts, you can reconnect to the same Dragonfly instance using the same host, port, and namespace, and LlamaIndex will load the existing documents instead of re-ingesting them.
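
For example, after a restart you can reconnect and read nodes back directly. Here is a minimal sketch using the same host, port, and namespace as above:

from llama_index.storage.docstore.redis import RedisDocumentStore

# Reconnect to the same Dragonfly instance and namespace after a restart.
docstore = RedisDocumentStore.from_host_and_port(
    host="127.0.0.1", port="6379", namespace="llama_index"
)

# The nodes stored earlier are still there.
all_nodes = docstore.docs               # dict of node_id -> node
some_id = next(iter(all_nodes))         # pick any stored node ID
node = docstore.get_document(some_id)   # fetch a single node by its ID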

Under the hood, a document store is essentially a key-value store. Each node is stored under a key, with its text and metadata serialized as the value. This makes it a natural fit for Dragonfly, which is optimized for high-throughput key-value workloads. In practice, this means faster retrievals or reloads of the original documents during development and production compared to Redis, especially when you’re frequently rebuilding indexes or experimenting with different model configurations.

Index Store

An index in LlamaIndex is a logical structure that defines how your data is organized, stored, and queried. You can think of it as the connection between your raw content (nodes), your embeddings (vectors), the storage backends, and the retrieval and query logic. When LlamaIndex builds an index, it creates lightweight index metadata that tracks internal state, relationships between nodes, and other information needed to reconstruct or reload the index from storage.

This metadata is stored in the Index Store. Compared to documents or embeddings, the data here is relatively small, but it’s accessed often, especially when loading an existing index, updating it, or running queries. Because of this access pattern, the index store benefits from a backend that can handle frequent reads and writes with low latency. Dragonfly excels here since it’s optimized for high-concurrency workloads and fast metadata access, even when accessed by multiple application components simultaneously.

As with the other stores, LlamaIndex already provides a Redis-backed implementation, so using Dragonfly requires no special setup:

from llama_index.storage.index_store.redis import RedisIndexStore
from llama_index.core import StorageContext, VectorStoreIndex

index_store = RedisIndexStore.from_host_and_port(
    host="127.0.0.1", port="6379", namespace="llama_index"
)

storage_context = StorageContext.from_defaults(index_store=index_store)
index = VectorStoreIndex(nodes, storage_context=storage_context)
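
Because this metadata lives in Dragonfly, an existing index can be reloaded instead of rebuilt after a restart. Here is a minimal sketch, assuming the same namespace as above and a single index stored in it (load_index_from_storage also accepts an index_id argument if you store several):

from llama_index.core import StorageContext, load_index_from_storage
from llama_index.storage.docstore.redis import RedisDocumentStore

# Reconnect to the same Dragonfly-backed stores after a restart.
storage_context = StorageContext.from_defaults(
    docstore=RedisDocumentStore.from_host_and_port(
        host="127.0.0.1", port="6379", namespace="llama_index"
    ),
    index_store=RedisIndexStore.from_host_and_port(
        host="127.0.0.1", port="6379", namespace="llama_index"
    ),
)

# Reload the persisted index metadata instead of rebuilding the index.
# Note: for the embeddings themselves to survive a restart, use a
# Redis-backed vector store as well (covered in the next section).
index = load_index_from_storage(storage_context)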

As demonstrated above, we used VectorStoreIndex to build an index over our nodes while persisting its metadata in the Dragonfly-backed index store, and we can reload that index later without rebuilding it. This highlights a key point: a vector index is a specialized type of index, fundamentally organized for similarity search. This brings us to the next category of data that AI applications must manage: the vector store itself, a role for which Dragonfly is equally well-suited.

Vector Store

Vector stores are a core part of retrieval-augmented generation (RAG). If you’re building anything with LlamaIndex, you will almost certainly be using a vector store. A vector store is where LlamaIndex keeps the embedding vectors derived from the nodes in the Document Store. These vectors are used to run similarity search, which allows LlamaIndex to find the most relevant pieces of context for a given query. That retrieved context is then passed to the LLM, forming the basis of RAG.

Basically, the vector store is responsible for storing embeddings, running nearest-neighbor searches, and returning the top matching results. Because this happens on nearly every query, performance and latency matter a lot. Dragonfly fits well here too because it supports Redis-compatible vector search. This means you can store vectors and their associated metadata in the same system you’re already using for documents, index state, and chat history. For many applications, this removes the need to run a separate vector database just for embeddings.

Using Dragonfly as a vector store in LlamaIndex looks exactly like using Redis. You create a Redis client, wrap it with RedisVectorStore, and pass it into the storage context:

from redis import Redis
from llama_index.core import StorageContext
from llama_index.vector_stores.redis import RedisVectorStore

redis_client = Redis.from_url("redis://localhost:6379")
vector_store = RedisVectorStore(redis_client=redis_client, overwrite=True)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

Once this is set up, LlamaIndex will automatically store embeddings in Dragonfly and use them for similarity search during queries. From the application’s point of view, nothing changes, but you as a developer get a simpler architecture with one backend handling both vectors and metadata.
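
To see the whole flow end to end, you can build an index on top of this vector store and run a query. Here is a rough sketch, assuming a ./data directory with source files and an embedding model plus LLM already configured in your environment:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Load source files and build an index; embeddings are written to Dragonfly
# through the RedisVectorStore configured above.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)

# Each query embeds the question, runs similarity search in Dragonfly,
# and passes the retrieved context to the LLM.
query_engine = index.as_query_engine()
response = query_engine.query("What does the documentation say about retention?")
print(response)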

Dragonfly as a Multi-Purpose Backend Store for AI Applications

LlamaIndex already has support for Redis across its different storage layers. That makes Dragonfly a natural drop-in fit, and its performance characteristics make it the better choice for AI workloads.

When you use Dragonfly instead of Redis with LlamaIndex, you keep the same APIs and client libraries, but you get improvements where AI applications need them most: higher throughput for ingestion and querying, lower latency under concurrency, and much more efficient use of modern multi-core machines. A typical LlamaIndex setup stores chat history, document nodes, index state, metadata, and embeddings, all of which generate frequent reads and writes. Dragonfly handles this mixture of workloads better than Redis, without forcing you to shard across many small instances or introduce additional databases.

The result is a simpler architecture: one database that supports multiple LlamaIndex stores, scales vertically and horizontally based on your needs, and performs well as your AI application grows. You spend less time tuning infrastructure around Redis limitations and more time building and improving your actual AI application. If you’re already using Redis with LlamaIndex, trying Dragonfly and seeing the performance gains yourself is easy. Just start a Dragonfly instance, point your existing Redis URLs to it, and that’s it. You can use the code samples in this post to help you get started with all the different LlamaIndex stores you might be using in your application.
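
Putting it all together, one Dragonfly endpoint can back every store at once. Here is a rough sketch of the combined setup, assuming Dragonfly runs locally on the default port:

from redis import Redis
from llama_index.core import StorageContext
from llama_index.storage.chat_store.redis import RedisChatStore
from llama_index.storage.docstore.redis import RedisDocumentStore
from llama_index.storage.index_store.redis import RedisIndexStore
from llama_index.vector_stores.redis import RedisVectorStore

DRAGONFLY_URL = "redis://localhost:6379"  # one Dragonfly instance for everything

# Chat history for conversational memory.
chat_store = RedisChatStore(redis_url=DRAGONFLY_URL, ttl=300)

# Documents, index metadata, and vectors, all in the same backend.
storage_context = StorageContext.from_defaults(
    docstore=RedisDocumentStore.from_host_and_port(
        host="localhost", port="6379", namespace="llama_index"
    ),
    index_store=RedisIndexStore.from_host_and_port(
        host="localhost", port="6379", namespace="llama_index"
    ),
    vector_store=RedisVectorStore(
        redis_client=Redis.from_url(DRAGONFLY_URL), overwrite=True
    ),
)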
