Scaling the E-Commerce Brain: How Dragonfly Powers Modern ML Feature Stores
Explore why modern e-commerce AI needs a feature store backbone like Dragonfly for predictable, ultra-low latency and massive throughput.
February 9, 2026

Emerging Challenges for ML Platform Teams
Machine learning has graduated from experimental projects to the operational core of modern applications, especially in data-intensive domains like e-commerce. What was once a handful of models running in production has evolved into a complex, living ecosystem: models are continuously retrained and updated, and they serve predictions that drive critical business logic.
The central challenge emerges from the substance of these systems: features. We are witnessing a “feature explosion.” Data science teams are continually adding new derived features, experimenting with embeddings, and enriching existing data to improve model accuracy. Each user or entity profile is no longer a few attributes but a high-dimensional vector, a complex state that balloons with every innovation.
For infrastructure engineers, this means scaling a data platform that never sleeps and never stops evolving, just like AI/ML models themselves. Traditional data stores often buckle under this pressure, which turns agility into operational overhead. This leaves us with the critical infrastructure question: How do we build a data layer for the modern ML feature store that is inherently scalable, consistently performant, and operationally simple? The answer lies not in adding more complexity to the stack but in choosing a foundational technology designed for this specific class of problem.
The Feature Explosion & Infrastructure Impact
Having more and more features is a natural evolution for any mature data science team. The challenge, however, is one of infrastructure scale. The question is no longer if new features will be added, but what happens when the team adds ten, then a hundred, then a thousand new fields per entity. What was once a manageable set of user attributes becomes an ever-expanding, high-dimensional state vector. Consider the difference between old-school features and their modern counterparts. Traditionally, an e-commerce user profile might have been defined by simple, static attributes like last_purchase_amount, total_monthly_spend, days_since_signup, etc.
Today, each of these points is a gateway to a cascade of derived, aggregated, and behavioral signals: the feature explosion is as much about computational weight and context as it is about raw quantity. Let's look at what modern features actually entail in a contemporary e-commerce platform, for example (a brief sketch follows the list):
- Customer Lifecycle as a Dynamic Score: It’s no longer just about simple values like days since signup. Modern features are about deriving a user’s lifecycle stage from a composite of signals: account tenure, verification completeness, and the recency of key actions. This requires maintaining multiple, overlapping time windows (e.g., 7-day, 30-day, and 90-day snapshots) to detect trends.
- Spending Behavior as an Intent Signal: We move beyond a simple total. Modern systems track the distribution of order values, the velocity of spending (recent vs. historical), and the telltale “days since last order” cliff that signals impending churn. Each of these is a separate, evolving feature.
- Engagement Beyond Vanity Metrics: Instead of counting page views, we model session depth and quality. A cart abandonment isn’t just a lost sale; its rate and context become a powerful conversion predictor. Even passive actions, like adding items to a wishlist, become high-intent signals that must be quantified.
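To make the first point concrete, here is a minimal, hypothetical Python sketch of a lifecycle-stage feature derived from overlapping activity windows. The field names and thresholds are illustrative assumptions, not taken from any particular platform.

```python
from dataclasses import dataclass

@dataclass
class ActivityWindows:
    # Hypothetical pre-aggregated counts over overlapping time windows.
    orders_7d: int
    orders_30d: int
    orders_90d: int
    days_since_signup: int

def lifecycle_stage(w: ActivityWindows) -> str:
    """Derive a coarse lifecycle stage from several overlapping signals."""
    if w.days_since_signup <= 30 and w.orders_30d == 0:
        return "new_unactivated"
    if w.orders_7d > 0 and w.orders_30d >= 3:
        return "highly_engaged"
    if w.orders_90d == 0:
        return "at_risk"        # no activity in the longest window
    if w.orders_30d < w.orders_90d / 3:
        return "cooling_down"   # recent activity trails the historical rate
    return "steady"

print(lifecycle_stage(ActivityWindows(orders_7d=1, orders_30d=4, orders_90d=10, days_since_signup=418)))
```

Each such score depends on several windowed aggregates, which is exactly why a single "feature" can translate into many stored fields per entity.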
The shift from simple fields to complex feature vectors creates distinct challenges for offline aggregation and online serving. The offline store handles the heavy lifting of feature computation and history, whereas the online store must deliver these expanding feature vectors to millions of users with sub-millisecond latency and extreme concurrency. It's a pure, high-speed retrieval problem, and this is where traditional infrastructure hits a wall. The exponential growth in feature complexity is not just about storing more data; it is about instant access to that data for every user.
Core Requirements for a Modern Feature Store Backbone
Given the non-negotiable demands of the online serving layer, we can define the core requirements for its backbone with precision. The ideal system must deliver:
- Predictable, Ultra-Low P99 Latency: Consistent sub-millisecond response for feature vector retrieval, regardless of concurrent load.
- Massive Throughput: Handling a flood of simultaneous reads during inference peaks and absorbing high-volume batch writes from the offline store during feature updates.
- Efficient, Native Data Structures: First-class support for rich data structures to store, update, and retrieve multi-field features.
- Simple, Transparent Scalability: The ability to grow capacity both vertically and horizontally, with minimal sharding logic and disruptive rebalancing.
- Durability Without Performance Tax: Strong data safety guarantees that do not compromise the primary goal of low-latency access.
This combination of requirements, critical for real-time e-commerce personalization and fraud detection, eliminates general-purpose databases and strains traditional in-memory solutions for the online feature store. It calls for an engine re-architected for this specific paradigm.
Dragonfly: An Engine for Modern Feature Scale
Dragonfly meets these requirements as a foundation built for modern data intensity. Its architectural advantages align directly with the ML feature store challenge:
- Shared-Nothing, Multi-Threaded Architecture: Unlike single-threaded architectures that bottleneck under concurrent load, Dragonfly’s design parallelizes requests across all CPU cores. This allows it to fully utilize modern hardware under massive volumes of feature reads and writes, turning increased demand into higher throughput rather than higher latency.
- Memory Efficiency: Storing millions to billions of features has minimal overhead. Dragonfly’s advanced memory management reduces fragmentation and increases effective data density, allowing more features and vectors per node.
- Flexible Scalability: Dragonfly is designed to scale up on a single, powerful instance first before requiring a cluster. This dramatically simplifies operations for growing teams. You can handle hundreds of gigabytes of feature data in one instance, avoiding the immediate complexity of distributed system coordination.
- Full Redis API Compatibility: Frameworks and applications built on the Redis API work without modification. Teams can adopt Dragonfly as a drop-in accelerator, modernizing their performance foundation without rewriting logic or altering data models.
In essence, Dragonfly provides the foundational qualities needed for the online feature serving layer: extreme speed at scale, delivered through a simple and operationally sane interface. It transforms the infrastructure challenge from one of constant sharding and tuning into a problem of simply provisioning adequate resources for a predictable, high-performance engine.
Building a Feature Store using Dragonfly
Now, let’s dive into the practical details of how to use Dragonfly to build a feature store, especially an online serving store, with unmatched performance. Its complete Redis API compatibility means you can apply common feature store design patterns directly with familiar Redis commands, as demonstrated below.
Important Commands for Feature Storage
Many teams build their feature store in-house using familiar Redis data types and commands. Here are some of the most commonly used ones for building an online feature store, all of which work directly with Dragonfly thanks to its Redis API compatibility:
| Feature Category | Data Type | Example Commands |
|---|---|---|
| Core Feature Vector Storage | Hash | `HSET`, `HGETALL`, `HMGET` |
| Time-Series & Windowed Features | Sorted Set | `ZADD`, `ZRANGEBYSCORE`, `ZREMRANGEBYSCORE` |
| Real-Time Feature Ingestion | Stream | `XADD`, `XREAD`, `XRANGE` |
| Feature Metadata & Lookup | String | `SET`, `GET`, `MGET` |
As shown in the table above, these native data structures map directly to feature store patterns. Hashes efficiently store a complete set of entity features (e.g., user:1001:features), while sorted sets manage time-windowed events like a user’s last 10 purchases. Streams ingest real-time clicks or transactions for computation, and strings handle simple metadata or singular values.
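To ground these patterns in code, here is a minimal sketch using the standard redis-py client against a Dragonfly instance. The endpoint, key names, and field values are illustrative assumptions rather than a prescribed schema.

```python
import time

import redis

# Dragonfly speaks the Redis protocol, so the standard redis-py client works as-is.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Core feature vector storage: one hash per entity.
r.hset("user:1001:features", mapping={
    "total_monthly_spend": 742.50,
    "days_since_signup": 418,
    "cart_abandonment_rate": 0.18,
})
features = r.hgetall("user:1001:features")

# Time-series and windowed features: a sorted set scored by event timestamp.
now = time.time()
r.zadd("user:1001:purchases", {"order:98765": now})
r.zremrangebyscore("user:1001:purchases", 0, now - 30 * 86400)  # keep a 30-day window
recent_orders = r.zrange("user:1001:purchases", 0, -1, withscores=True)

# Real-time feature ingestion: append raw events to a stream for downstream computation.
r.xadd("events:clicks", {"user_id": "1001", "item_id": "sku-42"})

# Feature metadata and simple lookups: plain strings, read in bulk with MGET.
r.set("feature:cart_abandonment_rate:version", "v3")
r.set("feature:high_spender:threshold", "500.0")
version, threshold = r.mget(
    "feature:cart_abandonment_rate:version",
    "feature:high_spender:threshold",
)
```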
How Popular Feature Stores Use Dragonfly
Feature stores like Feast abstract these commands, but they rely heavily on hashes for core storage. When you configure Feast to use Dragonfly as its online store, it automatically uses these patterns.
- Storage Pattern: Feast creates a two-level map using hashes as the core online storage in Dragonfly.
- Materialization: Feast writes the latest feature values to Dragonfly using `HSET` commands.
- Online Serving: When your model needs a feature vector for inference, Feast’s `get_online_features()` method fetches all values in a single, efficient `HGETALL` or `HMGET` operation.
Feast can use Dragonfly because it ships with a built-in Redis online store provider. And because Dragonfly is highly compatible with Redis, you simply point that provider at Dragonfly’s endpoint in the `feature_store.yaml` file. For a detailed Feast/Dragonfly hands-on guide, refer to our previous blog post about Building a Feature Store with Feast, DuckDB, and Dragonfly.
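As a rough sketch of what that looks like in practice (assuming a Feast feature view named `user_features` keyed by `user_id`, both illustrative), the online read path is only a few lines of Python:

```python
# feature_store.yaml would point Feast's Redis online store provider at Dragonfly,
# roughly like this (host and port are placeholders):
#   online_store:
#     type: redis
#     connection_string: "dragonfly-host:6379"
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # repo containing feature_store.yaml

# Under the hood, Feast resolves each requested feature to the entity's hash
# in Dragonfly and reads it back with HGETALL/HMGET.
online_features = store.get_online_features(
    features=[
        "user_features:total_monthly_spend",
        "user_features:days_since_signup",
        "user_features:cart_abandonment_rate",
    ],
    entity_rows=[{"user_id": 1001}],
).to_dict()

print(online_features)
```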
Use Case Preview: Instacart’s ML Feature Store at Scale
To illustrate how these principles and patterns translate to production at e-commerce scale, let’s look at a preview of how Instacart scaled their ML infrastructure. (We will publish a detailed case study in the future).
Instacart first came to Dragonfly while facing a critical scaling challenge. Their legacy in-memory solution could no longer handle the feature explosion within their machine learning platform, struggling under a load of tens of millions of queries per second (QPS).
During the proof-of-concept, an important nuance emerged: this massive QPS was driven primarily by MGET operations, where a single command retrieves many keys at once. The total number of underlying key accesses was therefore one to two orders of magnitude higher than the command-level QPS, placing immense concurrent load on the system.
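To illustrate that nuance (this is not Instacart's actual schema), a single MGET over a batch of hypothetical per-user feature keys looks like this in redis-py:

```python
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Hypothetical per-user feature keys; one key per (user, feature) pair.
keys = [f"user:{uid}:ltv_score" for uid in range(1000, 1050)]

# One MGET command on the wire, but 50 key lookups on the server side,
# so command-level QPS understates the true key-access rate.
values = r.mget(keys)
scores = dict(zip(keys, values))
```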
By working closely with the Dragonfly team, Instacart successfully migrated its online feature serving layer. The architecture, built on the principles we’ve discussed, now reliably serves hundreds of millions of features per second, providing the consistent low latency needed for real-time personalization and logistics.
In their own words, the Instacart engineering team shared:
By migrating from ElastiCache to Dragonfly Cloud, we’ve been able to cut our cluster size by 20% while also reducing our average and P99 latencies by 40–50% and further optimizing network costs in ways not possible with ElastiCache Valkey.

Final Thoughts
Building a scalable ML feature store in this feature explosion era, particularly for dynamic e-commerce environments, is ultimately an infrastructure challenge. It demands an online serving layer built for one purpose: predictable, sub-millisecond performance at massive concurrency. The solution is not a single, overburdened database, but a decoupled architecture with a purpose-built engine for speed.
Dragonfly provides that foundation. Its modern design delivers the extreme throughput and low latency needed to serve hundreds of millions of features per second on each instance. With that, let your team focus on innovation, not infrastructure bottlenecks.

