
Feature Stores: Architecture and Online/Offline Storage

Explore feature store fundamentals: architecture and offline/online storage options—the essential concepts before hands-on deployment.

May 15, 2025


Feature Stores: Flexibility, Performance, and Use Cases

In the world of machine learning (ML), feature stores have emerged as a critical component for managing, serving, and reusing features across different models and teams. Simply put, a feature store is a centralized repository that standardizes the storage, retrieval, and sharing of features—the measurable properties or inputs used by ML models. Using a feature store helps maintain consistency between training and inference, reduce redundant computation, and enable collaboration across teams.

Popular Feature Store Options

Organizations today can select from multiple robust feature store solutions, both open-source and proprietary. Among open-source options, Feast has gained wide adoption for its flexible support of multiple storage backends (like ScyllaDB and Dragonfly), while Feathr specializes in low-latency serving at scale, and Hopsworks offers an end-to-end ML platform with integrated feature store capabilities. Cloud providers like AWS and GCP also provide managed solutions through SageMaker and Vertex AI, respectively. While all these options are powerful, Feast’s strong community support and platform-agnostic nature make it appealing for avoiding vendor lock-in and adapting to diverse infrastructure needs.

Feature Store Use Cases

Feature stores unlock the ability to build real-time ML applications efficiently. For example, they can power real-time bidding (RTB) systems in AdTech and fraud detection systems in FinTech or enable personalized recommendations by serving fresh features at scale.

A while back, we explored running Feast with Dragonfly—now we’re diving deeper into feature store fundamentals as they’ve become increasingly critical for production ML systems. In this blog, we’ll focus on the general concepts of feature stores—how they work, their benefits, and architectural considerations. In a follow-up post, we’ll dive into a realistic example of building an RTB system using a feature store.


Feature Store Architecture & Components

To understand how a feature store works, let’s break down the key components using Feast as an example. A complete Feast deployment has quite a few components, and each plays an important role within the system. We will go over them below, shedding light in particular on the online and offline stores, which have distinct but complementary responsibilities, much like a primary database and a caching layer in traditional backend systems.

Components of Feast

Feast consists of several components that work together to form a complete management system. These are the key pieces of the Feast architecture:

  • Feast Registry: A centralized metadata store (backed by S3, GCS, etc.) that keeps track of feature definitions. Enables discovery and versioning of features via the Feast SDK.
  • Feast Python SDK/CLI: The primary interface for defining and managing features, loading (materializing) features into the online store, building training datasets from the offline store, and retrieving features for real-time serving.
  • Feature Server: A horizontally scalable REST API that serves low-latency feature values to models during inference.
  • Other components, such as the stream processor (for data ingestion), the batch materialization engine (for loading data from the offline store into the online store), and the authorization manager, are also crucial pieces of the system.
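As a sketch of how these components are wired together, a minimal `feature_store.yaml` for Feast might look like the following. The project name, paths, and connection details are hypothetical; Dragonfly is configured through Feast’s Redis-compatible online store type, and the DuckDB offline store type is available in recent Feast releases (check your version):

```yaml
project: my_feature_repo          # hypothetical project name
registry: data/registry.db        # file-backed registry (could also be S3/GCS)
provider: local
online_store:
  type: redis                     # Dragonfly speaks the Redis protocol
  connection_string: "localhost:6379"
offline_store:
  type: duckdb                    # lightweight embedded offline store
```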

Feast Architecture & Components


Offline and Online Stores: The Dual-Layer Design

Beyond the functional components above, the heart of any feature store is its storage architecture, and this is where the dual-layer design of offline and online stores becomes critical. The separation isn’t arbitrary; it’s driven by fundamentally different requirements across the ML lifecycle.

The Offline Store in Feast (as well as in other feature stores) provides a standardized interface for working with historical time-series feature data across various data sources. Implemented through different OfflineStore backends, each connecting to specific storage systems (typically large-scale data warehouses or data lakes), it serves two critical functions in ML pipelines:

  • Creating training datasets from time-series features.
  • Materializing features into an online store for real-time serving.

The capability of offline stores becomes particularly valuable in production scenarios like training fraud detection systems that require years of transaction history or when backfilling features after logic changes, such as recalculating user metrics across months of historical data while maintaining consistency.
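To make the training-dataset function concrete, here is a minimal, framework-free sketch of the point-in-time join an offline store performs: for each training event, it picks the latest feature value recorded at or before that event’s timestamp, so no future information leaks into training. This is plain Python for illustration, not the Feast API:

```python
from bisect import bisect_right

def point_in_time_join(feature_history, event_times):
    """For each event timestamp, return the latest feature value
    recorded at or before that time (None if nothing existed yet)."""
    history = sorted(feature_history)          # list of (timestamp, value)
    times = [t for t, _ in history]
    values = [v for _, v in history]
    joined = []
    for event_time in event_times:
        i = bisect_right(times, event_time)    # values[:i] are usable
        joined.append(values[i - 1] if i > 0 else None)
    return joined

history = [(100, "a"), (200, "b"), (300, "c")]
print(point_in_time_join(history, [150, 250, 50]))  # ['a', 'b', None]
```

Real offline stores run this join at warehouse scale (e.g. via SQL window functions), but the leakage-avoidance rule is the same.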

In contrast, the Online Store specializes in delivering the latest feature values with millisecond latency for real-time inference. Built on performant in-memory data stores like Dragonfly or NoSQL databases like ScyllaDB, it’s optimized for rapid point lookups and only maintains the latest feature values (no historical values), such as a user’s current session activity.

This architecture becomes critical in scenarios like real-time fraud scoring during checkout, where even a 500ms delay (on top of other processing delays) could significantly degrade the payment experience, or when serving personalized recommendations that demand up-to-the-second user preferences.
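The latest-value-only semantics can be illustrated with a toy stand-in (a plain Python class, not how Dragonfly or ScyllaDB actually store data): writes overwrite, reads are point lookups, and no history is kept.

```python
class ToyOnlineStore:
    """Toy online store: keeps only the latest value per entity/feature."""

    def __init__(self):
        self._rows = {}

    def write(self, entity_id, features):
        # Writes overwrite previous values: no history is retained online.
        self._rows.setdefault(entity_id, {}).update(features)

    def read(self, entity_id, feature_names):
        # Point lookup of the freshest values for one entity.
        row = self._rows.get(entity_id, {})
        return {name: row.get(name) for name in feature_names}

store = ToyOnlineStore()
store.write("user_42", {"session_clicks": 3})
store.write("user_42", {"session_clicks": 7})     # overwrites the old value
print(store.read("user_42", ["session_clicks"]))  # {'session_clicks': 7}
```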

Why Feature Stores Evolved with Offline/Online Separation

Similar to traditional backend systems, you can think of the offline store as the primary database (complete history, optimized for analytics) and the online store as a cache (sub-millisecond access for live applications). The dual-layer architecture evolved to resolve fundamental tensions in production ML systems.

Feature Store: Offline Store for Training, Online Store for Serving

Training requires reproducible historical features with time-travel capabilities (scanning terabytes across months), while serving demands sub-10ms access to the latest values—workloads so divergent that single-database solutions inevitably caused training-serving skew, unsustainable costs, or both. By adopting a pattern similar to backend systems’ database/cache separation, the offline store (optimized for large-scale processing) and online store (designed for low-latency lookups) together ensure consistent features across the ML lifecycle.
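The bridge between the two layers is materialization: periodically copying the latest value per entity from offline history into the online store. A minimal sketch of that step (a plain dict stands in for the online store; Feast performs this via its batch materialization engine):

```python
def materialize(offline_rows, online_store):
    """Copy the latest feature values per entity from offline history
    into the online store (a plain dict in this sketch)."""
    # offline_rows: list of (entity_id, timestamp, features_dict)
    latest = {}
    for entity_id, ts, features in offline_rows:
        if entity_id not in latest or ts > latest[entity_id][0]:
            latest[entity_id] = (ts, features)   # keep only the newest row
    for entity_id, (_, features) in latest.items():
        online_store[entity_id] = features
    return online_store

history = [
    ("user_1", 100, {"clicks_7d": 3}),
    ("user_1", 200, {"clicks_7d": 5}),
    ("user_2", 150, {"clicks_7d": 1}),
]
store = materialize(history, {})
print(store["user_1"])  # {'clicks_7d': 5}
```

Because both training (via the point-in-time view) and serving (via materialization) read from the same offline history, the two layers stay consistent.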


Evaluating Offline & Online Store Options

Feast’s pluggable architecture supports multiple storage backends, allowing teams to choose solutions that match their scale, cost, and performance requirements. Below, we analyze why certain databases are well-suited as offline and online stores.

BigQuery as a Serverless Distributed Offline Store

BigQuery, Google’s serverless data warehouse, excels at petabyte-scale analytics with SQL support. Its separation of storage and compute enables efficient feature materialization via Feast’s BigQuerySource, while partitioning optimizes time-range queries. It is ideal for large historical feature datasets.

DuckDB as a Lightweight Embedded Offline Store

As an emerging embedded OLAP database, DuckDB can process local or remote files (Parquet/CSV/JSON) with minimal overhead. Combined with the FileSource from the Feast SDK, DuckDB’s low overhead makes it a frictionless choice for small, moderate, and even large-scale feature data. However, DuckDB by itself lacks the distributed processing capabilities of BigQuery for extremely large workloads.

Dragonfly: The Perfect In-Memory Online Store Choice

Dragonfly stands out as an exceptional online store choice for feature serving, combining groundbreaking performance with seamless Redis compatibility. It delivers dramatic improvements over traditional in-memory data stores through its innovative multi-threaded, shared-nothing architecture and advanced memory optimization techniques. What makes Dragonfly particularly compelling is its ability to handle the most demanding workloads, supporting millions of ops/sec per instance while maintaining atomic operations for reliable feature updates. This performance profile makes it ideal for real-time applications like fraud detection or recommendation systems where low latency is critical.

ScyllaDB: Extremely Fast On-Disk Online Store

ScyllaDB presents a compelling choice as an online store, despite being an on-disk distributed NoSQL database, thanks to its exceptional balance of performance and robustness. What makes it exceptionally suitable for real-time feature serving is its ability to deliver consistent single-digit millisecond read latencies. This performance holds steady even under heavy demands—a critical requirement for applications like fraud detection systems that can’t compromise on speed.

Here’s a simplified comparison table of the stores we just analyzed:

| Store | Type | Strengths | Considerations | Best For |
| --- | --- | --- | --- | --- |
| BigQuery | Offline | Large scale, serverless | Pricing (compute, storage, network) | Enterprise historical feature storage |
| DuckDB | Offline | Embedded, lightweight | No distributed processing by default | Local or remote moderate-scale feature storage |
| Dragonfly | Online | Extreme throughput | Memory-bound | Real-time sub-millisecond serving |
| ScyllaDB | Online | Large scale, consistent low latency | Operational effort | Real-time serving without memory bound |


Feature Stores: From Theory to Practice

In this post, we explored feature store architecture using Feast as our example, breaking down its core components and specifically offline and online stores that power batch training and real-time serving. We also examined why databases like BigQuery, DuckDB, ScyllaDB, and Dragonfly are well-suited for different storage layers, balancing scale, latency, and operational efficiency.

But understanding the theory is just the beginning. In a follow-up post, we’ll roll up our sleeves and deploy a full Feast setup, complete with realistic use cases, showing exactly how to bridge batch and real-time ML workflows in production. Stay tuned for hands-on implementation details!
