RisingWave + Dragonfly: No More Stale Cache and Outdated Features
Eliminate stale data with RisingWave and Dragonfly. Automatically sync databases to real-time cache & feature store. Instant updates, zero manual invalidation.
July 16, 2025

RisingWave + Dragonfly: From Streams to Instant Reads
In today’s data-driven world, applications demand real-time responsiveness, whether it’s personalizing user experiences or powering machine learning models with fresh features. But stitching together low-latency data systems often feels like orchestrating a chaotic band: databases struggle with scale, caches become stale, and pipelines grow brittle.
Enter RisingWave and Dragonfly, a powerful duo that harmonizes data streaming with lightning-fast data access, eliminating the traditional tradeoffs between data freshness and performance.
- RisingWave is a PostgreSQL wire-protocol-compatible streaming database that ingests, processes, and serves real-time data incrementally. By transforming data from various sources into actionable insights using its SQL dialect, RisingWave modernizes the stream processing experience. What I find particularly mind-blowing is that materialized views in RisingWave don’t need periodic refreshes—they are real-time by default, a feature rarely seen in open-source database systems.
- Dragonfly is a drop-in Redis and Valkey replacement that delivers millions of ops/sec on a single server instance with multi-threading and memory efficiency. It is the modern in-memory data store that scales both vertically and horizontally. It keeps your application’s rhythm tight with sub-millisecond data access even under the heaviest load, so valuable data points are always instantly at hand.
Together, they enable zero-lag user experiences (like profile updates that reflect instantly) along with real-time feature serving for ML models powered by live data streams. By combining RisingWave’s streaming capabilities with Dragonfly’s high-performance data access, you can simplify your architecture, eliminating glue code, stale data, and cache invalidation headaches. In this post, we’ll explore two practical implementations of this powerful pairing: a lightning-fast user profile cache, followed by a real-time online feature store for machine learning.
Setting Up Your Real-Time Stack
To demonstrate how RisingWave and Dragonfly work together, we’ll build a simple but powerful setup. RisingWave supports various data sources, such as messaging platforms like Apache Kafka, Redpanda, Apache Pulsar, AWS Kinesis, and many more. At the same time, it can also capture data changes from databases like MySQL, PostgreSQL, MongoDB, and so on. For simplicity, we will use PostgreSQL as the source-of-truth, on-disk database for our application data.
In this tutorial, we’ll run all systems in standalone mode, which is ideal for development/testing. For production, consider a high-availability or clustering setup for Dragonfly/PostgreSQL and distributed deployment for RisingWave to achieve fault tolerance and horizontal scalability. Using the following commands, we run PostgreSQL, Dragonfly, and RisingWave in Docker containers, respectively.
docker run -d \
--name rw-source-postgres \
-e POSTGRES_USER=local_user_dev \
-e POSTGRES_PASSWORD=local_pwd_dev \
-e POSTGRES_DB=appdb \
-p 5432:5432 \
-v pgdata:/var/lib/postgresql/data \
postgres:17 \
postgres -c wal_level=logical
docker run -d \
--name rw-sink-dragonfly \
--ulimit memlock=-1 \
-p 6380:6379 \
docker.dragonflydb.io/dragonflydb/dragonfly \
--proactor_threads=2 \
--maxmemory=2GB
docker run -d \
--name risingwave-single-node \
-p 4566:4566 -p 5691:5691 \
risingwavelabs/risingwave:latest single_node
A few configurations to pay attention to:
- We use wal_level=logical for PostgreSQL so that it supports logical replication. This is an important configuration required by RisingWave for change data capture (CDC).
- Meanwhile, I use proactor_threads=2 and maxmemory=2GB for Dragonfly since I’d like to save some resources on my local machine. In reality, Dragonfly can fully utilize multiple cores (64, 96, or more) and hundreds of GB or even TB of memory on a single server machine.
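Since RisingWave speaks the PostgreSQL wire protocol, all of the RisingWave SQL in this post can be run from any PostgreSQL client (for example, psql -h localhost -p 4566 -d dev -U root, assuming the single-node defaults of user root and database dev). A minimal sanity check after connecting:
-- RisingWave
-- Confirm the connection works; RisingWave returns a PostgreSQL-compatible version string.
SELECT version();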
Now that we have everything running, let’s dive into the use cases.
Use Case: Real-Time User Profile Caching
Modern applications often adopt progressive profiling, collecting only essential information like email during signup, then gradually enriching user profiles over time. While this approach improves conversion rates, it can become messy to implement. This is where the combination of PostgreSQL, RisingWave, and Dragonfly creates an elegant solution. PostgreSQL serves as our source of truth, reliably storing complete user records; notably, it is the only place our application code writes data to disk. RisingWave captures every user profile change in real-time through CDC and streams the data to Dragonfly, which maintains an always-fresh in-memory cache of user profiles, serving read requests with sub-millisecond response times.

Let’s start by defining a table schema in PostgreSQL. As mentioned earlier, we leave all optional fields nullable:
-- PostgreSQL
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email VARCHAR(200) NOT NULL UNIQUE,
first_name VARCHAR(100),
last_name VARCHAR(100),
birthdate DATE,
phone_number VARCHAR(20),
profile_picture_url TEXT,
kyc_record_url TEXT,
created_at TIMESTAMPTZ NOT NULL DEFAULT CURRENT_TIMESTAMP
);
As a real-time streaming database, RisingWave functions as a powerful “data bridge” between systems, not just syncing data between sources and sinks but transforming it in transit. What makes it particularly powerful is that you define everything through SQL: creating sources to ingest data, sinks to output results, and transformations in between. Below is an example of how we can create a data source in RisingWave:
-- RisingWave
CREATE SOURCE pg_source WITH (
connector='postgres-cdc',
hostname='host.docker.internal', -- Docker internal network.
port='5432',
username='local_user_dev',
password='local_pwd_dev',
database.name='appdb',
schema.name='public'
);
Next, we’ll create a table in RisingWave based on our defined data source. This requires:
- A primary key that matches the upstream table.
- Explicit specification of the source PostgreSQL table name.
While RisingWave tables don’t always need exact one-to-one mappings with source tables, in this case we specifically want our RisingWave users table to mirror the PostgreSQL source exactly.
-- RisingWave
CREATE TABLE users (
id INTEGER PRIMARY KEY,
email TEXT,
first_name TEXT,
last_name TEXT,
birthdate DATE,
phone_number TEXT,
profile_picture_url TEXT,
kyc_record_url TEXT,
created_at TIMESTAMPTZ
) FROM pg_source TABLE 'public.users'; -- Specify the PostgreSQL server data source and select the correct table.
We now have a mapping table in RisingWave, allowing us to create a data sink to Dragonfly directly from this table without requiring additional aggregation in this scenario. Since Dragonfly is fully compatible with the Redis protocol, we can use RisingWave’s Redis connector to sync data with our Dragonfly server running locally in Docker on port 6380. In terms of data format, we use query string encoding for our user records in Dragonfly, but JSON format is also supported. Additionally, you can always customize the data format to suit your needs.
-- RisingWave
-- Use the Redis connector to talk to Dragonfly, as Dragonfly is fully compatible with Redis.
-- We have Dragonfly running locally in Docker on port 6380 previously.
CREATE SINK user_cache
FROM users WITH (
connector = 'redis',
redis.url = 'redis://host.docker.internal:6380/',
primary_key = 'id'
) FORMAT PLAIN ENCODE TEMPLATE (
force_append_only = 'true',
key_format = 'user_cache:{id}',
value_format = 'email={email}&first={first_name}&last={last_name}&dob={birthdate}&phone={phone_number}&profile_url={profile_picture_url}&kyc_url={kyc_record_url}&created_at={created_at}'
);
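Before loading any data, you can optionally confirm from the RisingWave side that the source, table, and sink were created; a quick check using RisingWave’s SHOW commands:
-- RisingWave
-- List the streaming objects created so far.
SHOW SOURCES;
SHOW TABLES;
SHOW SINKS;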
That’s it. We’ve essentially just built a simple yet fully functional data pipeline, with PostgreSQL as a data source, RisingWave as the real-time streaming platform, and Dragonfly as a data sink. There’s only one thing missing: the actual data. Let’s try inserting into our PostgreSQL users table to mimic user registration and user profile updates and see what happens.
-- PostgreSQL
INSERT INTO users (id, email) VALUES (1, 'joe@test.com');
Let me emphasize once more: we are inserting data into the PostgreSQL users table—yet, amazingly, we can verify that this user is already cached in Dragonfly after the insertion:
dragonfly$> KEYS *
1) "user_cache:1"
# Key format as defined in the data sink: 'user_cache:{id}'
dragonfly$> GET "user_cache:1"
"email=joe@test.com&first=NULL&last=NULL&dob=NULL&phone=NULL&profile_url=NULL&kyc_url=NULL&created_at=2025-07-15"
Now we have perfectly synced data across PostgreSQL, RisingWave, and Dragonfly, all updated in real-time! Still skeptical? Let’s update the user profile with even richer details.
-- PostgreSQL
UPDATE users
SET first_name = 'Joe',
last_name = 'Test',
birthdate = '1989-10-11',
phone_number = '123-456-7890'
WHERE id = 1;
dragonfly$> GET "user_cache:1"
"email=joe@test.com&first=Joe&last=Test&dob=1989-10-11&phone=123-456-7890&profile_url=NULL&kyc_url=NULL&created_at=2025-07-15"
The user profile is now updated across the board. With just a few SQL commands, we’ve built a self-sustaining pipeline that automatically keeps Dragonfly in perfect sync with PostgreSQL without additional logic in the application code, proving that real-time data doesn’t require complex infrastructure, just the right tools working in harmony.
Use Case: Real-Time Feature Store for ML
In the previous section, we simulated account registration and progressive profiling for our application. Now it’s time to extend our database schema to support products and user orders, showcasing a simple e-commerce setup with user spending pattern capabilities.
Machine learning models need fresh behavioral data to make accurate predictions. Traditional batch pipelines often introduce delays, so by the time features reach your model, they’re already outdated. As we’ve discussed in a previous blog post, production ML systems typically use:
- Offline storage to store large amounts of historical data for training.
- Online storage to provide low-latency features for inference.
In this example, we can consider PostgreSQL as the offline store, which is the source of raw user order records and ground truth. Meanwhile, Dragonfly can serve as the online store, making pre-aggregated features available with sub-millisecond reads. Using RisingWave’s incremental materialized views, we bridge these worlds by transforming raw PostgreSQL data into real-time aggregates and continuously syncing results to Dragonfly.
Let’s start by defining new tables in PostgreSQL and their mappings in RisingWave:
-- New tables in PostgreSQL.
-- In RisingWave, we create mapping tables similar to above.
CREATE TABLE user_orders (
id INTEGER PRIMARY KEY,
user_id INTEGER NOT NULL REFERENCES users (id) ON DELETE CASCADE,
status VARCHAR(50) NOT NULL DEFAULT 'PENDING'
CHECK (status IN ('PENDING', 'PROCESSING', 'SHIPPED', 'DELIVERED', 'CANCELLED', 'REFUNDED')),
total_amount DECIMAL(10, 2) NOT NULL,
shipping_address TEXT NOT NULL,
payment_method VARCHAR(50) NOT NULL,
tracking_number VARCHAR(100),
created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
id INTEGER PRIMARY KEY,
-- Details omitted.
);
CREATE TABLE order_items (
id INTEGER PRIMARY KEY,
-- Details omitted.
);
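Since the materialized view below joins users with user_orders inside RisingWave, the new user_orders table needs a RisingWave mapping table as well. Here is a sketch that follows the same pattern as the users mapping earlier, with column types chosen to mirror the PostgreSQL schema:
-- RisingWave
-- Mapping table for user_orders, fed by the same CDC source (sketch mirroring the PostgreSQL schema).
CREATE TABLE user_orders (
id INTEGER PRIMARY KEY,
user_id INTEGER,
status TEXT,
total_amount DECIMAL,
shipping_address TEXT,
payment_method TEXT,
tracking_number TEXT,
created_at TIMESTAMPTZ,
updated_at TIMESTAMPTZ
) FROM pg_source TABLE 'public.user_orders';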
What differentiates this use case from the previous one is that we now require data aggregation. For feature serving purposes, we track users’ spending behavior for the current month and the previous two months. To achieve this efficiently, we can create an incremental materialized view in RisingWave. This view automatically maintains and updates the required aggregates in real-time, eliminating the need for manual refreshes or nightly batch jobs:
-- RisingWave
CREATE MATERIALIZED VIEW user_spending_in_recent_months_mv AS
SELECT u.id AS user_id,
u.email,
EXTRACT(YEAR FROM o.created_at) AS year,
EXTRACT(MONTH FROM o.created_at) AS month,
SUM(o.total_amount) AS total_spending,
COUNT(o.id) AS order_count
FROM users u
JOIN
user_orders o ON u.id = o.user_id
WHERE o.created_at >= DATE_TRUNC('month', CURRENT_TIMESTAMP - INTERVAL '2 months')
AND o.created_at < DATE_TRUNC('month', CURRENT_TIMESTAMP + INTERVAL '1 month')
AND o.status NOT IN ('CANCELLED', 'REFUNDED')
GROUP BY u.id, u.email, year, month
ORDER BY u.id, year, month DESC;
After creating the materialized view, we can establish a data sink to Dragonfly just like in our previous implementation. The key difference is that this data sink will now source its data from the materialized view containing aggregated results:
-- RisingWave
-- Use the Redis connector to talk to Dragonfly, as Dragonfly is fully compatible with Redis.
-- We have Dragonfly running locally in Docker on port 6380 previously.
CREATE SINK user_spending_feature
FROM user_spending_in_recent_months_mv WITH (
connector = 'redis',
redis.url = 'redis://host.docker.internal:6380/',
primary_key = 'user_id,year,month'
) FORMAT PLAIN ENCODE TEMPLATE (
force_append_only = 'true',
key_format = 'user_spending:{year}:{month}:{user_id}',
value_format = 'total_spending={total_spending}&order_count={order_count}'
);
From here, we can try inserting a few user order records placed within the past two months at the time of writing:
-- PostgreSQL
INSERT INTO user_orders (id, user_id, status, total_amount, shipping_address, payment_method, created_at)
VALUES (1, 1, 'DELIVERED', 100, '123 Main St, Anytown, USA', 'CREDIT_CARD', '2025-05-15'),
(2, 1, 'DELIVERED', 200, '123 Main St, Anytown, USA', 'CREDIT_CARD', '2025-05-17'),
(3, 1, 'DELIVERED', 300, '123 Main St, Anytown, USA', 'CREDIT_CARD', '2025-05-19');
And the aggregated user spending feature is already synced to Dragonfly in real-time after the insertions:
# Key format as defined in the data sink: 'user_spending:{year}:{month}:{user_id}'
dragonfly$> GET "user_spending:2025:5:1"
"total_spending=600.00&order_count=3"
Note that when we define the materialized view, orders that are CANCELLED or REFUNDED are excluded. We can verify that as well by refunding the $300 order and checking Dragonfly again:
-- PostgreSQL
UPDATE user_orders SET status = 'REFUNDED' WHERE id = 3;
dragonfly$> GET "user_spending:2025:5:1"
"total_spending=300.00&order_count=2"
That specific order is now excluded from the materialized view and consequently from the spending feature in Dragonfly as well. Everything works perfectly.
The Future of Real-Time Data Is Simpler Than You Think
Throughout this tutorial, we’ve transformed what traditionally requires complex infrastructure (real-time data synchronization between databases, caches, and ML systems) into a straightforward SQL-driven workflow. By combining RisingWave’s streaming SQL with Dragonfly’s best-in-class in-memory data performance, we achieved instant user profile caching and real-time feature serving, all with an elegant architecture.
For next steps, I highly recommend recreating the setup in this tutorial yourself to gain hands-on experience with Dragonfly and RisingWave! Both platforms offer fully managed cloud services that make deployment and operations remarkably simple, letting you focus on building your next-gen real-time applications.