Scaling Real-Time Leaderboards with Dragonfly

Introduction

In today's digital age, leaderboards have become an integral part of many applications, providing a dynamic way to display user scores and rankings. To build gamification features for any application (i.e., games, educational platforms), leaderboards serve as a powerful tool to engage and motivate users. In this blog post, we're going to build a practical and realistic leaderboard system.

Our journey will involve leveraging the capabilities of Dragonfly, a highly efficient drop-in replacement for Redis, known for its ultra-high throughput and multi-threaded shared-nothing architecture. Specifically, we'll be utilizing two of Dragonfly's data types: Sorted-Set and Hash. These data structures are perfect for handling real-time data and ranking systems, making them ideal for our leaderboards.

Moreover, to ensure that our leaderboards are not just real-time but also persistent, we will be integrating a SQL database (PostgreSQL) into our system. This approach allows us to maintain a comprehensive record of user scores over different time frames. As a result, we'll be capable of showcasing three distinct types of leaderboards:

An all-time leaderboard that reflects overall user scores.
A current-week leaderboard that captures the most recent user activities.
Leaderboards for previous weeks, giving users insights into past trends and performances, potentially also providing rewards and prizes for top performers.

Through this implementation, we aim to demonstrate how Dragonfly, in conjunction with traditional SQL databases, can be utilized to create robust, scalable, and efficient leaderboard systems. So, let's dive in and start building!

Implementation

1. Database Schema

In the implementation of our leaderboard system, a carefully designed SQL database schema plays a pivotal role. At the core of this schema is the users table, which is essential for storing basic user information. This table includes fields like id (a unique identifier for each user, automatically incremented as BIGSERIAL), email (a unique field to prevent duplicate registrations), password, username, and timestamps created_at and updated_at to track the creation and last update of each user record. Note that the password field should store the hashed or encrypted version of the user's password for security purposes.

CREATE TABLE IF NOT EXISTS users
(
    id         BIGSERIAL PRIMARY KEY,
    email      VARCHAR(255) UNIQUE NOT NULL,
    password   VARCHAR(255)        NOT NULL,
    username   VARCHAR(255)        NOT NULL DEFAULT '',
    created_at TIMESTAMPTZ         NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ         NOT NULL DEFAULT NOW()
);

Next, we have the user_score_transactions table, which logs all score transactions for users. It consists of an id as a unique transaction identifier, user_id linking to the users table, score_added representing the score change, reason for the score change (such as winning a game or completing a task), and a created_at timestamp for the transaction record.

CREATE TABLE IF NOT EXISTS user_score_transactions
(
    id          BIGSERIAL PRIMARY KEY,
    user_id     BIGINT       NOT NULL REFERENCES users (id),
    score_added INT          NOT NULL,
    reason      VARCHAR(255) NOT NULL,
    created_at  TIMESTAMPTZ  NOT NULL DEFAULT NOW()
);

Finally, the user_total_scores table is dedicated to maintaining the cumulative scores of each user. It contains an id for each record, user_id to reference the users table, total_score indicating the user's overall score, and an updated_at timestamp for the last score update.

CREATE TABLE IF NOT EXISTS user_total_scores
(
    id          BIGSERIAL PRIMARY KEY,
    user_id     BIGINT      NOT NULL REFERENCES users (id),
    total_score INT         NOT NULL DEFAULT 0,
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

This schema is particularly effective due to its emphasis on normalization, which reduces redundancy by segregating user information, score transactions, and total scores into distinct tables. It ensures scalability with the use of BIGSERIAL and BIGINT data types, accommodating a large volume of records. Additionally, the separate user_score_transactions table offers valuable insights into the score history for each user, which is beneficial for analytics and audit trails. We will also create materialized views to further support leaderboards for previous weeks as we will see later. By isolating the total scores in the user_total_scores table, the system can swiftly access and update a user's total score, enhancing performance. This well-structured schema thus forms the backbone of our leaderboard system, supporting both real-time updates and a comprehensive score history.

2. Dragonfly Keys & Data Types

With the database schema in place, we can now focus on the Dragonfly key-value pairs that will be used to store the leaderboard data. The Sorted-Set data type is ideal for storing user scores and rankings, while the Hash data type is perfect for storing user information that is needed for display purposes. And the classic String data type is useful for caching the results of materialized views for previous weeks' leaderboards. Here are the keys and data types that we will be using:

leaderboard:user_scores:all_time (Sorted-Set): Stores the user IDs and scores for the all-time leaderboard.
leaderboard:user_scores:week_of_{monday_of_the_week} (Sorted-Set): Stores the user IDs and scores for the current-week leaderboard.
leaderboard:users:{user_id} (Hash): Stores the minimal information for a specific user to be displayed on the leaderboard.
leaderboard:cache_top_100:week_of_{monday_of_the_week} (String): Caches the top 100 users for a specific previous week from the corresponding materialized view in the database.

An example of the key space would look like this:

dragonfly$> KEYS leaderboard:*
1) "leaderboard:user_scores:all_time"           # Sorted-Set
2) "leaderboard:user_scores:week_of_2024_01_15" # Sorted-Set
3) "leaderboard:users:1"                        # Hash
4) "leaderboard:users:2"                        # Hash
5) "leaderboard:users:3"                        # Hash
6) ...

3. All-Time & Current-Week Leaderboards

In the implementation of the all-time leaderboard and current-week leaderboard, we focus on how scores are updated for a user and how the top 100 users are queried from these leaderboards.

To update scores, we first record the score transaction in the user_score_transactions table and then update the user_total_scores table. This operation should be wrapped in a database transaction to ensure data integrity.

BEGIN;

-- Record score transaction for user with ID 1.
INSERT INTO user_score_transactions (user_id, score_added, reason)
VALUES (1, 100, 'WINNING_A_GAME');

-- Update total score for user with ID 1.
UPDATE user_total_scores
SET total_score = total_score + 100,
    updated_at  = NOW()
WHERE user_id = 1;

COMMIT;

Next, we update the all-time leaderboard and current-week leaderboard in Dragonfly. Note that the operations are better pipelined to reduce the number of round-trips between the application and Dragonfly.

dragonfly$> ZINCRBY leaderboard:user_scores:all_time 100 1
dragonfly$> ZINCRBY leaderboard:user_scores:week_of_2024_01_15 100 1

Now that we have persisted with the score change in the database and updated the values in Dragonfly as well, when querying the top 100 users from a leaderboard (all-time or current-week), we can simply use the ZREVRANGE command to retrieve the top users from the Sorted-Set, and then use the HGETALL commands to retrieve user details from the Hash keys.

dragonfly$> ZREVRANGE leaderboard:user_scores:all_time 0 99 WITHSCORES
 1) "1"    # user_id = 1
 2) "1000" # score for user_id = 1
 3) "2"    # user_id = 2
 4) "900"  # score for user_id = 2
 5) "3"
 6) "800"
 7) "4"
 8) "700"
 9) "5"
10) "600"
# ...

dragonfly$> HGETALL leaderboard:users:1
dragonfly$> HGETALL leaderboard:users:2
dragonfly$> HGETALL leaderboard:users:3
dragonfly$> HGETALL leaderboard:users:4
dragonfly$> HGETALL leaderboard:users:5
# ...

Depending on how many users are recorded in the leaderboard:user_scores:all_time key, we need to use 1 ZREVRANGE command and potentially 100 HGETALL commands to retrieve the top users. This may sound like a lot of commands, but once again, we can pipeline these commands to reduce the number of round-trips between the application and Dragonfly. In fact, the top user scores with their details can be retrieved in a single round-trip, and the response time should still be within a few milliseconds. On the other hand, we completely avoid the need to query the database for the top users, which is a much more expensive operation. This is why we are confident in saying that Dragonfly is providing a real-time experience for leaderboard retrieval.

4. Leaderboards for Previous Weeks

For the implementation of leaderboards for previous weeks, we adopted a strategy that efficiently balances database querying with caching. The process involves two main steps: creating materialized views and leveraging Dragonfly's caching capabilities.

We utilize the user_score_transactions table to generate materialized views for each past week's leaderboard. Materialized views are essentially snapshots of the query results, stored for efficient access. These views are created by aggregating the scores from the user_score_transactions table for each user over a specific week. An example SQL statement to create a materialized view for a specific week might look like this:

CREATE MATERIALIZED VIEW leaderboard_week_of_2024_01_15 AS
SELECT u.id, u.username, u.email, sum(ust.score_added) AS weekly_score
FROM user_score_transactions ust
         JOIN users u ON ust.user_id = u.id
WHERE ust.created_at BETWEEN '2024-01-15 00:00:00' AND '2024-01-21 23:59:59'
GROUP BY u.id
ORDER BY weekly_score DESC;

Once the materialized view for a week's leaderboard is created, we can cache its results in Dragonfly to facilitate quick retrieval. We utilize Dragonfly's String data type to store the serialized form of the leaderboard, which can be in JSON, XML, or any other format. The reason is that past leaderboards cannot be changed anymore, and the order is preserved in the materialized view, so we can simply cache the results as-is.

SELECT * FROM leaderboard_week_of_2024_01_15 LIMIT 100;

dragonfly$> SET leaderboard:cache_top_100:week_of_2024_01_15 'serialized_leaderboard_data'

Other Considerations

1. Calculating the Start of the Week

For the weekly leaderboards, it's essential to have a consistent method to determine the start of each week, commonly set as Monday. This calculation is vital because it impacts both the naming conventions of keys in Dragonfly and the logic for creating and refreshing materialized views in the database. Implementing helper methods in the application code that accurately calculate the Monday of any given week is necessary. This consistency ensures that both the database views and the Dragonfly keys are synchronized in terms of the time periods they represent. Such an implementation in Go might look like this:

// MondayOfTime returns the Monday of the week of the given time.
func MondayOfTime(ts time.Time) time.Time {
	tt := ts.UTC()
	weekday := tt.Weekday()
	if weekday == time.Monday {
		return tt.Truncate(24 * time.Hour)
	}
	daysToSubtract := (weekday - time.Monday + 7) % 7
	return tt.AddDate(0, 0, -int(daysToSubtract)).Truncate(24 * time.Hour)
}

// MondayOfTimeStr returns the Monday of the week of the given time in string format.
func MondayOfTimeStr(ts time.Time) string {
	return MondayOfTime(ts).Format("2006_01_02")
}

2. Management of Dragonfly Keys

The all-time leaderboard data, represented by a Sorted-Set key in Dragonfly, is a long-term data set that can be kept indefinitely. This key does not require an expiration as it continuously accumulates user scores over time.

Conversely, the current-week Sorted-Set key in Dragonfly should be managed with an expiration policy. Setting an expiry time point for this key, preferably at the beginning of the next week, ensures that the data does not become stale and reflects only the current week's scores. This practice helps in maintaining the relevance and accuracy of the current-week leaderboard. Conveniently, with the MondayOfTime function above being used properly in code, when we enter a new week, a new Sorted-Set key would be created in Dragonfly automatically upon the first score update, as ZINCRBY creates a Sorted-Set key if it does not exist. This would serve as a fresh start for the current week.

And finally, the user-detail Hash keys in Dragonfly, shared across all-time and current-week leaderboards, can also be kept indefinitely. However, it's crucial to keep the data in these user-detail Hash keys up-to-date with the corresponding records in the database. Whenever a user's details change in the database, these changes should be promptly reflected in the Hash keys in Dragonfly. This synchronization ensures that the leaderboards always display the most current and accurate user information.

3. Key Naming Conventions

It's important to adopt a clear and distinct naming convention for different types of data stored in Dragonfly. Specifically, the key names for the current-week Sorted-Set and the cached materialized view (String data type) should be different to prevent confusion. A clear naming strategy helps avoid accidental operations on the wrong Dragonfly data type.

Conclusion

In this blog, we explored how Dragonfly can be used in conjunction with a SQL database to build a robust and efficient leaderboard system for gaming and other applications. We discussed various data types and techniques that can be easily utilized to create real-time leaderboards with minimal update and retrieval latency.

We have a recorded workshop session, "Scaling Real-Time Leaderboards", that you can watch here. Code snippets in this blog post can be found in the Dragonfly examples repository. Finally, we encourage you to try Dragonfly out for yourself, experience its capabilities firsthand, and build amazing applications with it!