Scaling Real-Time Financial Data Infrastructure: A Modern Security Master Blueprint
Learn how to build a high-performance security master using Dragonfly. Scale real-time financial data with low-latency lookups, keyword search, and semantic vector queries.
January 13, 2026

Why Your Legacy Security Master Can’t Keep Up
The security master, a centralized repository for all financial instrument data that serves as the golden source of truth, was traditionally a back-office system focused on reliability and end-of-day accuracy. Its role has shifted significantly over the years: the demands of modern trading and risk management have turned the security master into a critical, real-time component of the trading stack, exposing fundamental limitations of legacy infrastructure. For example:
- Latency: Decades ago, not too much would change in a second. Today, a single second contains entire market moves. For high-frequency trading (HFT), pre-trade checks, and real-time risk, the security master’s lookup latency translates directly to lost profitability or unmanaged exposure. Every millisecond matters.
- Exploding Data Volume: A modern security master must keep track of tens of millions of records (e.g., stocks, cryptocurrencies, intricate derivatives, ETFs, and structured products), each with hundreds of constantly changing attributes. These datasets now range from hundreds of gigabytes to several terabytes.
- Concurrency: The security master is no longer just a back-office batch system. It’s a real-time API, handling hundreds of thousands, even millions, of simultaneous requests from traders, algorithms, and risk engines. Legacy systems simply can’t handle this volume, leading to unpredictable delays and service outages.
The combined requirements of low latency, massive scale, and extreme concurrency have rendered yesterday’s solutions insufficient, creating a critical infrastructure gap. To move forward, we must first rethink the core architecture of the security master itself.
Architectural Principles for a Modern Security Master
Faced with these demands, the blueprint for a modern security master shifts from a simple data store to a platform built on core architectural pillars. These principles are agnostic of any specific technology, but they define the capabilities required to bridge the infrastructure gap. A viable solution must excel in three key dimensions:
- Vertical and Horizontal Scalability: The architecture must efficiently utilize all resources on a single server (vertical scaling), fully leveraging modern multi-core CPUs, while also supporting the seamless addition of nodes (horizontal scaling). This is essential for managing unpredictable load and massive dataset growth cost-effectively. At the same time, it should be easy to deploy and manage both on-premises and in the cloud.
- High Throughput Under Massive Load: The system must deliver high, consistent throughput, sustaining millions of operations per second even with tens of thousands of concurrent clients.
- Efficiency with Complex and Large Datasets: The data model must be inherently efficient at scale, storing tens of millions of securities, each with tens to hundreds of dynamic attributes. The architecture must provide rapid access to this nested data with minimal overhead, directly impacting performance and cost even as data volumes grow 10x or more.
Building a Security Master: Dragonfly as the Core Engine
Having established the core architectural principles for a modern security master, we now turn to implementation. This is where a high-performance in-memory data store like Dragonfly can be a compelling option: it adheres to the principles of vertical and horizontal scalability and resource efficiency, while enhancing a key-value data store with a powerful, unified search experience.
A Modern Data Store for Modern Data
Dragonfly is engineered from the ground up as a high-performance, multi-threaded in-memory data store designed for modern hardware. It offers full API compatibility with Redis, ensuring a straightforward migration path for existing applications. Its critical architectural advantage is the elimination of Redis’s single-threaded command execution bottleneck, allowing it to fully utilize multi-core CPUs. Beyond a single server, Dragonfly scales horizontally as well with Dragonfly Swarm. With native JSON support and integrated secondary indexing (tag, numeric, text, and vector), Dragonfly consolidates diverse access patterns into a single, high-performance data layer. If you want to follow along, the full code, including the detailed schema and all query examples from this post, is available in our public examples repository.
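Because the API is Redis-compatible, an application connects to Dragonfly exactly as it would to Redis. Below is a minimal connection sketch using the standard redis-py client, assuming a Dragonfly instance is listening locally on the default port 6379 (adjust host and port for your deployment):

import redis

# Minimal connection sketch: Dragonfly speaks the Redis protocol, so the
# standard redis-py client works unchanged. Host and port here are assumptions
# for a local development instance.
client = redis.Redis(host="localhost", port=6379)
print(client.ping())  # True when the connection is healthy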
Modeling the Security Instrument
Let’s consider a security instrument modeled as a JSON document. The schema is designed for rich queryability, with key fields indexed for performance. For our examples, we’ll assume a document structure containing:
- Core Identifiers: security_id (the internal ID within our system); external IDs such as ISIN or CUSIP; ticker, which is the symbol; and other fields such as security_name.
- Categorical Data: sector (e.g., “Technology”), exchange (e.g., “NASDAQ”), and others.
- Numerical Data: last_price, dividend_yield, and others.
- Descriptive Text: general_description containing business overviews.
- Vector Representation: security_general_description_embedding, a dense vector capturing the semantic meaning of the description.
security = {
    "security_id": "NGV8TER28ZB8VDED",
    "ticker": "HTY",
    "isin": "CHDEFF4EN9JG05",
    "cusip": None,
    "sedol": None,
    "bloomberg_ticker": "HTY CH Equity",
    "figi": "BBGGCFJCW93X",
    "reuters_ric": "HTY.S",
    "security_description": {
        "security_name": "Mitchell and Sons",
        "asset_class": "EQUITY",
        "instrument_type": "OPTION",
        "issuer_name": "Mitchell and Sons",
        "country_of_incorporation": "CH",
        "exchange": "SIX",
        "currency": "CHF",
        "sector": "Industrials",
        "industry_group": "Aerospace & Defense",
        "market_segment": "Large Cap",
        "general_description": "A CHF-denominated option of Mitchell and Sons trading on SIX in the industrials sector."
    },
    "security_general_description_embedding": [0.1, 0.2, 0.3, 0.4, 0.5],
    "pricing_valuation": {
        "last_price": 266.51993064447095,
        "pricing_source": "Exchange",
        "valuation_type": "Mark-to-Market",
        "price_timestamp": "2026-01-12T14:04:28.723905Z",
        "fifty_two_week_range": "158.25 – 370.91"
    },
    "instrument_details": {
        # ...
    },
    "corporate_actions": {
        # ...
    },
    "regulatory_compliance": {
        # ...
    },
    "operational_metadata": {
        # ...
    }
}

While the raw data contains dozens of fields, the query engine’s power comes from how we index them. Below is a simple but representative index schema, mapping key document paths to specific index types for optimal retrieval. The TagField indexes are used for exact-match filters (like sector or exchange), the NumericField for range queries on price, and the VectorField enables semantic search.
from __future__ import annotations

from redis.commands.search.field import TextField, TagField, NumericField, VectorField
from sentence_transformers import SentenceTransformer

_transformer_model = SentenceTransformer("all-MiniLM-L6-v2")
_transformer_model_dim = 384

schema = [
    TagField("$.ticker", as_name="ticker"),
    TagField("$.isin", as_name="isin"),
    TagField("$.security_description.exchange", as_name="exchange"),
    TagField("$.security_description.currency", as_name="currency"),
    TagField("$.security_description.sector", as_name="sector"),
    TextField("$.security_description.security_name", as_name="security_name"),
    NumericField("$.pricing_valuation.last_price", as_name="last_price"),
    NumericField("$.instrument_details.dividend_yield", as_name="dividend_yield"),
    VectorField(
        "$.security_general_description_embedding",
        as_name="security_general_description_embedding",
        algorithm="FLAT",
        attributes={
            "TYPE": "FLOAT32",
            "DIM": _transformer_model_dim,  # e.g., 384 dimensions for the all-MiniLM-L6-v2 model
            "DISTANCE_METRIC": "COSINE",
        },
    ),
]

Use Case 1: Precise Lookups & Filters
For portfolio screening and real-time trading systems, one of the most fundamental operations is filtering securities by exact, multi-dimensional criteria. Imagine a portfolio manager who must instantly identify the top 10 dividend-yielding stocks in the “Technology” sector, traded on “NASDAQ”, with a yield above 0.40%. In a legacy setup, this might require multiple queries (or SQL JOINs) and client-side filtering, introducing latency and complexity. Our schema makes this efficient: with tag indexes on fields like sector and exchange, and a numeric index on dividend_yield, Dragonfly can resolve this as a single, high-speed query.
dragonfly$> FT.SEARCH idx:securities "@sector:{Technology} @exchange:{NASDAQ} @dividend_yield:[0.004 +inf]" SORTBY dividend_yield DESC LIMIT 0 10

This query demonstrates the consolidation of precise filtering logic into the data layer itself. Performance is predictable and scales across multiple CPU cores, so speed remains consistent even as the dataset grows or concurrent request volume spikes. Note that you can also ask Dragonfly to return only the necessary fields by using the RETURN clause:
dragonfly$> FT.SEARCH idx:securities "@sector:{Technology} @exchange:{NASDAQ} @dividend_yield:[0.004 +inf]" RETURN 3 '$.ticker' '$.security_description' '$.instrument_details' SORTBY dividend_yield DESC LIMIT 0 10

In general, when processing a search query, Dragonfly leverages its multi-threaded architecture to evaluate index conditions across all threads in parallel and collect the results before returning them to the client, as illustrated below.

Dragonfly Search Query Execution
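From application code, the same filtered query can be issued with redis-py. Below is a minimal sketch, assuming the client connection and the idx:securities index built from the schema above (field names refer to the as_name aliases):

from redis.commands.search.query import Query

# Top 10 Technology stocks on NASDAQ yielding above 0.40%,
# sorted by dividend yield in descending order (same filter as the CLI query).
query_expr = (
    Query("@sector:{Technology} @exchange:{NASDAQ} @dividend_yield:[0.004 +inf]")
    .sort_by("dividend_yield", asc=False)
    .return_fields("ticker", "security_name", "dividend_yield")
    .paging(0, 10)
)
results = client.ft("idx:securities").search(query_expr).docs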
Use Case 2: Textual Search on Security Names
A trader or analyst often needs to find a security when they only recall a fragment of its name. For instance, searching for all funds or notes containing “Miller” in their title, such as “Miller Value Partners” or “Howard/Miller Funds.” A system requiring an exact, full-name match would fail this everyday task. Our schema’s text index on the security_name field enables these powerful infix and partial matches directly within the data layer. By using wildcard syntax (*), we can perform prefix, suffix, and infix searches.
dragonfly$> FT.SEARCH idx:securities "@security_name:*Miller*" RETURN 2 '$.ticker' '$.security_description' LIMIT 0 10

The query searches through the indexed security names, looking for any occurrence of “Miller,” and returns the ticker and the complete description for the first 10 matches. This illustrates how the security master functions not just as a lookup table but also as an interactive interface for exploration, all while maintaining millisecond-level response times for a smooth user experience.
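The same wildcard patterns work from application code as well. A short sketch, again assuming the client and index from the earlier snippets, covering the prefix, suffix, and infix variants described above:

from redis.commands.search.query import Query

# Prefix, suffix, and infix matches against the indexed security_name field.
for pattern in ("Miller*", "*Miller", "*Miller*"):
    query_expr = Query(f"@security_name:{pattern}").return_fields("ticker", "security_name").paging(0, 10)
    docs = client.ft("idx:securities").search(query_expr).docs
    print(pattern, [doc.ticker for doc in docs])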
Use Case 3: Semantic Search for Conceptual Discovery
One advanced challenge is finding securities based on conceptual similarity rather than keywords. A risk manager might want to identify instruments with a risk profile akin to “high-volatility tech startup bonds,” a concept unlikely to appear verbatim in any security name or description. This requires understanding the semantic meaning of the text, and it is where the vector index in our schema unlocks a new paradigm. By generating an embedding of the general_description text with a sentence-transformer model and storing it in the security_general_description_embedding vector field, we encode semantic information. A k-nearest neighbors (KNN) query then lets us search this vector space directly.
import numpy as np
from redis.commands.search.query import Query
from sentence_transformers import SentenceTransformer
from dragonfly import connect_dragonfly  # helper that returns a redis-py client (see the examples repository)

_transformer_model = SentenceTransformer("all-MiniLM-L6-v2")
_df = connect_dragonfly()


def vector_search(query: str):
    # Encode the query text with the same model used for the stored description embeddings.
    query_vec = _transformer_model.encode(query).astype(np.float32)
    query_expr = (
        Query("*=>[KNN 10 @security_general_description_embedding $query_vector AS vector_score]")
        .return_fields("security_id", "security_description", "vector_score")
        .sort_by("vector_score")
        .paging(0, 10)
    )
    params = {"query_vector": query_vec.tobytes()}
    results = _df.ft("idx:securities").search(query_expr, query_params=params).docs
    return results

In the Python snippet above, we encode the query text the same way the security descriptions were encoded, then send it to Dragonfly as a k-nearest neighbors (KNN) query. The query finds the top securities whose description embeddings are closest in the vector space to the query concept, measured by cosine distance, and it will surface semantically related securities even if none of the keywords from the original query appear in their descriptions. This capability transforms the security master from a reactive database into a proactive research engine, enabling discovery of non-obvious relationships and risks that traditional search methods would miss.
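As a quick usage sketch, the risk manager's conceptual query from earlier could be issued like this (the printed fields are only an illustration):

# The phrase is unlikely to appear verbatim in any description,
# yet semantically similar instruments are still surfaced.
for doc in vector_search("high-volatility tech startup bonds"):
    # vector_score is the cosine distance: smaller means semantically closer.
    print(doc.id, doc.vector_score)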
A Thoughtful Disclaimer: Not a Silver Bullet
Earning trust in a technical architecture requires acknowledging its boundaries. While Dragonfly is a powerful engine for a modern security master, it is not a magical replacement for every component in your data stack. A few critical considerations remain:
- Persistence & Durability: Because Dragonfly is an in-memory store, a robust snapshot and backup strategy is essential. While Dragonfly offers configurable point-in-time snapshots, integrating them with your disaster recovery and audit protocols is a key operational task.
- The Authority of Record: In large financial institutions, the ultimate “golden source” of truth will often remain a fully ACID-compliant, auditable RDBMS or a dedicated security master platform. Dragonfly is best positioned as the high-performance “operational master”: it is loaded with relatively static reference data (like issuer details, sector, and terms) from the system of record and synchronized with real-time data (like live prices and yields) from market feeds, as sketched after this list.
- Complex Data Relationships: For data with deep, multi-way relationships that would typically be handled with SQL JOIN operations, it’s important to recognize that Dragonfly, as a key-value store with rich data types and search capabilities, does not natively support joins. While denormalizing data into nested JSON documents and managing relationships in application logic can address many use cases, highly complex analytical queries across entities are still better served by a dedicated relational or graph database.
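To make the “operational master” pattern above concrete, here is a minimal sketch of the write path. It assumes the client from the earlier snippets, the security document defined above, and a hypothetical security:<security_id> key convention; the full load would come from the system of record, and the incremental update from a market data feed:

# Batch load from the system of record: write the full JSON document.
key = f"security:{security['security_id']}"
client.json().set(key, "$", security)

# Real-time sync from a market feed: update only the fields that change,
# without rewriting the rest of the document.
client.json().set(key, "$.pricing_valuation.last_price", 267.10)
client.json().set(key, "$.pricing_valuation.price_timestamp", "2026-01-13T09:30:00.000000Z")

# Optionally trigger a point-in-time snapshot after a large batch load;
# snapshot scheduling and retention remain an operational decision.
client.bgsave()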
This highlights that Dragonfly’s role is to excel as the high-throughput, performant data access layer, not to replace every database. It’s about using the right tool for the right job.
Final Thoughts: The Future of Financial Data at Speed
The premise is transformative: a security master should be a strategic enabler and a source of competitive advantage, not a legacy cost center or a bottleneck. By adopting a modern data store like Dragonfly, the key win is the powerful convergence of developer velocity and operational performance.
Developers work with the familiar, well-loved Redis API, dramatically reducing learning curves and migration risk. Yet, they unlock modern capabilities (rich data types, native JSON, secondary indexing, and vector search) that consolidate multiple data access patterns into one simple layer. Simultaneously, operations teams gain a system with predictable, multi-core performance at scale, reducing the operational complexity of managing clusters of many small single-threaded instances or a sprawl of specialized databases.
The result is a unified, high-performance data fabric that can keep pace with the real-time demands of modern finance. It closes the infrastructure gap, turning the security master from a passive repository into an active, intelligent engine at the heart of your trading and risk systems.
Ready to architect your high-performance security master?
- Explore Dragonfly on GitHub and our documentation to dive into the technical specifics.
- Model your data using the schema and indexing principles outlined in this blog post (full code in our public examples repository) as a starting point.
- Benchmark your critical queries against your current setup to quantify the performance lift.
The future of financial data runs at speed. It’s time to build for it.
