Announcing Dragonfly Kubernetes Operator General Availability

We are thrilled to announce the general availability of the Dragonfly Kubernetes Operator.

November 6, 2023

We are excited to announce that the Kubernetes Operator for Dragonfly is now generally available, making it simple to run and manage Dragonfly on Kubernetes. Dragonfly is a data store built for modern cloud workloads, and Kubernetes is the leading orchestration engine for them, which makes the pairing a natural fit for teams building resilient, reliable, and performant applications.

Along with general availability, we are also excited to announce new capabilities such as advanced snapshotting, enterprise-grade security, and performance and reliability enhancements.

To get started immediately, visit our newly updated Dragonfly Operator documentation.

Advanced Snapshotting

Snapshotting has always been a reliable data backup mechanism for Dragonfly. In this latest release, we are taking it to the next level, ensuring that snapshotting is more seamlessly integrated with Kubernetes and cloud storage solutions. With the introduction of the high-level snapshot field in the Dragonfly Custom Resource Definition (CRD), configuring and utilizing snapshotting has never been easier.

By setting up this configuration, you empower Dragonfly to automatically handle data backups during pod terminations as well as restores when a pod comes back up again, minimizing downtime and maintaining the integrity of your operations.

Dragonfly Kubernetes Operator supports snapshotting in two ways: Persistent Volume Claims (PVC) and Cloud Storage. Each option offers unique advantages, catering to different use cases and preferences.

1. Persistent Volume Claims (PVC)

A PersistentVolume (PV) is how Kubernetes represents disk storage provisioned from the underlying cloud or on-premises infrastructure. A PersistentVolumeClaim (PVC) is a request for that storage by your applications. With the snapshot.persistentVolumeClaimSpec field, you can use the exact same Kubernetes PVC syntax to configure Dragonfly snapshotting storage.

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-instance-snapshotting-to-pvc
spec:
  replicas: 1
  snapshot:
    cron: '*/5 * * * *'
    persistentVolumeClaimSpec: # uses standard Kubernetes PVC API
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 2Gi

More details can be found in the Dragonfly Operator snapshots to PVC documentation.

2. Cloud Storage

Dragonfly has recently added support for snapshotting to S3-compatible cloud storage. This allows snapshot files to be written to and read from an S3 bucket directly, using the --dir s3://<> server argument. To use this feature, the environment must be configured with the necessary credentials.

The same should work with a Dragonfly instance managed by the operator when the snapshot.dir field is set accordingly. Additionally, for those utilizing managed Kubernetes services such as Amazon EKS, there are tools available to attach an IAM role directly to a Kubernetes service account. This feature simplifies credential management, automating rotation based on the pod's lifecycle and eliminating the need to handle long-lived credentials.

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-instance-snapshotting-to-s3
spec:
  replicas: 1
  serviceAccountName: dragonfly-s3-svc-acc # service account with S3 permissions
  snapshot:
    dir: 's3://dragonfly-snapshots' # S3 bucket name
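
As a rough sketch of the IAM-role approach on Amazon EKS, the dragonfly-s3-svc-acc service account could carry the eks.amazonaws.com/role-arn annotation used by IAM Roles for Service Accounts. The namespace, account ID, and role name below are placeholders rather than values from this post, and the role is assumed to already grant read and write access to the snapshot bucket.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dragonfly-s3-svc-acc
  namespace: default # placeholder namespace
  annotations:
    # Placeholder ARN; substitute an IAM role that can access the snapshot bucket
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/dragonfly-s3-snapshots
```

With this in place, pods running under the service account receive short-lived credentials, so no access keys need to be stored in the cluster.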

More details can be found in the Dragonfly Operator snapshots to S3 documentation.

Enterprise-Grade Security

1. Client Authentication

With the introduction of the authentication field in the Dragonfly Operator configuration, we have streamlined the process of authenticating clients connecting to your Dragonfly instance. Two methods are currently supported: passwordFromSecret and clientCaCertSecret.

passwordFromSecret utilizes Kubernetes Secrets to store and manage credentials. By specifying a secret in your configuration, Dragonfly will automatically retrieve and use the value associated with the key as the authentication password for clients.

apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-instance-with-password-auth
spec:
  replicas: 1
  authentication:
    passwordFromSecret:
      name: dragonfly-auth
      key: password
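
The Secret referenced by that configuration has to exist in the same namespace as the Dragonfly instance. A minimal sketch of such a Secret, using the name and key shown above, might look like this; the password value is a placeholder.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dragonfly-auth # must match the secret name in the Dragonfly resource
type: Opaque
stringData:
  password: change-me # placeholder value stored under the referenced key
```

Using stringData lets Kubernetes handle the base64 encoding, which keeps the manifest readable while testing. For production, the Secret would more likely be created by a secrets manager than committed as a manifest.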

clientCaCertSecret enhances security with TLS by having Dragonfly verify client certificates. Setting this up requires a few more steps. Detailed instructions for both methods can be found in the Dragonfly Operator authentication documentation.

2. Server TLS

The Dragonfly Kubernetes Operator now supports the integration of TLS certificates. By specifying a Kubernetes Secret in your Dragonfly instance configuration, you can ensure that the certificates are propagated and configured. This results in encrypted communication between clients and the Dragonfly server, safeguarding network communications from man-in-the-middle attacks.

Instructions for using TLS with cert-manager are available in the Dragonfly Operator server TLS documentation.
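
As an illustration only, a cert-manager Certificate can produce the Secret that the Dragonfly resource then references. The tlsSecretRef field name, the resource names, the DNS name, and the Issuer are all assumptions made for this sketch, so check them against the server TLS documentation before use. Password authentication is included on the assumption that Dragonfly expects client authentication to be configured when TLS is enabled.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: dragonfly-tls
spec:
  secretName: dragonfly-tls # Secret that cert-manager creates and renews
  dnsNames:
    - dragonfly-instance-with-tls.default.svc.cluster.local # placeholder DNS name
  issuerRef:
    name: example-issuer # hypothetical Issuer that must already exist
    kind: Issuer
---
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-instance-with-tls
spec:
  replicas: 2
  authentication:
    passwordFromSecret:
      name: dragonfly-auth
      key: password
  tlsSecretRef: # assumed field name; verify against the server TLS documentation
    name: dragonfly-tls
```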

Monitoring and Reliability

1. Monitoring with Prometheus & Grafana

Prometheus is the default way of monitoring and storing metrics in Kubernetes. We have new documentation on how to install the Prometheus Operator and use it to collect and store metrics.
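
With the Prometheus Operator installed, scraping is typically configured through a PodMonitor resource. The following is a sketch rather than the manifest from our documentation: the pod label and the port name are assumptions and need to match the labels and metrics port of your own Dragonfly pods.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: dragonfly-metrics
spec:
  selector:
    matchLabels:
      app: dragonfly-sample # assumed pod label; match your instance's labels
  podMetricsEndpoints:
    - port: admin # assumed name of the container port serving metrics
      path: /metrics
```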

Grafana can then be used to visualize these metrics. We provide custom dashboards that you can load directly to start monitoring your instances.

2. Reliability - Custom Rollout Strategy

Rather than relying on Kubernetes' built-in StatefulSet upgrades, the Dragonfly Operator takes a proactive and controlled approach. When any modification is made to the Dragonfly Custom Resource, the Operator first initiates the upgrade process with the replicas. It upgrades each replica, pausing to confirm the readiness of at least one replica before proceeding. Following this validation, the master is then upgraded, with the Operator selecting one of the latest replicas to assume the master role. The whole rollout is automatic and requires no additional operational input.
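
One example of a modification that would start such a rollout is changing the container image on the custom resource. The sketch below assumes the spec.image field, and the resource name and version tag are purely illustrative.

```yaml
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-sample # illustrative resource name
spec:
  replicas: 3
  # Changing this tag and re-applying the manifest modifies the custom resource
  image: docker.dragonflydb.io/dragonflydb/dragonfly:v1.12.0 # illustrative tag
```

Re-applying the manifest with kubectl apply is then the only manual step. The Operator decides the order in which pods are replaced.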

3. Reliability - Using the REPLTAKEOVER Command for Upgrades

In previous iterations, upgrading Dragonfly presented certain challenges. Particularly, the abrupt transition from the old version of the master to a new one could result in potential data inconsistencies. Clients were not locked during this transition, meaning that writes to the old master might not have been fully propagated to the new master, leading to data loss.

The REPLTAKEOVER command addresses these challenges by locking the old master, ensuring that all ongoing operations are completed. Only once this steady state is achieved will the system proceed to migrate to the new master.

Finally, with the recent change to use a ClusterIP Service instead of a headless Service, failover updates are propagated to clients faster.


Last but not least, Dragonfly is known for being ultra-performant and extremely reliable. Even with all the abstraction layers in Kubernetes and with Dragonfly running in a containerized environment, we are able to achieve 1.3 million QPS with sub-millisecond P99.9 latency on an AWS c6gn.8xlarge instance. The load is generated with the memtier_benchmark tool. Detailed benchmarking steps and results can be found in the video below.


Kubernetes is designed for managing complex production workloads, and Dragonfly is designed to make it easy to scale those same workloads with unparalleled performance. We're excited to see what you build. If you would like a free trial of a fully managed Dragonfly Cloud account, please request one here.

Also, we will be hosting an online Technical Workshop about Dragonfly Operator on Nov 15, 2023. It's a great chance to connect and learn, and if you are interested, please register here.
