
Akuity Improves Argo CD Performance and Cuts Infrastructure Overhead by Replacing Redis with Dragonfly

Learn how Akuity replaced Redis with Dragonfly in Argo CD, removing the Redis HA stack that accounted for ~43% of deployment pods and achieving major performance and cost improvements.

February 3, 2026


Argo CD has seen rapid adoption in the last few years and has become one of the most common tools used by DevOps teams running Kubernetes.

Akuity is the company founded by the original creators of Argo CD, and the Akuity Platform provides a proprietary, re-architected Argo CD that is 100X more scalable than open-source Argo CD. The platform runs Argo CD at significant scale across customer environments.

Operating GitOps control planes at the enterprise level exposes architectural trade-offs that are easy to overlook early on but become increasingly visible as scale, availability requirements, and operational complexity increase.

Redis has been part of Argo CD’s architecture for a long time, and for good reasons. Redis is familiar, performant enough, and easy to integrate.

However, as Akuity scaled its managed Argo CD platform and supported increasingly large and complex workloads, Redis began to introduce operational and cost challenges. This post explores why Redis became a bottleneck in large-scale Argo CD deployments, the specific issues Akuity observed, and why Dragonfly emerged as a better fit.


The Problem with Redis in Argo CD

The Role of Redis in Argo CD

Before diving into the problems, it’s important to clarify Redis’s role in Argo CD.

Redis is not a hard dependency for Argo CD; it is more a component of convenience. Argo CD uses Redis as an in-memory cache, not as a source of truth. If Redis goes down, Argo CD will still reconcile applications, sync resources, and keep clusters in the desired state. Redis does, however, provide quality-of-life improvements, such as caching manifests rendered from Git to speed up refreshes and syncs, as well as storing the application tree metadata used by the UI.

So if Redis goes down, Argo CD keeps working, but the UI becomes less informative for a while, refreshes get slower, and the system has to rebuild the cache from scratch.
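This behavior is the classic cache-aside pattern: try the cache first, fall back to the slow path (re-rendering from Git), and repopulate the cache afterward. A minimal Python sketch of the pattern, under the assumption that it mirrors Argo CD's behavior; the function and key names are illustrative, not Argo CD's actual code:

```python
# Cache-aside sketch: the cache only accelerates reads and is never the
# source of truth, so losing it costs latency, not correctness.

def get_manifests(app_name, cache, render_from_git):
    key = f"mfst/{app_name}"
    cached = cache.get(key)
    if cached is not None:                 # fast path: cache hit
        return cached
    manifests = render_from_git(app_name)  # slow path: re-render from Git
    cache[key] = manifests                 # repopulate for future refreshes
    return manifests

# A plain dict stands in for Redis/Dragonfly here.
cache = {}
slow_path_calls = []

def render(app):
    slow_path_calls.append(app)            # count how often "Git" is hit
    return [f"Deployment/{app}"]

first = get_manifests("guestbook", cache, render)   # renders from "Git"
second = get_manifests("guestbook", cache, render)  # served from cache
```

Wiping `cache` at any point only forces the next call back onto the slow path, which is exactly why a Redis outage degrades Argo CD's responsiveness without breaking reconciliation.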

Where Redis Started to Hurt

In a typical highly available (HA) Argo CD setup, Redis alone accounts for a surprisingly large chunk of the deployment:

ArgoCD High Availability with Redis

As the diagram shows, Redis alone accounts for 6 pods and 12 containers, roughly 43% of the pods and 60% of the containers in the deployment. For a component that is not even critical to the functioning of Argo CD, that is a lot of infrastructure. Operationally, Redis also brought more pain points:

  • Replication & Failover: In a typical Redis HA setup with a single primary and multiple replicas, replication lag and best-effort failover can cause issues during network partitions. For Argo CD, this can lead to inconsistent or missing cache data, causing UI inconsistencies and slower refreshes or syncs as components are forced to rebuild cache state.
  • Scaling Constraints: The Redis HA setup within Argo CD relies on quorum-based leader election, which is why odd instance counts are required; for HA, you need at least 3 instances. On top of that, changing the replica count isn't as simple as editing a number in the manifest: you also have to reconfigure Redis itself. This forced us to either overprovision Redis up front or risk performance cliffs during spikes like mass syncs, refreshes, or large application updates.
  • Startup & Recovery Issues: We frequently ran into cases where Redis pods would get stuck during initialization after being evicted or restarted for any reason. While Argo CD itself would continue functioning, these stuck pods delayed cache recovery and caused temporary gaps in UI data and slower refreshes until Redis fully came back up.
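The odd-instance requirement falls out of simple majority-quorum arithmetic: a strict majority of N nodes must agree on a failover, so an even node adds cost without adding fault tolerance. A quick sketch:

```python
# Majority quorum: floor(N/2) + 1 nodes must agree before a failover,
# so the number of tolerable failures is N minus the quorum size.

def quorum(n):
    return n // 2 + 1

def tolerable_failures(n):
    return n - quorum(n)

# 3 nodes tolerate 1 failure; a 4th node adds cost but no extra fault
# tolerance, which is why HA node counts jump from 3 straight to 5.
sizes = {n: tolerable_failures(n) for n in range(1, 6)}
```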

Because of all these issues, we decided to look into an alternative to Redis for use with Argo CD, which is when we found Dragonfly and decided to give it a shot.


Why Akuity Chose Dragonfly

Because Redis is not fundamental to Argo CD’s correctness, Akuity did not want to re-architect Argo CD to accommodate a new caching layer. The goal was a drop-in replacement that preserved Redis semantics while eliminating its operational downsides. Dragonfly fit that requirement well.

From Argo CD’s perspective, Dragonfly is API-compatible with Redis, requiring no application code changes. This made it possible to substitute Dragonfly and observe real-world behavior quickly.
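The compatibility holds at the wire level: both servers accept the same RESP protocol, so any Redis client works unchanged and switching is just a matter of pointing it at a different endpoint. A sketch of how a command is framed on the wire (the key name is illustrative, not an actual Argo CD cache key):

```python
# RESP framing: a command is an array ("*<n>") of bulk strings
# ("$<len>"), serialized identically for Redis or Dragonfly.

def encode_resp(*args):
    """Frame a command as a RESP array of bulk strings."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg if isinstance(arg, bytes) else str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)

wire = encode_resp("SET", "app-tree/guestbook", "{}")
# The same bytes are valid input for a Redis or a Dragonfly endpoint,
# which is why no Argo CD code changes were needed.
```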

Beyond compatibility, several other advantages stood out:

  • Much higher throughput and better memory efficiency under load.
  • A simpler high-availability model, managed by the Dragonfly Kubernetes operator instead of Redis Sentinel and/or HAProxy.
  • Easier scaling, without odd instance counts or cumbersome reconfiguration steps.

Fundamentally, these performance and operational gains are possible because Dragonfly is built on a modern thread-per-core architecture. Its cloud-native design is complemented by a fully featured, open-source Kubernetes operator, which not only provides a seamless deployment experience in Kubernetes environments but also streamlines version upgrades, configurable rollout strategies, and high availability out of the box.


Results Observed in Production

The impact was visible almost immediately. With Dragonfly, Akuity reduced the total number of pods needed for the cache from 6 to just 2, and the number of containers from 12 to 2, without compromising high availability. It is important to clarify that, like Redis, Dragonfly’s HA model relies on replication between nodes for failover. However, because Dragonfly operates and replicates data at significantly higher throughput, replication lag is greatly reduced.

ArgoCD High Availability with Dragonfly

When Akuity initially rolled Dragonfly out to a small set of customer Argo CD instances, they saw roughly a 20% reduction in the total number of pods and about a 33% reduction in containers across those deployments. This wasn’t the result of tuning or downsizing Argo CD itself, but simply removing Redis HA and replacing it with a much lighter Dragonfly setup. This translated into better cluster bin packing, increased pod capacity on nodes, and fewer metrics and logs to collect and store.
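The per-instance arithmetic behind these figures is easy to reproduce from the pod counts above. Totals of 14 pods and 20 containers are implied by the ~43% and 60% shares; the whole-fleet percentages reported here are lower because customer deployments include additional components:

```python
# Swapping the Redis HA stack (6 pods, 12 containers) for Dragonfly
# (2 pods, 2 containers) in a typical ~14-pod / 20-container HA setup.

total_pods, total_containers = 14, 20
redis_pods, redis_containers = 6, 12
df_pods, df_containers = 2, 2

redis_pod_share = redis_pods / total_pods                 # ~0.43
redis_container_share = redis_containers / total_containers  # 0.60

pods_after = total_pods - redis_pods + df_pods            # 10 pods
containers_after = (total_containers - redis_containers
                    + df_containers)                      # 10 containers
```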

ArgoCD Pod and Container Savings with Dragonfly

Resource usage dropped as well. Across those same instances, CPU requests decreased by about 14% and memory requests by roughly 20%.

At Akuity’s scale, these reductions compound quickly. When running large numbers of Argo CD instances, saving even a small amount of CPU or memory per instance adds up to meaningful cost and capacity gains; in Akuity’s case, hundreds of gigabytes of memory.

ArgoCD CPU and Memory Savings with Dragonfly

Unexpected Benefits

One difference Akuity noticed was in replication-related memory savings. To keep Redis HA reliable, Akuity had configured a 64 MiB replication backlog per instance, which meant roughly 192 MiB of memory reserved in a standard three-pod HA setup solely to handle replication edge cases. Dragonfly handles replication differently and more efficiently, without requiring that fixed memory reservation. After migrating, Akuity saw steady-state per-instance memory usage drop from ~272 MiB to ~64 MiB in some deployments, a clear reduction in overall memory consumption.
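The memory arithmetic reported above, as a quick sketch (all values in MiB, taken directly from the figures in this section):

```python
# Fixed replication backlog reserved by the Redis HA setup, per pod.
backlog = 64
redis_ha_pods = 3
reserved_for_replication = backlog * redis_ha_pods    # 192 MiB of pure
                                                      # replication overhead

# Observed steady-state usage per instance, before and after migration.
redis_steady, dragonfly_steady = 272, 64
saved_per_instance = redis_steady - dragonfly_steady  # 208 MiB saved
```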

ArgoCD Replication Memory Savings with Dragonfly

The most surprising results, though, came from customers pushing Argo CD to its limits. Argo CD supports multiple sources per application, and Akuity had a customer using 47 sources in a single app!

Any change to one of those sources resulted in massive cache writes, with some keys on the order of 20 MB, which then had to be replicated across Redis replicas. In HA setups, that replication traffic quickly drove up network usage and cost. After migrating those instances to Dragonfly, the effect was immediate: reduced replication traffic dropped total network bandwidth from 24 MiB/s to 8 MiB/s, and the cost savings showed up right away.
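A simplified model of why large keys hurt in replicated setups: every write must be re-sent to each replica, so per-write network cost scales with key size times replica count. The replica counts below are assumptions based on the three-pod Redis HA and lighter Dragonfly setups described above, not measured values:

```python
def replication_cost_mb(key_mb, replicas):
    # Each write of a key is pushed to every replica over the network.
    return key_mb * replicas

key_mb = 20                                    # large multi-source app key
redis_cost = replication_cost_mb(key_mb, 2)    # 1 primary + 2 replicas
dragonfly_cost = replication_cost_mb(key_mb, 1)  # single Dragonfly replica
# Fewer replicas plus more efficient replication are consistent with the
# observed drop from 24 MiB/s to 8 MiB/s of total bandwidth.
```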

ArgoCD Replication Network Savings with Dragonfly

Final Thoughts

Redis has served Argo CD well for a long time, but as we operated it at larger and larger scales, its limitations as a highly available cache became harder to ignore. What stood out to us about Dragonfly was better performance, lower CPU and memory requirements, simpler scaling, and far less replication-related infrastructure. Together, these operational and cost benefits showed that Dragonfly addressed the core operational pain points we had experienced with Redis in our environment at Akuity.

If you’re running Argo CD at scale and finding that Redis is consuming more resources and attention than it should for a cache, Dragonfly is well worth a look. You can check out this repo, where Akuity configured Argo CD to use Dragonfly instead of Redis.

Running Argo CD at Scale

Operating Argo CD across many clusters and teams introduces challenges around performance, availability, and operational overhead. Akuity’s managed Argo CD platform is designed to address these challenges with a hardened control plane, scalable architecture, and production-proven operational patterns.

Explore how Akuity runs Argo CD at scale - akuity.io

