ShareChat Achieves Better Performance and 40% Cost Reduction with Dragonfly Cloud
How India’s largest social media company modernized its infrastructure, migrating 150+ services to Dragonfly Cloud for 40% lower costs and no noisy neighbor issues.
September 2, 2025

Background: Redis API Everywhere
At ShareChat, Redis APIs were embedded in almost everything: caches, recommendation delivery systems, deduplication queues, ranking pipelines, counters, you name it. This was a result of our business growing rapidly and the flexibility of the Redis API, which makes it a very good fit for many use cases. Some workloads were bursty, others sustained, but all of them were business-critical.
Because Redis is such a critical part of our infrastructure, we were constantly searching for a solution that could scale with our business by delivering low latency, stability, cost efficiency, and operational simplicity. Over the years, we used multiple deployment methods and technologies, including open-source and managed services. As usage continued to scale, we realized we needed:
- Cost efficiency – Strong savings at high scale.
- Isolation – No multi-tenant interference.
- Compatibility – A drop-in replacement with minimal code changes.
- Managed service – Our engineers should build features, not babysit clusters.
- Reliability – Stability under sustained load and sudden bursts.
- Scalability – Seamless scaling without rewriting application logic.
Why We Chose Dragonfly
I had been following Dragonfly for some time and was intrigued by its multi-threaded architecture. Unlike Redis, which is primarily single-threaded (both Redis and Valkey have recently added multi-threaded I/O), Dragonfly uses all the CPUs on a machine. This more modern architecture promised much higher price-performance and better resilience to traffic spikes.
On paper, it looked like exactly what we needed: better performance, simpler scaling, and better cost efficiency. To verify this, we started small, deploying Dragonfly Community Edition for a few cache-heavy production workloads. The results were convincing: we used less hardware, P99 latency was lower for heavy workloads, migration was extremely straightforward, and the operational simplicity was refreshing. That experience gave us the confidence to consider a full-scale migration.
Setting the Ground Rules
We committed to an 8-week window to migrate over 100 Redis databases touching ~150 applications spread across 40–50 repositories.
To make this feasible, we established a few guiding principles:
- Zero downtime – Business impact was not an option.
- Drop-in compatibility – No application rewrites unless absolutely necessary.
- Parallel migration – Each application team owned its cutover. No centralized bottlenecks.
- Centralized migration support – A core infra team handled replication and monitoring as well as provided assistance.
- Live replication – Continuous data sync with verification before cutover. No big-bang switchovers.
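Verification before cutover can be as simple as sampling keys on the source and comparing their serialized values on the target. A minimal sketch in Python, assuming hypothetical `src_dump`/`dst_dump` callables that wrap each side's `DUMP` command (this is an illustration, not ShareChat's actual tooling):

```python
import random

def verify_sample(keys, src_dump, dst_dump, sample_size=100):
    """Compare a random sample of keys between source and target.

    src_dump / dst_dump are callables returning the serialized value
    for a key (e.g. thin wrappers around Redis DUMP), or None if the
    key is missing. Returns the keys that mismatch; an empty list
    means the sample is consistent.
    """
    sample = random.sample(keys, min(sample_size, len(keys)))
    return [k for k in sample if src_dump(k) != dst_dump(k)]
```

Run repeatedly during dual-running, a check like this gives each team a concrete signal that replication has converged before they flip traffic.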
The Migration Process
1. Inventory and Planning
We began by creating a complete inventory of every Redis Cluster:
- Dataset size, memory profile, persistence needs, eviction policies, Lua usage.
- Team ownership and repository references.
We classified workloads into two main types:
- Cache-only – No data migration required.
- Ephemeral/persistent – Data migration required.
This allowed us to plan cutovers workload by workload.
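The classification step can be largely automated from each cluster's `INFO` output. A rough sketch (the field names are standard Redis `INFO` keys, but the decision rule here is illustrative, not ShareChat's exact criteria):

```python
def classify_workload(info: dict) -> str:
    """Classify a Redis database from selected INFO fields.

    Heuristic: a store with eviction enabled and no persistence is
    treated as cache-only (no data migration needed); anything else
    is ephemeral/persistent and must be replicated before cutover.
    """
    evicting = info.get("maxmemory_policy", "noeviction") != "noeviction"
    persisted = (info.get("aof_enabled", 0) == 1
                 or info.get("rdb_last_save_time", 0) > 0)
    if evicting and not persisted:
        return "cache-only"
    return "ephemeral/persistent"
```

Feeding every inventoried cluster through a rule like this produces the workload-by-workload cutover plan described above.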
2. Data Replication
After evaluating several tools, we chose Redis Shake, as recommended by the Dragonfly team. Because our existing provider did not allow PSYNC, we used a SCAN-based migration, which is slower but highly reliable. It allowed continuous replication with no disruption to live traffic and gave application teams full control over the timing of their cutovers.
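A SCAN-based RedisShake job is driven by a small config file. A sketch of what one might look like (section and field names follow RedisShake v4; the endpoints are hypothetical, and you should check your version's reference config before use):

```toml
# shake.toml – SCAN-based replication from the old provider to Dragonfly
[scan_reader]
cluster = true
address = "source-redis:6379"               # hypothetical source endpoint
ksn = true   # keyspace notifications keep the sync live during the scan

[redis_writer]
cluster = false
address = "my-store.dragonflydb.cloud:6385" # hypothetical target endpoint
```

Because the reader only uses SCAN plus keyspace notifications, it works even on managed providers that block PSYNC, at the cost of a slower initial copy.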
3. Monitoring and Visibility
Dragonfly Cloud comes with dashboards for each data store, which work well for a few clusters but not for hundreds. Using the Prometheus integration, we pulled metrics into our centralized monitoring stack. This gave our developers side-by-side visibility of old and new metrics during dual-running and made it easy to validate performance and correctness.
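A Prometheus scrape job for that setup might look like the following (the metrics path, ports, and labels are placeholders; Dragonfly Cloud's documentation lists the exact metrics endpoint for each data store):

```yaml
scrape_configs:
  - job_name: "dragonfly-cloud"
    scheme: https
    metrics_path: /metrics                        # placeholder path
    static_configs:
      - targets:
          - "cache-feed.dragonflydb.cloud:6385"   # hypothetical endpoints
          - "cache-rank.dragonflydb.cloud:6385"
        labels:
          migration_phase: "dual-running"
```

Tagging the new stores with a label like `migration_phase` made it straightforward to build side-by-side dashboards against the old clusters' existing metrics.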
4. Distributed Execution
We scheduled migration windows for each team. Application teams handled their own service cutovers; my team provided support for replication and monitoring. Multi-tenant Redis Clusters were drained gradually, workload by workload. We prioritized the heaviest workloads first, saving smaller clusters for the final phase once the process was battle-tested.
Results
- 40% cost reduction – Lower infrastructure footprint and more efficient resource usage.
- No noisy neighbors – With Dragonfly we are using dedicated data stores, completely eliminating contention.
- Simplified operations – Spinning up, resizing, or replacing a cluster now takes minutes. No more patching, upgrades, or manual failovers.
- Better performance – Dragonfly’s multi-threaded architecture delivers lower latency and higher throughput.
- Zero downtime – The migration was completed in 8 weeks without a single user-facing incident.
Final Thoughts
Looking back, the biggest win wasn’t just cost—it was modernizing our infrastructure. We no longer worry about saving every byte that goes into memory or how our infrastructure will handle the next traffic peak. Our engineering teams are freed up to deliver new and better features to our users, and our users are getting a faster, more stable service.
Arya Ketan
Distinguished Engineer @ShareChat