Introducing Dragonfly Cloud! Learn More

Question: What is the difference between MongoDB horizontal scaling and sharding?

Answer

Horizontal scaling and sharding are often discussed together in the context of MongoDB, but they refer to related yet distinct concepts. Here's an explanation of both terms and how they integrate with each other.

Horizontal Scaling

Horizontal scaling, also known as scale-out, involves adding more machines to your pool of resources to manage increased load. It contrasts with vertical scaling (scale-up), where you increase the resources (CPU, RAM, storage) of a single machine. Horizontal scaling is crucial for distributed systems like MongoDB, especially when dealing with large volumes of data or high throughput requirements.

In MongoDB, horizontal scaling is achieved through sharding.

Sharding

Sharding is MongoDB's approach to horizontal scaling. It involves partitioning data across multiple servers (or shards) to spread out the load. Each shard contains a subset of the data, and MongoDB manages access to the data across shards transparently, allowing applications to interact with the database as if it were a single entity.

A sharded cluster in MongoDB consists of the following components:

  • Shard: Each shard holds a subset of the sharded data. A shard can be a single mongod instance or a replica set.
  • Mongos: The query router. Applications connect to the mongos, which routes queries to the appropriate shard(s).
  • Config server: Stores metadata and configuration settings for the cluster. Typically deployed as a replica set for redundancy and consistency.

Example of Setting Up a Simple Sharded Cluster:

  1. Start Config Servers:

    mongod --configsvr --replSet configReplSet --dbpath /data/configdb --port 27019
  2. Initialize the Replica Set for Config Servers:

    rs.initiate({_id: "configReplSet", configsvr: true, members: [{_id: 0, host: "<your_host>:27019"}]})
  3. Start Shard Servers (as Replica Sets for High Availability):

    mongod --shardsvr --replSet shard1ReplSet --dbpath /data/shard1db --port 27018
  4. Initialize the Replica Set for Each Shard Server:

    rs.initiate({_id: "shard1ReplSet", members: [{_id: 0, host: "<your_shard_host>:27018"}]})
  5. Start Mongos and Add Shards:

    mongos --configdb configReplSet/<your_config_host>:27019 --port 27017

    Then, connect to the mongos using the mongo shell and add the shard:

    sh.addShard("shard1ReplSet/<your_shard_host>:27018")

This example simplifies the setup process. In production, consider factors like security, network topology, and hardware specifications.

Conclusion

While horizontal scaling refers to the general concept of spreading load across multiple machines, sharding is the specific mechanism by which MongoDB achieves horizontal scaling. Implementing sharding effectively allows MongoDB to handle larger datasets and higher throughput operations by distributing data and workload across multiple servers.

Was this content helpful?

White Paper

Free System Design on AWS E-Book

Download this early release of O'Reilly's latest cloud infrastructure e-book: System Design on AWS.

Free System Design on AWS E-Book

Start building today 

Dragonfly is fully compatible with the Redis ecosystem and requires no code changes to implement.