Question: Does MongoDB support sharding?

Answer

Yes, MongoDB supports sharding. Sharding distributes data across multiple servers so that a deployment can handle data sets and throughput beyond the capacity of a single server.

Here are the key components involved in MongoDB sharding:

  • Shard: Each shard contains a subset of the sharded data. In current MongoDB versions, each shard is deployed as a replica set.
  • Mongos: The query router that processes queries from the application, directing them to the appropriate shard(s).
  • Config Server: MongoDB uses config servers to store metadata about the cluster's configuration. This metadata helps the mongos route queries and commands to the appropriate shards.
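
To illustrate how these components cooperate, here is a minimal sketch of the routing decision, written as plain JavaScript rather than MongoDB's actual implementation. The chunk table, the userId shard key, and the targetShards function are hypothetical names chosen for this example, and only equality matches on the shard key are handled.

```javascript
// Hypothetical chunk metadata, similar in spirit to what config servers store:
// each chunk covers a half-open shard-key range [min, max) and lives on one shard.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard001" },
  { min: 1000, max: 2000, shard: "shard002" },
  { min: 2000, max: Infinity, shard: "shard001" },
];

// Return the shards a router would need to contact for a query.
// If the query includes the shard key, only the owning shard is targeted;
// otherwise every shard must be asked (a scatter-gather query).
function targetShards(query, shardKeyField, chunkTable) {
  const allShards = [...new Set(chunkTable.map((c) => c.shard))];
  if (!(shardKeyField in query)) {
    return allShards;
  }
  const value = query[shardKeyField];
  const owner = chunkTable.find((c) => value >= c.min && value < c.max);
  return owner ? [owner.shard] : allShards;
}

// Query on the shard key: routed to shard002 only.
console.log(targetShards({ userId: 1500 }, "userId", chunks));
// Query without the shard key: broadcast to both shards.
console.log(targetShards({ name: "Ada" }, "userId", chunks));
```

The sketch shows why queries that include the shard key tend to be cheaper: the router can consult the metadata and contact a single shard instead of all of them.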

How to Set Up Sharding in MongoDB

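  1. Start Config Servers: Start the config servers that store cluster metadata. They run as a replica set (usually 3 members; one is shown here), so each mongod needs the --replSet option with the same set name that mongos is given later. Once the members are running, initiate the set with rs.initiate() from mongosh.

    mongod --configsvr --replSet configReplSet --dbpath /data/configdb --port 27019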
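  2. Start Shard Servers (mongod instances): Start the mongod instances that will act as shards. Each shard runs as its own replica set (named shard001 and shard002 here), and instances on the same host need distinct ports. Initiate each set with rs.initiate() from mongosh.

    mongod --shardsvr --replSet shard001 --dbpath /data/shard001 --port 27018
    mongod --shardsvr --replSet shard002 --dbpath /data/shard002 --port 27020
  3. Start Mongos Process: The mongos process acts as the query router. Point it at the config server replica set.

    mongos --configdb configReplSet/localhost:27019 --port 27017
  4. Add Shards to the Cluster: Connect to mongos and use the sh.addShard command to add each shard to the cluster.

    sh.addShard("shard001/localhost:27018")
    sh.addShard("shard002/localhost:27020")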
    
  5. Enable Sharding for a Database: Use the sh.enableSharding(databaseName) command to enable sharding on a specific database.

  6. Shard a Collection: Finally, you need to decide on a shard key and shard a collection using the sh.shardCollection(fullCollectionName, shardKey) command.
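
Steps 5 and 6 can be sketched as follows. The database name shop, the collection orders, the hashed customerId shard key, and the wrapper function shardOrdersCollection are all assumptions made for illustration; only sh.enableSharding and sh.shardCollection come from the steps above.

```javascript
// `sh` is mongosh's built-in sharding helper. It is passed in as a parameter
// so the same function can also be exercised with a stub outside mongosh.
function shardOrdersCollection(sh) {
  // Step 5: enable sharding on the (hypothetical) "shop" database.
  sh.enableSharding("shop");
  // Step 6: shard the (hypothetical) "shop.orders" collection on a hashed
  // customerId key, which spreads documents by the hash of that field.
  sh.shardCollection("shop.orders", { customerId: "hashed" });
}

// In mongosh, connected to the mongos started in step 3:
//   shardOrdersCollection(sh)
```

A hashed key is used here only as one reasonable default for evenly spreading writes; a ranged key such as { customerId: 1 } may suit workloads that rely on range queries over the key.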

Sharding allows MongoDB to scale horizontally: as your dataset grows, you can add more shards to increase the read and write capacity of your database. How close the gain comes to linear depends on the workload and on the shard key.

It's essential to choose an appropriate shard key because it determines how data is distributed across the shards. A poorly chosen shard key might result in uneven distribution of data, which can lead to certain shards becoming hotspots.
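
The hotspot effect can be illustrated with a small simulation in plain JavaScript. This is a sketch under stated assumptions, not MongoDB's placement logic: the three shards, the fixed split points, and the FNV-1a hash are stand-ins chosen for the example (MongoDB's hashed indexes use their own hash function).

```javascript
// FNV-1a 32-bit hash, used here only as a stand-in hash function.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep the value an unsigned 32-bit int
  }
  return h;
}

const SHARD_COUNT = 3;

// Ranged placement: each shard owns a fixed key range (hypothetical split points).
function rangedShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2;
}

// Hashed placement: hash the key, then map the hash onto a shard.
function hashedShard(key) {
  return fnv1a(String(key)) % SHARD_COUNT;
}

// Count how many writes each shard receives under a given placement function.
function countWrites(keys, placement) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const k of keys) counts[placement(k)] += 1;
  return counts;
}

// 300 inserts with a monotonically increasing key, such as a counter or timestamp.
const keys = Array.from({ length: 300 }, (_, i) => 5000 + i);
console.log("ranged:", countWrites(keys, rangedShard)); // every write lands on the last shard
console.log("hashed:", countWrites(keys, hashedShard)); // writes are spread across shards
```

With the ranged placement, every new key falls into the last range, so a single shard absorbs all inserts. Hashing the same keys spreads them out, at the cost of losing efficient range queries on the key.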
