MongoDB: Migrate and Merge All Chunks in Shard

Today I was working in a MongoDB 3.0 sharded cluster environment. One sharded collection had some 300 chunks evenly distributed across the shards (thanks to the balancer). These chunks happened to be empty, and the collection needed some pre-splitting for near-future use.

I ended up writing MongoDB shell scripts to migrate all of the chunks to the primary shard and then merge them into a single chunk. The scripts adhere to the following:

  • Authenticate a clusterAdmin user against the admin database. (I actually used a “root” role user.)
  • Read the config database for sharding topology and chunk distribution.
  • Any “write”-like commands use sharding helpers where possible, and runCommand otherwise. No “write”-like commands use CRUD operations on the config database.
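As a rough illustration of the first two points, the sketch below authenticates against the admin database and then reads the primary shard and chunk distribution from the config database. The user name, password, and the namespace "mydb.mycoll" are placeholders I made up, not values from the original scripts. The shell-only part is guarded so the helper function can be tried outside the mongo shell as well.

```javascript
// Sketch only: the credentials and namespace below are placeholders.
var NAMESPACE = "mydb.mycoll";

// Count chunks per shard, given an array of config.chunks documents.
function chunkCountsByShard(chunks) {
  var counts = {};
  chunks.forEach(function (chunk) {
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  });
  return counts;
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var adminDb = db.getSiblingDB("admin");
  // db.auth() returns 1 on success and 0 on failure in the legacy shell.
  if (!adminDb.auth("clusterAdminUser", "changeme")) {
    throw new Error("authentication failed");
  }

  // Read-only queries against the config database.
  var configDb = db.getSiblingDB("config");
  var dbName = NAMESPACE.split(".")[0];
  var primaryShard = configDb.databases.findOne({ _id: dbName }).primary;
  var chunkDocs = configDb.chunks.find({ ns: NAMESPACE }).toArray();

  print("primary shard: " + primaryShard);
  printjson(chunkCountsByShard(chunkDocs));
}
```

Because getSiblingDB is used for both admin and config, it does not matter which database the shell initially connects to.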

Don’t forget to stop the balancer before running these scripts, and then start the balancer when they’re done.
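One way to make the balancer reminder harder to forget is to wrap the work in a small helper that stops the balancer first and restarts it afterwards, even if something fails in between. This is a sketch of my own rather than part of the original scripts; it assumes an already authenticated session on a mongos and relies on the shell's sh.stopBalancer(), sh.startBalancer(), and sh.getBalancerState() helpers.

```javascript
// Runs `work` with the balancer stopped, then restarts the balancer
// even if `work` throws. `shHelpers` is expected to look like the
// mongo shell's `sh` object.
function withBalancerStopped(shHelpers, work) {
  shHelpers.stopBalancer();
  try {
    return work();
  } finally {
    shHelpers.startBalancer();
  }
}

// Only runs inside the mongo shell, where the global `sh` exists.
if (typeof sh !== "undefined") {
  withBalancerStopped(sh, function () {
    print("balancer enabled: " + sh.getBalancerState());
    // ... run the migrate and merge steps here ...
  });
}
```

Note that this helper always re-enables the balancer; if the balancer was deliberately disabled beforehand, restarting it may not be what you want.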

Step 1: Migrate Chunks to Primary Shard

Chunks can only be merged when they live on the same shard. So the first step is to migrate all of the chunks onto a single shard; for the sake of some standard, I chose the primary shard.
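A minimal sketch of such a migration script follows. It is not the original script: the credentials and the namespace "mydb.mycoll" are placeholders, and I have assumed the moveChunk command with its bounds option (via runCommand on the admin database) so that the sketch also works for hashed shard keys. For a non-hashed shard key, the sh.moveChunk helper would be the alternative.

```javascript
// Sketch only: the credentials and namespace below are placeholders.
var NAMESPACE = "mydb.mycoll";

// Return the chunks that are not yet on the target shard.
function chunksToMove(chunks, targetShard) {
  return chunks.filter(function (chunk) {
    return chunk.shard !== targetShard;
  });
}

// Move every chunk that is off the target shard; returns the number moved.
// `adminDb` only needs a runCommand() method.
function migrateAll(adminDb, namespace, chunks, targetShard) {
  var pending = chunksToMove(chunks, targetShard);
  pending.forEach(function (chunk) {
    var result = adminDb.runCommand({
      moveChunk: namespace,
      bounds: [chunk.min, chunk.max],
      to: targetShard
    });
    if (!result || !result.ok) {
      throw new Error("moveChunk failed for chunk " + chunk._id);
    }
  });
  return pending.length;
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var adminDb = db.getSiblingDB("admin");
  if (!adminDb.auth("clusterAdminUser", "changeme")) {
    throw new Error("authentication failed");
  }
  var configDb = db.getSiblingDB("config");
  var dbName = NAMESPACE.split(".")[0];
  var primaryShard = configDb.databases.findOne({ _id: dbName }).primary;
  var chunkDocs = configDb.chunks.find({ ns: NAMESPACE }).toArray();
  print("chunks moved: " + migrateAll(adminDb, NAMESPACE, chunkDocs, primaryShard));
}
```

Saved under a hypothetical name such as migrate-chunks.js, the file would be passed to the mongo shell as an argument after the address of a mongos.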

Run the script by passing the file to the mongo shell while connected to a mongos.

Remember, authentication happens in the script, so no need to pass -u or -p credentials via the CLI. The script automatically finds the admin and config databases as well, so connecting to the admin database is not required.

Step 2: Merge All Chunks

Now that all the chunks are on the primary shard, it is time to merge them into a single chunk. Only contiguous chunks can be merged, and only two chunks at a time can be considered. If there are hundreds or thousands of these 64 MB chunks, doing this manually is tedious. The MongoDB shell script for this step finds a pair of contiguous chunks and merges them, repeating the process until there are no more chunks left to merge.
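The loop described above might look roughly like the sketch below. Again, this is an illustration under assumptions rather than the original script: the credentials and namespace are placeholders, chunks are read from config.chunks sorted by their min bound, and two neighbours count as contiguous when they sit on the same shard and the first chunk's max equals the second chunk's min (compared via their serialized form). The merge itself uses the mergeChunks command with bounds.

```javascript
// Sketch only: the credentials and namespace below are placeholders.
var NAMESPACE = "mydb.mycoll";

// Serialize a shard key bound so two bounds can be compared as strings.
// Uses the shell's tojson() when available, JSON.stringify otherwise.
function keyOf(value) {
  return typeof tojson === "function" ? tojson(value) : JSON.stringify(value);
}

// Given chunks sorted by `min`, return the first pair of neighbours that
// are on the same shard and share a boundary, or null if there is none.
function findMergeablePair(sortedChunks) {
  for (var i = 0; i < sortedChunks.length - 1; i++) {
    var a = sortedChunks[i];
    var b = sortedChunks[i + 1];
    if (a.shard === b.shard && keyOf(a.max) === keyOf(b.min)) {
      return [a, b];
    }
  }
  return null;
}

// Merge pairs until none are left; returns the number of merges performed.
// readChunks() returns the current sorted chunks; runMerge(bounds) runs one merge.
function mergeAll(readChunks, runMerge) {
  var merges = 0;
  var pair = findMergeablePair(readChunks());
  while (pair) {
    var result = runMerge([pair[0].min, pair[1].max]);
    if (!result || !result.ok) {
      throw new Error("mergeChunks failed: " + keyOf(result));
    }
    merges += 1;
    pair = findMergeablePair(readChunks());
  }
  return merges;
}

// Only runs inside the mongo shell, where the global `db` exists.
if (typeof db !== "undefined") {
  var adminDb = db.getSiblingDB("admin");
  if (!adminDb.auth("clusterAdminUser", "changeme")) {
    throw new Error("authentication failed");
  }
  var configDb = db.getSiblingDB("config");
  var total = mergeAll(
    function () {
      return configDb.chunks.find({ ns: NAMESPACE }).sort({ min: 1 }).toArray();
    },
    function (bounds) {
      return adminDb.runCommand({ mergeChunks: NAMESPACE, bounds: bounds });
    }
  );
  print("merges performed: " + total);
}
```

Throwing on the first failed merge is a deliberate choice in this sketch: it prevents the loop from retrying the same pair forever.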

Run this script the same way as the previous migrate script.

Further Considerations

This exercise is also beneficial as prep work for:

  • Unsharding a collection
  • Unsharding a database
  • Changing the shard key of a sharded collection
  • Manually re/pre-splitting of shard chunks
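For the last item, a pre-split of the single merged chunk could be sketched as below. Everything specific here is an assumption for illustration: the namespace, the shard key field "userId", a numeric key range of 0 to 1,000,000, and the target of 8 chunks. The only shell helper used is sh.splitAt(), and the sketch assumes an already authenticated session on a mongos.

```javascript
// Sketch only: namespace, shard key field, range, and chunk count are
// hypothetical values chosen for illustration.
var NAMESPACE = "mydb.mycoll";
var SHARD_KEY_FIELD = "userId";

// Compute evenly spaced split points that cut [min, max) into chunkCount ranges.
function evenSplitPoints(min, max, chunkCount) {
  var points = [];
  var step = (max - min) / chunkCount;
  for (var i = 1; i < chunkCount; i++) {
    points.push(Math.floor(min + i * step));
  }
  return points;
}

// Only runs inside the mongo shell, where the global `sh` exists.
if (typeof sh !== "undefined") {
  evenSplitPoints(0, 1000000, 8).forEach(function (point) {
    var splitKey = {};
    splitKey[SHARD_KEY_FIELD] = point;
    sh.splitAt(NAMESPACE, splitKey);
  });
}
```

With the balancer started again afterwards, the new empty chunks can be distributed across the shards before any data arrives.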