MongoDB: Migrate and Merge All Chunks in Shard

Today I was working in a MongoDB 3.0 sharded cluster environment. One particular sharded collection had 300-some-odd chunks evenly distributed across the shards (thanks to the balancer). These chunks happened to be empty, and the collection needed fresh pre-splitting for near-future use.

I ended up writing MongoDB shell scripts to migrate all the chunks to the primary shard, and then to merge them into a single chunk. The scripts adhere to the following:

  • Authenticate a clusterAdmin user against the admin database. (I actually used a “root” role user.)
  • Read the config database for sharding topology and chunk distribution.
  • Any “write”-like commands use sharding helpers where possible, and runCommand otherwise. No “write”-like commands use CRUD operations on the config database.
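
As a rough illustration of the first two points (not the original script), here is a minimal sketch in ES5 syntax, which is what the MongoDB 3.0 shell understands. The function names, the user name, and the password are hypothetical; `config.databases` and `config.chunks` are the collections the cluster itself maintains, and the `ns` field on chunk documents is assumed to be present as it is in 3.0.

```javascript
// Sketch for the legacy mongo shell. `conn` is any database handle,
// for example the shell's global `db`. User and password are placeholders.
function openClusterDbs(conn, user, password) {
    var admin = conn.getSiblingDB("admin");
    // In the 3.0 shell, auth() returns 1 on success and 0 on failure.
    if (!admin.auth(user, password)) {
        throw new Error("authentication against the admin database failed");
    }
    return { admin: admin, config: conn.getSiblingDB("config") };
}

// Look up the primary shard of a database in config.databases.
function primaryShardOf(config, dbName) {
    var doc = config.databases.findOne({ _id: dbName });
    if (!doc) {
        throw new Error("database not found in config.databases: " + dbName);
    }
    return doc.primary;
}

// Count the chunks of one namespace per shard, read-only, from config.chunks.
function chunkCountsByShard(config, ns) {
    var counts = {};
    config.chunks.find({ ns: ns }).forEach(function (chunk) {
        counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
    });
    return counts;
}
```

In the shell this could be used as `var dbs = openClusterDbs(db, "someUser", "somePassword");` followed by `chunkCountsByShard(dbs.config, "mydb.mycoll")`, where the namespace is again only an example.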

Don’t forget to stop the balancer before running these scripts, and then start the balancer when they’re done.
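
One way to make the balancer step hard to forget is to wrap the work in a small helper. This is only a sketch: `withBalancerStopped` and the `work` callback are hypothetical names, while `sh.stopBalancer()` and `sh.startBalancer()` are the shell's own sharding helpers. The helper object is passed in as a parameter so the function can be exercised outside the shell.

```javascript
// `shardingHelper` is expected to be the shell's global `sh` object.
// `work` is a hypothetical callback that performs the chunk operations.
function withBalancerStopped(shardingHelper, work) {
    shardingHelper.stopBalancer();
    try {
        return work();
    } finally {
        // Re-enable the balancer even if the work above throws.
        shardingHelper.startBalancer();
    }
}
```

In the shell, the call would look like `withBalancerStopped(sh, function () { /* migrate and merge here */ });`.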

Step 1: Migrate Chunks to Primary Shard

Chunks can only be merged when they reside on the same shard, so the first step is to migrate every chunk to a single shard. For the sake of having some standard, I chose the primary shard.
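
The core of such a migration can be sketched as follows. This is not the original script; `planMoves` and `runMoves` are hypothetical names, and the chunk documents are assumed to have the shape found in `config.chunks` on 3.0 (`ns`, `min`, `max`, `shard`). The `moveChunk` command is issued with its `bounds` option so that each chunk is identified by its exact range.

```javascript
// Build one moveChunk command per chunk that is not yet on the target shard.
function planMoves(chunks, targetShard) {
    return chunks.filter(function (chunk) {
        return chunk.shard !== targetShard;
    }).map(function (chunk) {
        return { moveChunk: chunk.ns, bounds: [chunk.min, chunk.max], to: targetShard };
    });
}

// Run the planned commands against the admin database,
// stopping at the first failure. Returns the number of moves performed.
function runMoves(admin, commands) {
    commands.forEach(function (cmd) {
        var res = admin.runCommand(cmd);
        if (!res.ok) {
            throw new Error("moveChunk failed: " + JSON.stringify(res));
        }
    });
    return commands.length;
}
```

Keeping the planning step separate from the execution step makes it possible to print and review the commands before anything is moved.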

Run the script as follows:

Remember, authentication happens in the script, so no need to pass -u or -p credentials via the CLI. The script automatically finds the admin and config databases as well, so connecting to the admin database is not required.

Step 2: Merge All Chunks

Now that all the chunks are on the primary shard, it is time to merge them into a single chunk. Only contiguous chunks can be merged, and my script considers only two chunks at a time. If there are hundreds or thousands of chunks (64MB each at most, by default), doing this manually is tedious. The following MongoDB shell script finds a pair of contiguous chunks and merges them, repeating the process until there are no more chunks left to merge.
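
The find-a-pair-and-merge loop can be sketched roughly like this. It is an illustration under assumptions, not the original script: the function names and the `loadChunks` callback are hypothetical, and the chunk list is assumed to arrive sorted by `min`, for example from `config.chunks.find({ ns: ns }).sort({ min: 1 }).toArray()`. The default key comparison uses JSON text, which is fine for plain values but may not distinguish special BSON values such as MinKey and MaxKey; inside the shell, passing a BSON-aware comparison (for instance the shell's `friendlyEqual`, assuming it is available in your version) is likely safer.

```javascript
// Default key comparison for this sketch: compare the serialized forms.
function sameKeyByJson(a, b) {
    return JSON.stringify(a) === JSON.stringify(b);
}

// Given chunks sorted by min, return the first adjacent pair on the same
// shard where the left chunk's max equals the right chunk's min, or null.
function findContiguousPair(sortedChunks, sameKey) {
    sameKey = sameKey || sameKeyByJson;
    for (var i = 0; i + 1 < sortedChunks.length; i++) {
        var left = sortedChunks[i];
        var right = sortedChunks[i + 1];
        if (left.shard === right.shard && sameKey(left.max, right.min)) {
            return [left, right];
        }
    }
    return null;
}

// Merge one pair at a time until no contiguous pair is left.
// `loadChunks` is a hypothetical callback that re-reads the sorted chunk list.
function mergeAll(admin, ns, loadChunks, sameKey) {
    var merges = 0;
    var pair;
    while ((pair = findContiguousPair(loadChunks(), sameKey)) !== null) {
        var res = admin.runCommand({
            mergeChunks: ns,
            bounds: [pair[0].min, pair[1].max]
        });
        if (!res.ok) {
            throw new Error("mergeChunks failed: " + JSON.stringify(res));
        }
        merges++;
    }
    return merges;
}
```

Re-reading the chunk list after every merge is slower than working from a cached list, but it keeps the loop honest about what the cluster actually contains.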

Just like with the previous migrate script, run this one the same way:

Further Considerations

This exercise is also beneficial as prep work for:

  • Unsharding a collection
  • Unsharding a database
  • Changing the shard key of a sharded collection
  • Manually re/pre-splitting of shard chunks

MongoDB 3.0: mongos CentOS 6 service scripts

Premise

When installing mongos (not mongod) on CentOS 6 using the RPMs provided in MongoDB’s repository, no startup/service/init.d scripts are created, or really anything for that matter, to aid in managing mongos as a service. All the RPM provides is the binary.

If you want to treat mongos as a service, there are quite a few steps to follow beyond just setting up a script in init.d:

  • mongos YAML configuration file
  • mongos sysconfig service overrides
  • SELinux port definition
  • a user to run mongos

Most of the time there are many mongos processes distributed within a MongoDB cluster. Configuring each of these can be a hassle, let alone installing and managing them, unless you use an automation tool.

Be sure to read “Install MongoDB on Red Hat Enterprise or CentOS Linux” first.

mongos Service Scripts

Below are the 3 files I use when preparing a mongos cluster:

  1. The init.d service script for managing the mongos process
  2. A base mongos configuration file in YAML
  3. An automation script to prepare the CentOS 6 environment for mongos

/etc/init.d/mongos

The following init.d script has been retooled from the stock MongoDB 3.0 mongod init script for use with mongos. Watch out for the subtle differences.

/etc/mongos.conf Base YAML Configuration

Be sure to set sharding.configDB appropriately.

Post-Install Environment Setup

Provided mongodb-org-mongos has already been installed, the following script will do most of the legwork to get the CentOS 6 environment up and running. Note: The script pulls the configuration file and the service file from a web server using wget; tailor those locations to your own environment. It performs the following:

  • Add a mongod user (yes, with a d) with the same settings that would be supplied from mongodb-org-server.
  • Add a placeholder file for sysconfig.
  • Add the mongos.conf file with the base configuration described above (wget).
  • Add the init.d service script (wget).
  • If SELinux is enabled, set the contexts for the files created above, and add the port definition via semanage. Note that it will automatically install the tools needed for this task if they are missing.
  • Add and enable the new mongos service.

SELinux and MongoDB

The automation script above does not add all of the SELinux policies required for mongos. Install the mongodb-org package to initialize all of the policies; it can be removed later if need be while keeping these policies intact. Finding the script used by MongoDB (or making one up) is a task for another day. A policy dump from semanage is as follows: