Today I was working in a MongoDB 3.0 sharded cluster environment. There was a particular sharded collection that had 300-some-odd chunks evenly distributed within each shard (thanks to the balancer). These chunks happened to be empty, and in need of some pre-splitting for near-future use.
I ended up writing MongoDB shell scripts to handle the migration of all the chunks to the primary shard, and to merge all of the chunks to a single chunk. The scripts adhere to the following:
- Authenticate a clusterAdmin user against the admin database. (I actually used a “root” role user.)
- Read the config database for sharding topology and chunk distribution.
- Any “write”-like commands use sharding helpers where possible, and runCommand otherwise. No “write”-like commands use CRUD operations on the config database.
Don’t forget to stop the balancer before running these scripts, and then start the balancer when they’re done.
Step 1: Migrate Chunks to Primary Shard
It is impossible to merge chunks that are not on the same shard. First, it is necessary to migrate all the chunks, and for the sake of some standard, on the primary shard.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 |
var databaseName = 'database'; // your database name var collectionName = 'collection'; // your collection var adminUsername = 'username'; // user with clusterAdmin role var adminPassword = 'password'; // that user's password var namespace = databaseName+'.'+collectionName; var admindb = db.getSiblingDB('admin'); var configdb = db.getSiblingDB('config'); admindb.auth(adminUsername,adminPassword); if ( sh.getBalancerState() ) { print('balancer is enabled, turn it off'); quit(); } var foundDatabase = configdb.databases.findOne({_id:databaseName,partitioned:true}); if ( !foundDatabase || !foundDatabase.primary ) { print('no partitioned database found with name '+databaseName); quit(); } var primaryShard = foundDatabase.primary; print('primary shard is '+primaryShard); var eligibleChunks = configdb.chunks.find({ns:namespace,shard:{$ne:primaryShard}}); if ( eligibleChunks.count() === 0 ) { print('no eligible chunks were found, either none exist or they are all already on the primary shard.'); } var i = 0; eligibleChunks.forEach(function(chunk) { i++; print(i+' moving chunk id '+chunk._id.toString()); var result = admindb.runCommand({moveChunk:namespace,bounds:[chunk.min,chunk.max],to:primaryShard}); if ( !result || !result.ok ) { print(i+' error moving chunk with id '+chunk._id.toString()); print(JSON.stringify(result)); quit(); } print(i+' moved chunk successfully'); }); |
Run the script as follows:
1 |
mongo mongos.example.com/admin migratechunks.js |
Remember, authentication happens in the script, so no need to pass -u or -p credentials via the CLI. The script automatically finds the admin and config databases as well, so connecting to the admin database is not required.
Step 2: Merge All Chunks
Now that all the chunks are on the primary shard, it is time to merge them into a single chunk. Only contiguous chunks can be merged, and only two chunks at a time can be considered. If there are hundreds or thousands of these 64MB chunks, this can be tedious if done manually. The following MongoDB shell script finds a pair of contiguous chunks and merges them; it repeats this process until there are no more chunks left to merge.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 |
var databaseName = 'database'; // your database name var collectionName = 'collection'; // your collection var adminUsername = 'username'; // user with clusterAdmin role var adminPassword = 'password'; // that user's password var namespace = databaseName+'.'+collectionName; var admindb = db.getSiblingDB('admin'); var configdb = db.getSiblingDB('config'); admindb.auth(adminUsername,adminPassword); if ( sh.getBalancerState() ) { print('balancer is enabled, turn it off'); quit(); } var foundDatabase = configdb.databases.findOne({_id:databaseName,partitioned:true}); if ( !foundDatabase || !foundDatabase.primary ) { print('no partitioned database found with name '+databaseName); quit(); } var eligibleChunks = 0; var i = 0; while ( eligibleChunks = configdb.chunks.count({ns:namespace}) ) { i++; print(i+' eligible chunks = '+eligibleChunks); print(i+' finding subject chunk'); var subject = configdb.chunks.findOne({ns:namespace}); if ( !subject || !subject.max || !subject.min ) { print(i+' no valid subject chunk found'); print(subject); break; } print(i+' finding contiguous chunk'); var contiguous = configdb.chunks.findOne({ns:namespace,min:subject.max}); if ( !contiguous || !contiguous.max || !contiguous.min ) { print(i+' no valid contiguous chunk found'); print(contiguous); break; } print(i+' merging chunks'); var result = admindb.runCommand({mergeChunks:namespace,bounds:[ subject.min, contiguous.max ]}); if ( !result || !result.ok ) { print(i+' merge failure'); print(JSON.stringify(result)); break; } print(i+' merged '+JSON.stringify(subject.min)+', '+JSON.stringify(contiguous.max)); } |
Just like with the previous migrate script, run this one the same way:
1 |
mongo mongos.example.com/admin mergechunks.js |
Further Considerations
This exercise is also beneficial as prep work for:
- Unsharding a collection
- Unsharding a database
- Changing the shard key of a sharded collection
- Manually re/pre-splitting of shard chunks