Sometimes commands take a long time to process (in my case, importing huge SQL dumps into a series of sandboxed MySQL databases), in which case it may be favorable to take advantage of multiple CPUs/cores. This can be handled in shell/bash with the background control operator & in combination with wait. I got a little insight from this StackOverflow answer.
The Technique
The breakdown is commented inline.
```bash
#!/bin/bash

processes=4; # max number of parallel processes

# list of stuff
read -r -d '' theList <<EOT
big
long
list
of
args
or
commands
to
work
with
EOT

count=0;
pidList='';

for item in $theList; do
    count=$((count + 1));

    longOperationWith "$item" &
    lastPid=${!}; # get the PID of `longOperationWith "$item"`
    pidList=`echo "$pidList $lastPid" | sed 's/^ *//g'`; # concat and strip the leading space

    if [ $count -ge $processes ]; then # when we've reached the max number of processes
        echo "Waiting for $pidList to finish...";
        wait $pidList; # wait for all PIDs in pidList to finish (don't quote)
        count=0; # then reset counter
        pidList=''; # and reset the list of PIDs
    fi;
done;

wait; # catch a final partial batch when the list isn't a multiple of $processes
```
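To watch the batching behavior without a real workload, here is a minimal, self-contained variant; sleep stands in for longOperationWith, and the item values (seconds to sleep) are invented for illustration:

```bash
#!/bin/bash

processes=4; # max number of parallel processes

# fake workload: each item is a number of seconds to sleep
read -r -d '' theList <<EOT
3
1
2
5
2
1
EOT

count=0;
pidList='';

for item in $theList; do
    count=$((count + 1));

    sleep "$item" & # stand-in for longOperationWith "$item"
    pidList="$pidList $!"; # accumulate PIDs (the leading space is harmless to wait)

    if [ $count -ge $processes ]; then
        echo "Waiting for$pidList to finish...";
        wait $pidList;
        count=0;
        pidList='';
    fi;
done;

wait; # catch the final partial batch (here: the last two sleeps)
```

With six items and a batch size of four, the first wait blocks for roughly five seconds (the longest sleep in the batch), after which the remaining two items run and are caught by the trailing wait.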
Notes
- This does not function like a pool (task queue), where a freed-up “thread” immediately becomes eligible for the next task, although I don’t see why something like this couldn’t be implemented with a little work; see the sketch after this list.
- wait will pause script execution until all of the PIDs it is given as parameters have completed before moving on.
- Once the if...fi control block is entered, wait causes the for...in loop to suspend until the whole batch has finished.
- $! (or ${!}) contains the PID of the most recently backgrounded command; make sure it is read directly after the operation launched with &. Throw it into a variable (like lastPid) for later use.
- This is not multi-threading, although the simplified concept is similar; each command spawns a separate process.
- read is just creating a multiline list; in my specific case this was favorable over a bash array, but either will work (an array version is sketched at the end of this post).
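Regarding the first note: newer bash (4.3+) has wait -n, which blocks until any one background job finishes, and that makes the pool behavior fairly easy. A rough sketch, assuming theList is built the same way as above and longOperationWith remains the placeholder for the real work:

```bash
#!/bin/bash

processes=4; # max number of parallel processes
running=0;   # how many jobs are currently in flight

for item in $theList; do
    if [ $running -ge $processes ]; then
        wait -n; # block until any single job finishes (requires bash 4.3+)
        running=$((running - 1));
    fi;
    longOperationWith "$item" &
    running=$((running + 1));
done;

wait; # drain whatever is still running
```

Unlike the batch version, a slot is refilled the moment any job exits, so one slow item no longer holds up the other three slots.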

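And since the last note mentions it, the bash-array alternative to the read heredoc; a small sketch with the list contents invented for illustration:

```bash
#!/bin/bash

theList=(big long "list of" args); # same idea, as a bash array

for item in "${theList[@]}"; do # quoted expansion keeps items with spaces intact
    longOperationWith "$item" &
    # ...same counting/wait logic as above...
done;
wait;
```

One practical difference: the unquoted $theList expansion in the original word-splits on whitespace, so items containing spaces get broken apart; the quoted array expansion does not.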