Unix & Linux Asked by user233520 on December 11, 2020
I have several sub-directories within one high-level directory. Each sub-directory has several files and a for loop shell script; the same for loop script is present in every sub-directory. I want to go into each sub-directory and run the for loop script in parallel, in several terminals.
I tried this, but it seems to run them serially (one after another), whereas I want to run all of them in parallel:
find dir_* -type f -execdir sh for_loop.sh {} \;
Assuming this does the right thing, just serially:
find dir_* -type f -execdir sh for_loop.sh {} \;
Then you should be able to replace that with:
find dir_* -type f | parallel 'cd {//} && sh for_loop.sh {/}'
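In GNU Parallel, {//} expands to the directory part of the input line and {/} to its base name. Once a job has done cd {//}, the file is reachable by its base name alone, just as -execdir passes it. The plain-shell counterparts are dirname and basename; a minimal sketch using a hypothetical path dir_1/data.txt:

```shell
#!/bin/sh
# Hypothetical input line, as printed by: find dir_* -type f
path=dir_1/data.txt

dir=$(dirname "$path")    # same as {//}: dir_1
base=$(basename "$path")  # same as {/}:  data.txt

# prints: cd dir_1, then refer to the file as data.txt
echo "cd $dir, then refer to the file as $base"
```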
To run it in multiple terminals, GNU Parallel supports tmux, running each command in its own tmux pane:
find dir_* -type f | parallel --tmuxpane 'cd {//} && sh for_loop.sh {/}'
It defaults to one job per CPU core. In your case you might want to run one more job than you have cores:
find dir_* -type f | parallel -j+1 --tmuxpane 'cd {//} && sh for_loop.sh {/}'
GNU Parallel is a general parallelizer and makes it easy to run jobs in parallel on the same machine or on multiple machines you have ssh access to.
If you have 32 different jobs you want to run on 4 CPUs, a straightforward way to parallelize is to run 8 jobs on each CPU. If the jobs differ in length, though, some CPUs finish early and sit idle while others are still working.
GNU Parallel instead spawns a new process whenever one finishes, keeping the CPUs busy and thus saving time.
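The same keep-every-slot-busy scheduling can be observed with the -P option of xargs (supported by GNU and BSD xargs, though not part of POSIX), which starts a new process as soon as a running one exits rather than pre-assigning jobs. A minimal sketch, assuming a pool of 4 slots, 8 dummy jobs, and a sleep that accepts fractional seconds (as GNU coreutils sleep does); the job count and durations are purely illustrative:

```shell
#!/bin/sh
# Eight dummy jobs of uneven length (0.0 s to 0.2 s); -P 4 keeps at most four running.
# Whenever one of the four exits, xargs immediately starts the next one,
# so no slot sits idle while work remains. The job number arrives as $1.
results=$(
  printf '%s\n' 1 2 3 4 5 6 7 8 |
    xargs -n 1 -P 4 sh -c 'sleep "0.$(( $1 % 3 ))"; echo "job $1 done"' sh
)
# Line order depends on which jobs finish first, so it varies between runs.
echo "$results"
```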
Installation
For security reasons you should install GNU Parallel with your package manager, but if GNU Parallel is not packaged for your distribution, you can do a personal installation, which does not require root access. It can be done in 10 seconds by doing this:
$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ ||
fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 67bd7bc7dc20aff99eb8f1266574dadb
12345678 67bd7bc7 dc20aff9 9eb8f126 6574dadb
$ md5sum install.sh | grep b7a15cdbb07fb6e11b0338577bc1780f
b7a15cdb b07fb6e1 1b033857 7bc1780f
$ sha512sum install.sh | grep 186000b62b66969d7506ca4f885e0c80e02a22444
6f25960b d4b90cf6 ba5b76de c1acdf39 f3d24249 72930394 a4164351 93a7668d
21ff9839 6f920be5 186000b6 2b66969d 7506ca4f 885e0c80 e02a2244 40e8a43f
$ bash install.sh
For other installation options see http://git.savannah.gnu.org/cgit/parallel.git/tree/README
Learn more
See more examples: http://www.gnu.org/software/parallel/man.html
Watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial: http://www.gnu.org/software/parallel/parallel_tutorial.html
Sign up for the email list to get support: https://lists.gnu.org/mailman/listinfo/parallel
Answered by Ole Tange on December 11, 2020
Probably the perfect tool for this is GNU Parallel:
parallel sh ::: dir_*/for_loop.sh
GNU Parallel not only runs the jobs in parallel, but also collects each job's output and prints it as one block, so the outputs of different jobs don't get mixed together.
From its man page:
GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe. GNU parallel can then split the input into blocks and pipe a block into each command in parallel.
If you use xargs and tee today you will find GNU parallel very easy to use as GNU parallel is written to have the same options as xargs. If you write loops in shell, you will find GNU parallel may be able to replace most of the loops and make them run faster by running several jobs in parallel.
GNU parallel makes sure output from the commands is the same output as you would get had you run the commands sequentially. This makes it possible to use output from GNU parallel as input for other programs.
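The idea behind that output grouping can be sketched in plain sh: each background job writes only to its own temporary file, and the files are printed after every job has finished, so lines from different jobs are never interleaved. This illustrates the concept only and is not how GNU Parallel is implemented; the job bodies are made up, and the blocks are printed in job order, as GNU Parallel does with --keep-order.

```shell
#!/bin/sh
tmp=$(mktemp -d)

# Three background jobs that finish in reverse order (job 3 first);
# each one writes only to its own file, so nothing can interleave.
for n in 1 2 3; do
  ( echo "job $n: start"; sleep "0.$(( 4 - n ))"; echo "job $n: end" ) > "$tmp/$n.out" &
done
wait  # block until every background job has exited

# Emit each job's output as one contiguous block, in job order:
# job 1 start/end, then job 2 start/end, then job 3 start/end.
grouped=$(cat "$tmp/1.out" "$tmp/2.out" "$tmp/3.out")
echo "$grouped"
rm -r "$tmp"
```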
Answered by dr_ on December 11, 2020
You should be passing find's output on to xargs, running in parallel mode:
find dir_*/ -type f -name for_loop.sh -print0 | xargs -0 -r -n 1 -P 3 -t sh
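Here we ask find to locate, recursively, all files named for_loop.sh under the directories whose names begin with dir_, and to pass them to xargs one file at a time (-n 1), running no more than 3 processes at any given time (-P 3). The -t option makes xargs print each command before running it, and -r stops it from running anything if find produces no output.
Use is made of the null delimiter (find -print0 with xargs -0), so file names containing spaces or newlines are passed through safely.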
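A self-contained sketch of the xargs approach, run in a scratch directory. It assumes, hypothetically, that each for_loop.sh must run from inside its own sub-directory (for example because it loops over the files next to it), so each job changes into the script's directory first via a small sh -c wrapper. The directory names dir_1 to dir_3, the data files, and the script body are all made up for the demo.

```shell
#!/bin/sh
set -e

# Scratch area so the demo does not touch real files
work=$(mktemp -d)
cd "$work"

# Hypothetical layout: dir_1..dir_3, each with two data files and the same for_loop.sh
for d in dir_1 dir_2 dir_3; do
  mkdir "$d"
  touch "$d/a.txt" "$d/b.txt"
  # Hypothetical script body: list the .txt files in the *current* directory
  printf '%s\n' 'for f in *.txt; do echo "$f"; done > result.log' > "$d/for_loop.sh"
done

# Run each for_loop.sh from inside its own directory, at most 3 at a time.
# sh -c receives the path printed by find as $1 (e.g. dir_1/for_loop.sh).
find dir_*/ -type f -name for_loop.sh -print0 |
  xargs -0 -r -n 1 -P 3 sh -c 'cd "$(dirname "$1")" && sh for_loop.sh' sh

# Collect what each job wrote (a.txt and b.txt per directory), then clean up
logs=$(cat dir_1/result.log dir_2/result.log dir_3/result.log)
echo "$logs"
cd /
rm -r "$work"
```

xargs waits for all of its child processes before exiting, so the result files are complete by the time they are read.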