Unix & Linux Asked by Ravnoor S Gill on January 6, 2021
I have been trying to parallelize the following script, specifically each of the three FOR-loop instances, using GNU Parallel, but haven't been able to. The four commands inside the FOR loop run in series, and each iteration takes around 10 minutes.
#!/bin/bash
kar='KAR5'
runList='run2 run3 run4'
mkdir normFunc
for run in $runList
do
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
rm -f *.mat
done
Why don't you just fork (aka. background) them?
foo () {
local run=$1
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
for run in $runList; do foo "$run" & done
In case that's not clear, the significant part is here:
for run in $runList; do foo "$run" & done
The trailing & causes the function to be executed in a forked shell in the background. That's parallel.
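If later steps depend on the results, you may also want the script to block until every backgrounded run has finished. A minimal addition, assuming the same foo and runList as above:
for run in $runList; do foo "$run" & done
wait   # block until all backgrounded foo invocations have finished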
Correct answer by goldilocks on January 6, 2021
I really like the answer from @lev as it provides control over the maximum number of processes in a very simple manner. However, as described in the manual, sem does not work with brackets.
for stuff in things
do
sem -j +0 "something;
with;
stuff"
done
sem --wait
Does the job.
-j +N Add N to the number of CPU cores. Run up to this many jobs in parallel. For compute intensive jobs -j +0 is useful as it will run number-of-cpu-cores jobs simultaneously.
-j -N Subtract N from the number of CPU cores. Run up to this many jobs in parallel. If the evaluated number is less than 1 then 1 will be used. See also --use-cpus-instead-of-cores.
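For example, to leave one core free for other work, the same loop could use -j -1 instead; a sketch along the lines of the snippet above (something/with/stuff are still placeholders):
for stuff in things
do
    sem -j -1 "something;
    with;
    stuff"
done
sem --wait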
Answered by moritzschaefer on January 6, 2021
In my case, I can't use semaphore (I'm in git-bash on Windows), so I came up with a generic way to split the task among N workers, before they begin.
It works well if the tasks take roughly the same amount of time. The disadvantage is that, if one of the workers takes a long time to do its part of the job, the others that already finished won't help.
# array of assets, assuming at least 1 item exists
listAssets=( {a..z} ) # example: a b c d .. z
# listAssets=( ~/"path with spaces/"*.txt ) # could be file paths
# replace with your task
task() { # $1 = idWorker, $2 = asset
    echo "Worker $1: Asset '$2' START!"
    # simulating a task that randomly takes 3-6 seconds
    sleep $(( ($RANDOM % 4) + 3 ))
    echo " Worker $1: Asset '$2' OK!"
}
nVirtualCores=$(nproc --all)
nWorkers=$(( $nVirtualCores * 1 )) # I want 1 process per core
worker() { # $1 = idWorker
    echo "Worker $1 GO!"
    idAsset=0
    for asset in "${listAssets[@]}"; do
        # split assets among workers (using modulo); each worker will go through
        # the list and select the asset only if it belongs to that worker
        (( idAsset % nWorkers == $1 )) && task $1 "$asset"
        (( idAsset++ ))
    done
    echo " Worker $1 ALL DONE!"
}

for (( idWorker=0; idWorker<nWorkers; idWorker++ )); do
    # start workers in parallel, use 1 process for each
    worker $idWorker &
done
wait # until all workers are done
Answered by geekley on January 6, 2021
Just a vanilla bash script - no external libs/apps needed.
#!/bin/bash
N=4
for i in {a..z}; do
    (
        # .. do your stuff here
        echo "starting task $i.."
        sleep $(( (RANDOM % 3) + 1))
    ) &

    # allow up to $N jobs to run in parallel
    if [[ $(jobs -r -p | wc -l) -ge $N ]]; then
        # there are already $N jobs running, so wait here for any job
        # to finish so there is room to start the next one
        wait -n
    fi
done

# no more jobs to be started, but wait for the pending jobs
# (all need to be finished)
wait

echo "all done"
#!/bin/bash
N=4
find ./my_pictures/ -name "*.jpg" | (
    while read filepath; do
        jpegoptim "${filepath}" &
        if [[ $(jobs -r -p | wc -l) -ge $N ]]; then wait -n; fi
    done
    wait
)
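If the picture paths may contain spaces or other unusual characters, a slightly more robust variant of the same idea feeds null-delimited paths from find; a sketch:
#!/bin/bash
N=4
find ./my_pictures/ -name "*.jpg" -print0 | (
    while IFS= read -r -d '' filepath; do
        jpegoptim "${filepath}" &
        if [[ $(jobs -r -p | wc -l) -ge $N ]]; then wait -n; fi
    done
    wait
)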
Answered by Tomasz Hławiczka on January 6, 2021
I had trouble with @PSkocik's solution. My system does not have GNU Parallel available as a package, and sem threw an exception when I built and ran it manually. I then tried the FIFO semaphore example as well, which also threw some other errors regarding communication.
@eyeApps suggested xargs, but I didn't know how to make it work with my complex use case (examples would be welcome).
Here is my solution for parallel jobs, which processes up to N jobs at a time as configured by _jobs_set_max_parallel:
_lib_jobs.sh:
function _jobs_get_count_e {
    jobs -r | wc -l | tr -d " "
}

function _jobs_set_max_parallel {
    g_jobs_max_jobs=$1
}

function _jobs_get_max_parallel_e {
    [[ $g_jobs_max_jobs ]] && {
        echo $g_jobs_max_jobs
        return 0
    }
    # default to a single job if no maximum has been configured
    echo 1
}

function _jobs_is_parallel_available_r() {
    (( $(_jobs_get_count_e) < $g_jobs_max_jobs )) &&
        return 0

    return 1
}

function _jobs_wait_parallel() {
    # sleep until a job slot becomes available
    while true; do
        _jobs_is_parallel_available_r &&
            break

        sleep 0.1s
    done
}

function _jobs_wait() {
    wait
}
Example usage:
#!/bin/bash
source "_lib_jobs.sh"
_jobs_set_max_parallel 3
# Run 10 jobs in parallel with varying amounts of work
for a in {1..10}; do
    _jobs_wait_parallel

    # Sleep between 1-2 seconds to simulate busy work
    sleep_delay=$(echo "scale=1; $(shuf -i 10-20 -n 1)/10" | bc -l)

    ( ### ASYNC
        echo $a
        sleep ${sleep_delay}s
    ) &
done

# Visualize jobs
while true; do
    n_jobs=$(_jobs_get_count_e)

    [[ $n_jobs = 0 ]] &&
        break

    sleep 0.1s
done
Answered by Zhro on January 6, 2021
One really easy way that I often use:
cat "args" | xargs -P $NUM_PARALLEL command
This will run the command, passing in each line of the "args" file, in parallel, running at most $NUM_PARALLEL at the same time.
You can also look into the -I option for xargs, if you need to substitute the input arguments in different places.
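For instance, with -I every occurrence of the replacement string is substituted, so the argument can appear anywhere in the command line. A sketch, where some_command and its flags are placeholders:
xargs -P "$NUM_PARALLEL" -I {} some_command --input {} --output {}.out < args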
Answered by eyeApps LLC on January 6, 2021
Given a sample task:
task(){
    sleep 0.5; echo "$1";
}
A plain sequential run looks like this:
for thing in a b c d e f g; do
    task "$thing"
done
A parallel run simply backgrounds each task:
for thing in a b c d e f g; do
    task "$thing" &
done
And a run limited to batches of at most N=4 processes at a time:
N=4
(
for thing in a b c d e f g; do
    ((i=i%N)); ((i++==0)) && wait
    task "$thing" &
done
)
It's also possible to use FIFOs as semaphores and use them to ensure that new processes are spawned as soon as possible and that no more than N processes run at the same time. But it requires more code.
# initialize a semaphore with a given number of tokens
open_sem(){
    mkfifo pipe-$$
    exec 3<>pipe-$$
    rm pipe-$$
    local i=$1
    for((;i>0;i--)); do
        printf %s 000 >&3
    done
}

# run the given command asynchronously and pop/push tokens
run_with_lock(){
    local x
    # this read waits until there is something to read
    read -u 3 -n 3 x && ((0==x)) || exit $x
    (
        ( "$@"; )
        # push the return code of the command to the semaphore
        printf '%.3d' $? >&3
    )&
}
N=4
open_sem $N
for thing in {a..g}; do
    run_with_lock task $thing
done
We use file descriptor 3 as a semaphore by pushing (= printf) and popping (= read) tokens ('000'). By pushing the return code of the executed tasks, we can abort if something went wrong.
Answered by PSkocik on January 6, 2021
for stuff in things
do
sem -j+0 "something;
with;
stuff"
done
sem --wait
This will use semaphores, parallelizing as many iterations as the number of available cores (-j +0 means you will parallelize N+0 jobs, where N is the number of available cores).
sem --wait tells it to wait until all the iterations in the for loop have finished before executing the subsequent lines of code.
Note: you will need "parallel" from the GNU parallel project (sudo apt-get install parallel).
Answered by lev on January 6, 2021
It seems the fsl jobs depend on each other, so the 4 jobs cannot be run in parallel. The runs, however, can be run in parallel.
Make a bash function that processes a single run, and run that function in parallel:
#!/bin/bash
myfunc() {
run=$1
kar='KAR5'
mkdir -p normFunc
fsl5.0-flirt -in $kar"deformed.nii.gz" -ref normtemp.nii.gz -omat $run".norm1.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-flirt -in $run".poststats.nii.gz" -ref $kar"deformed.nii.gz" -omat $run".norm2.mat" -bins 256 -cost corratio -searchrx -90 90 -searchry -90 90 -searchrz -90 90 -dof 12
fsl5.0-convert_xfm -concat $run".norm1.mat" -omat $run".norm.mat" $run".norm2.mat"
fsl5.0-flirt -in $run".poststats.nii.gz" -ref normtemp.nii.gz -out $PWD/normFunc/$run".norm.nii.gz" -applyxfm -init $run".norm.mat" -interp trilinear
}
export -f myfunc
parallel myfunc ::: run2 run3 run4
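If the runs are heavy and you want to cap how many execute at once, GNU Parallel's -j option limits the number of simultaneous jobs; for example, to allow at most two at a time:
parallel -j 2 myfunc ::: run2 run3 run4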
To learn more, watch the intro videos (https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1) and spend an hour walking through the tutorial (http://www.gnu.org/software/parallel/parallel_tutorial.html). Your command line will love you for it.
Answered by Ole Tange on January 6, 2021
for stuff in things
do
( something
with
stuff ) &
done
wait # for all the something with stuff
Whether it actually works depends on your commands; I'm not familiar with them. The rm *.mat looks a bit prone to conflicts if it runs in parallel...
Answered by frostschutz on January 6, 2021