
Parallelization of two procedures inside Module

Mathematica Asked on August 10, 2021

I am new to parallel programming and I am trying to speed up an old piece of code by parallelizing it where possible. I am (re)writing a function that computes two lists with the same dimensions (called gf1 and gf2) using two different procedures, which take roughly the same computational time, and that returns the sum of these lists, gf1 + gf2. Here is a sketch of how my function works now: the evaluation takes 20 s, but I feel it could be reduced to a little over 10 s by running procedure 1 and procedure 2 in parallel.

f[x_] := Module[
  {range = {-1., 1.}, n = 5000, gf1, gf2},
  
  Pause[10];(*procedure 1 to get gf1*)
  gf1 = RandomReal[range, n];
  
  Pause[10];(*procedure 2 to get gf2*)
  gf2 = RandomReal[range, n];
  
  gf1 + gf2
  ]

So I have two questions:

  • I have read interesting discussions about overhead issues in parallelization, so does it make sense to parallelize this code? (In the two procedures I call several custom functions and use several variables, which would have to be communicated to the kernels…)
  • If yes, how can I assign the evaluation of gf1 to one Kernel and the evaluation of gf2 to another?

Thanks in advance for any answer 🙂

One Answer

I have read interesting discussions about overhead issues in parallelization, so does it make sense to parallelize this code?

Yes, there is overhead, and judging how large that overhead is in a specific case is not always trivial.

I usually follow these two guidelines when deciding what to parallelize:

  • Parallelization makes sense for preferably large, but completely independent, computational units. The classic example is a set of items, each of which needs to be processed independently with the same function. This is an excellent fit for ParallelMap or ParallelTable.

    If the computational units are not independent, and you see yourself looking to use SetSharedVariable / SetSharedFunction, then the problem is usually not a good fit for parallelizing in Mathematica.

  • Try to make the computational units as large as possible. If you are implementing a function f that is called by a function g, it is usually best to parallelize in g and not in f. If g will be mapped over a list of data items, do not parallelize inside g: instead, use ParallelMap[g, items] (see the sketch after this list).

    It is good to think about this before you start implementing your function. During the first, exploratory stage of a project it is common to call the function with one input at a time, and the temptation to parallelize it internally can be strong, but that may not be very efficient. If, at a later stage, the function will be mapped over large lists of inputs, it is better to leave the parallelization for later and apply it at that outer level.

    There are cases when this principle does not apply. Consider for example ParallelTable[f[k], {k, 10}], where f[k] takes $O(2^k)$ time to run. Then computing f[10] will take about as much time as computing f[k] for all k < 10 combined, and you might end up with a single parallel kernel working alone for a long time while the rest are idle. In such cases, it is often better to parallelize f internally if possible.
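
To make these guidelines concrete, here is a minimal sketch; process and items are hypothetical names and the Pause merely stands in for real work. The whole per-item function is mapped in parallel at the outermost level, and for uneven per-item costs (such as the $O(2^k)$ case) the Method option can be used to hand out items one at a time:

process[k_] := (Pause[0.1]; k^2)  (* stand-in for an expensive, independent unit of work *)
items = Range[20];

(* parallelize at the outermost level: map the whole function over the data *)
ParallelMap[process, items]

(* for uneven per-item costs, finer-grained scheduling distributes items one by one *)
ParallelMap[process, items, Method -> "FinestGrained"]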

If yes, how can I assign the evaluation of gf1 to one Kernel and the evaluation of gf2 to another?

You can evaluate two different pieces of computation in parallel like this:

f[x_] := (Echo[x, f]; x)
g[x_] := (Echo[x, g]; x)
In[813]:= Parallelize[{f[1], g[2]}]

(kernel 4) >> f 1

(kernel 3) >> g 2

Out[813]= {1, 2}

If each of the two pieces of computation really takes multiple seconds, as in your example, I think it is definitely worth doing this. However, if you are going to run your function for many inputs later, then don't bother parallelizing it internally on only two cores. It won't hurt if you do it anyway, though: parallel constructs can be nested safely, and the inner ones will simply not run in parallel. There will be a warning message, which you can permanently turn off with the not-so-obvious ParallelEvaluate[Off[General::subpar]] (see the sketch after the example below). Alternatively, you can use Quiet.

In[815]:= ParallelTable[Parallelize[{f[i], g[i]}], {i, 4}]

(kernel 4) Parallelize::subpar :  Parallel computations cannot be nested; proceeding with sequential evaluation.

(kernel 3) Parallelize::subpar :  Parallel computations cannot be nested; proceeding with sequential evaluation.

(kernel 2) Parallelize::subpar :  Parallel computations cannot be nested; proceeding with sequential evaluation.

(kernel 1) Parallelize::subpar :  Parallel computations cannot be nested; proceeding with sequential evaluation.

(kernel 4) >> f 1

(kernel 3) >> f 2

(kernel 2) >> f 3

(kernel 1) >> f 4

(kernel 4) >> g 1

(kernel 3) >> g 2

(kernel 2) >> g 3

(kernel 1) >> g 4

Out[815]= {{1, 1}, {2, 2}, {3, 3}, {4, 4}}
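
For completeness, here is a sketch of the message suppression mentioned above, using the same toy f and g; the Off call has to run on the subkernels, hence the ParallelEvaluate:

ParallelEvaluate[Off[General::subpar]];
ParallelTable[Parallelize[{f[i], g[i]}], {i, 4}]  (* same result as above, without the subpar messages *)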

Update:

It appears that Module interacts badly with parallelization. Example:

Module[{z = 123},
 Parallelize[{f[z], g[z]}]
]

This will cause z to leak from Module and its value to be retained on both the main kernel and subkernels. You can see this by evaluating ParallelEvaluate[Names["Global`z*"]] and Names["Global`z*"]. Each evaluation of the Module creates a new "localized" z that will not get cleaned up.
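
A quick way to see the leak, following the checks described above (evaluate after running the Module a few times):

ParallelEvaluate[Names["Global`z*"]] (* localized z$... symbols retained on each subkernel *)
Names["Global`z*"]                   (* and on the main kernel *)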

You can use a workaround like this:

Module[{y = 123},
 With[{y0 = y}, Parallelize[{f[y0], g[y0]}]]
]

You can even use the following (it is what I normally do):

Module[{y = 123},
 With[{y = y}, Parallelize[{f[y], g[y]}]]
]
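
Putting the pieces together for the function in the question, a minimal sketch might look like the following; the Pause calls stand in for the two real procedures, exactly as in the original post, and this assumes the two procedures can be written as independent expressions. The With[{range = range, n = n}, ...] trick injects the current values into the expression before Parallelize ships it to the subkernels, so no Module-localized symbols leak:

f[x_] := Module[{range = {-1., 1.}, n = 5000, gf1, gf2},
  {gf1, gf2} = With[{range = range, n = n},
    Parallelize[{
      (Pause[10]; RandomReal[range, n]), (* procedure 1 to get gf1 *)
      (Pause[10]; RandomReal[range, n])  (* procedure 2 to get gf2 *)
      }]];
  gf1 + gf2
  ]

With the two procedures running on separate subkernels, the evaluation should take a little over 10 seconds instead of 20, which is the best this two-unit structure allows.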

Correct answer by Szabolcs on August 10, 2021
