Why Does Parallelization Not Speed Up these Seemingly Well-Parallelizable, Simple Functions?

Mathematica · Asked on August 22, 2021


Original Example

Consider function f, a parallelized version fPar, and a coarsest-grained parallelized version fParCG below.

f[l_] := Map[Function[x, x[[#]] & /@ ConstantArray[Range[l], l]],
  Permutations[Range[l]]]

fPar[l_] := ParallelMap[Function[x, x[[#]] & /@ ConstantArray[Range[l], l]],
  Permutations[Range[l]]]

fParCG[l_] := ParallelMap[Function[x, x[[#]] & /@ ConstantArray[Range[l], l]],
  Permutations[Range[l]], Method -> "CoarsestGrained"]

All three functions produce the same output: for each permutation of Range[l], a list containing l copies of that permutation.

f[3] // Column

(*
{{1,2,3},{1,2,3},{1,2,3}}
{{1,3,2},{1,3,2},{1,3,2}}
{{2,1,3},{2,1,3},{2,1,3}}
{{2,3,1},{2,3,1},{2,3,1}}
{{3,1,2},{3,1,2},{3,1,2}}
{{3,2,1},{3,2,1},{3,2,1}}
*)

I was surprised to see the parallelized versions are both slower.

f[9] // MaxMemoryUsed // AbsoluteTiming
(* {1.38304, 496422488} *)

fPar[9] // MaxMemoryUsed // AbsoluteTiming
(* {2.81347, 504604072} *)

fParCG[9] // MaxMemoryUsed // AbsoluteTiming
(* {2.46533, 561971768} *)

What in particular makes f not well-parallelizable?

There seems to be little overhead, and the computations are independent. Function f has the form Map[A, B], where each application of A to an element of B takes the same amount of time, so the work can be split equally, easily, and independently across kernels. This is why I expected at least the coarsest-grained version to perform better.
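
For reference, the "CoarsestGrained" method divides the input into one batch per subkernel. Conceptually the split looks something like the sketch below; the Partition call only illustrates the division of work, it is not what ParallelMap literally does internally.

n = Max[$KernelCount, 1];  (* number of launched subkernels *)
batches = Partition[Permutations[Range[9]], UpTo[Ceiling[9!/n]]];
Length[batches]  (* roughly one batch per subkernel *)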


Notes

  • Yes, I have read Why won’t Parallelize speed up my code?. I am asking which principle from the answer to that question my function f violates, such that it is not apt for parallelization.
  • I am not looking for a more efficient form of f. Function f is an inane way of generating its output; I am asking what makes f, as written, poorly parallelizable.

Another Example

Courtesy of Michael E2 in the comments…

Table[p, {p, Permutations[Range[9]]}]; // AbsoluteTiming
(*{0.056542, Null}*)

ParallelTable[p, {p, Permutations[Range[9]]}]; // AbsoluteTiming
(*{4.74558, Null}*)

This disparity in speed is troubling to me. (As noted in the accepted answer, ParallelTable[] unpacks here, whereas Table[] does not. This still troubles me.)
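
A quick way to check that parenthetical claim, anticipating the answer below (a sketch; the exact message names vary by version):

On["Packing"];
Table[p, {p, Permutations[Range[9]]}];          (* no unpacking message *)
ParallelTable[p, {p, Permutations[Range[9]]}];  (* emits an unpacking message *)
Off["Packing"];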

One Answer

As I noted in a comment, it seems that ParallelMap unpacks packed arrays when sending the data to the subkernels:

data = Permutations[Range[9]];
Developer`PackedArrayQ[data]

(* True *)

This simple Map will not generate any messages about packing:

On["Packing"];
Map[Total, data];

(* No messages *)

ParallelMap[Total, data];

(* generates Developer`FromPackedArray::unpack message *)

Unpacking of arrays is most likely a significant source of the slowdown in parallel maps like this one, since sending unpacked data between kernels is much slower, according to this answer.
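
To get a feel for the size difference (a rough sketch; exact byte counts are machine- and version-dependent):

data = Permutations[Range[9]];
ByteCount[data]                             (* packed: one machine integer per element *)
ByteCount[Developer`FromPackedArray[data]]  (* unpacked: several times larger *)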

Edit

Actually, item 3.4 in this answer does mention this problem to some degree, and it also links to a solution for the reverse problem, where the values returned by parallel operations are packed arrays. At any rate, it's good advice to track the packing behavior of your computation when using parallel operations.
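
As one illustration of that advice (a sketch of my own, not from the linked answer): the transfer cost can be avoided entirely by building the packed data on each subkernel and sending only indices between kernels. Total here is just a stand-in operation.

ParallelEvaluate[data = Permutations[Range[9]]];  (* each subkernel builds its own packed copy *)
ParallelTable[Total[data[[i]]], {i, 9!}];         (* only index ranges travel between kernels *)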

Correct answer by Sjoerd Smit on August 22, 2021
