Mathematica Asked on October 22, 2021
Consider function f
, a parallelized version fPar
, and a coarsest-grained parallelized version fParCG
below.
f[l_] := Map[Function[x, x[[#]] & /@ ConstantArray[Range[l], l]],
Permutations[Range[l]]]
fPar[l_] := ParallelMap[Function[x, x[[#]] & /@ ConstantArray[Range[l], l]],
Permutations[Range[l]]]
fParCG[l_] := ParallelMap[Function[x, x[[#]] & /@ ConstantArray[Range[l], l]],
Permutations[Range[l]], Method -> "CoarsestGrained"]
The functions have the same output, which is just a list containing l
copies of every permutation on Range[l]
.
f[3] // Column
(*
{{1,2,3},{1,2,3},{1,2,3}}
{{1,3,2},{1,3,2},{1,3,2}}
{{2,1,3},{2,1,3},{2,1,3}}
{{2,3,1},{2,3,1},{2,3,1}}
{{3,1,2},{3,1,2},{3,1,2}}
{{3,2,1},{3,2,1},{3,2,1}}
*)
I was surprised to see the parallelized versions are both slower.
f[9] // MaxMemoryUsed // AbsoluteTiming
(* {1.38304, 496422488} *)
fPar[9] // MaxMemoryUsed // AbsoluteTiming
(* {2.81347, 504604072} *)
fParCG[9] // MaxMemoryUsed // AbsoluteTiming
(* {2.46533, 561971768} *)
What in particular makes f
not well-parallelizable?
There seems to be little overhead and the computations are independent. Function f
is of the form Map[A,B]
where each application of A
to an element of B
takes the same amount of time and the computations can be split equally, easily, and independently into different kernels. This is why I was expecting at least the coarsest grained version to perform better.
f
violates such that it is not apt for parallelization.f
. Function f
is an inane way of generating its output. I am wondering what makes f
, as it is, not well-parallelizable.Courtesy of Michael E2 in the comments…
Table[p, {p, Permutations[Range[9]]}]; // AbsoluteTiming
(*{0.056542, Null}*)
ParallelTable[p, {p, Permutations[Range[9]]}]; // AbsoluteTiming
(*{4.74558, Null}*)
This disparity in speed is troubling to me. (As noted in the accepted answer, ParallelTable[]
unpacks here, whereas Table[]
does not. This still troubles me.)
As I noted in a comment, it seems that ParallelMap
unpacks packed arrays when sending the data to the slave kernels:
data = Permutations[Range[9]];
Developer`PackedArrayQ[data]
True
This simple Map
will not generate any messages about packing:
On["Packing"];
Map[Total, data];
(* No messages *)
ParallelMap[Total, data]
(* generates Developer`FromPackedArray::unpack message *)
Unpacking of arrays is most likely a significant source of slowdown in parallel maps like this one since sending unpacked data is much slower according to this answer
Actually, item 3.4 in this answer does mention this problem to some degree and also links to a solution for the reverse problem when the values returned by parallel operations are packed arrays. At any rate, it's good advice to track the packing behavior of your computation when using parallel operations.
Answered by Sjoerd Smit on October 22, 2021
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP