TransWikia.com

How to compute many sums or tables efficiently

Mathematica Asked on April 27, 2021

I have some questions about programming in Mathematica and I would really appreciate it if you could help me.

I wish to plot the following function against the variable X, but to do this I first have to work out how to perform a sum over ten indices efficiently, that is, in the shortest time possible. I have also tried to do this with tables or matrices, but I have not been able to obtain a result; possibly I still need to learn more about how to apply them. Thank you very much.

FUN1[X_?NumericQ] := Parallelize[Sum[(-1)^(K2 - M1 - M2) I^(P1 + P2) 
   (2 P1 + 1) (2 P2 + 1) Sqrt[((2 L1 + 1) (2 L2 + 1))/((2 Λ1 + 1) (2 Λ2 + 1))] 
   SphericalBesselJ[P1, -M1*140*X] SphericalBesselJ[P2, -M2*140*X] 
   SphericalHarmonicY[L1, -M1, Pi/2, 0] SphericalHarmonicY[L2, -M2, Pi/2, 0] 
   ClebschGordan[{70, 70}, {L1, 0}, {70, 70}] 
   ClebschGordan[{70, 70}, {L2, 0}, {70, 70}] 
   ClebschGordan[{P1, 0}, {L1, 0}, {Λ1, 0}] 
   ClebschGordan[{P2, 0}, {L2, 0}, {Λ2, 0}] 
   ClebschGordan[{P1, 0}, {L1, M1}, {Λ1, K1}] 
   ClebschGordan[{P2, 0}, {L2, M2}, {Λ2, K2}] 
   KroneckerDelta[Λ1, Λ2] KroneckerDelta[-K1, K2], 
 {L1, 0, 140}, {M1, -L1, L1}, {P1, 0, 280},
 {Λ1, Abs[P1 - L1], P1 + L1}, {K1, -Λ1, Λ1}, 
 {L2, 0, 140}, {M2, -L2, L2}, {P2, 0, 280},
 {Λ2, Abs[P2 - L2], P2 + L2}, {K2, -Λ2, Λ2}
 ]
]

One Answer

Just to give you an idea of what you are trying to do:

The number of summands in your sum is

m = Sum[(2 Λ1 + 1) (2 L1 + 1), {L1, 0, 140}, {P1, 0, 280}, {Λ1, Abs[P1 - L1], P1 + L1}];
numberOfSummands = N[m^2]

8.70977*10^22

Your summands contain special functions such as ClebschGordan, SphericalBesselJ, and SphericalHarmonicY, each of which needs many floating point operations per evaluation. Let's be very conservative and say that each of these function calls needs 10 floating point operations; with about ten such calls per summand, that makes about 100 floating point operations per summand. So it is a very, very conservative lower bound to say that we have to perform this many floating point operations in total:

floatOpsNeeded = numberOfSummands 100

8.70977*10^24

The most powerful computer on earth is currently (at the time I write this) [Summit](https://de.wikipedia.org/wiki/Summit_(Supercomputer)) with 122.3 PFLOPS; that's 1.223*10^17 floating point operations per second. Let's assume that you can parallelize your code on Summit perfectly, so that you really get the full 122.3 PFLOPS. Then you would need this many seconds:

SummitPower = 1.223 10^17;
sec = floatOpsNeeded/SummitPower

7.12164*10^7

And in days, this is

sec/60/60/24

824.264

That's more than two years on the fastest computer on earth! So your question should not be about how to sum efficiently; it should be about how to simplify or approximate the sum in a way that cuts the number of floating point operations down by several orders of magnitude.

A first step towards that would be to use the KroneckerDeltas to eliminate the summations over Λ2 and K2. Then at least Summit would be able to compute the sum in about 22 minutes:

m1 = Sum[(2 Λ1 + 1) (2 L1 + 1), {L1, 0, 140}, {P1, 0, 280}, {Λ1, Abs[P1 - L1], P1 + L1}];
m2 = Sum[(2 L1 + 1), {L1, 0, 140}, {P1, 0, 280}];
m1 m2 100/SummitPower/60

22.4683

But this is only the evaluation time of FUN1[X] for a single X, and it is not easy to get computing time on such large clusters. So quite certainly, you will have to simplify the summation even more.
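To see why dropping the delta-constrained indices is legitimate, here is a toy illustration of the principle (with hypothetical symbolic arrays a and b, not the actual summand): a KroneckerDelta merely pins one summation index to another, so the constrained index and its iterator can be removed outright.

```mathematica
(* A KroneckerDelta collapses one summation index onto another,   *)
(* so the delta and the redundant iterator can simply be dropped. *)
n = 4;
full    = Sum[a[i] b[j] KroneckerDelta[i, j], {i, 1, n}, {j, 1, n}];
reduced = Sum[a[i] b[i], {i, 1, n}];
full === reduced
(* True *)
```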

Edit:

The summand appears to be symmetric in the index pairs {P1, P2}, {M1, M2}, and {L1, L2}. At least the symmetry in {P1, P2} and {L1, L2} is easily exploited and would reduce the number of summands to compute by roughly a factor of 4 (by summing only over P1 < P2 and L1 < L2, multiplying by 4, and adding the sums for P1 == P2 and L1 == L2). If the symmetry in {M1, M2} could also be exploited, this would be another factor of 2. And so on...
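A toy illustration of the half-range trick for a single symmetric index pair (f here is a hypothetical symmetric summand, not the one from the question):

```mathematica
(* For f symmetric in i and j, sum the strict upper triangle,      *)
(* double it, and add the diagonal: same value, but roughly half   *)
(* the summand evaluations. Two symmetric pairs give roughly 1/4.  *)
f[i_, j_] := (i + j)^2;   (* any summand with f[i, j] == f[j, i] *)
n = 100;
full = Sum[f[i, j], {i, 1, n}, {j, 1, n}];
half = 2 Sum[f[i, j], {i, 1, n}, {j, i + 1, n}] + Sum[f[i, i], {i, 1, n}];
full == half
(* True *)
```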

Edit 2:

The way physicists expand their summations is far from optimal from the point of view of numerics -- not only because of their extensive use of Kronecker deltas. (This is meant non-judgmentally: theoretical work with nested sums is often simplified significantly this way.) Roughly, your sum has the form

$$\sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^m a_{i,k} \, b_{j,k}.$$

The brute-force summation would need $m \, n^2$ multiply and add operations (modern machines use fused multiply-add operations when appropriately vectorizing the code, but that is not essential in the following).

The resulting number will be equal to

$$\sum_{k=1}^m \left(\sum_{i=1}^n a_{i,k}\right) \, \left(\sum_{j=1}^n b_{j,k}\right).$$

Performing the summations in parentheses first requires $2 \, m \, n$ additions (an $n$-term sum for each of the $m$ values of $k$, for both factors). Afterwards, the outer sum requires $m$ multiplications and $m$ additions. Hence this approach cuts the effort from $m \, n^2$ multiplications and $m \, n^2$ additions down to only $m$ multiplications and $2 \, m \, n + m$ additions.

With your actual n and m, we have

n = Sum[(2 L1 + 1), {L1, 0, 140}, {P1, 0, 280}]
m = (140 + 280)^2
Dataset@<|
  "First" -> <|"Multiplications" -> m n^2, "Additions" -> m n^2|>,
  "Second" -> <|"Multiplications" -> m, "Additions" -> 2 m n + m|>
  |>
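A toy check of this reordering with random matrices (the names a, b, brute, fast are illustrative only): the factored form reproduces the brute-force triple sum up to round-off while touching each matrix entry only once.

```mathematica
(* Brute force: m n^2 multiply-adds. Factored form: column sums   *)
(* first, then m products -- identical value up to round-off.     *)
n = 20; m = 5;
a = RandomReal[{-1, 1}, {n, m}];
b = RandomReal[{-1, 1}, {n, m}];
brute = Sum[a[[i, k]] b[[j, k]], {i, 1, n}, {j, 1, n}, {k, 1, m}];
fast  = Total[Total[a] Total[b]];  (* Total[a] gives the m column sums *)
Abs[brute - fast] < 10^-10
(* True *)
```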

Edit 3:

The Clebsch-Gordan coefficient

ClebschGordan[{P1, 0}, {L1, M1}, {Λ1, K1}]

vanishes if M1 != K1. That means that the product

Times[
 ClebschGordan[{P1, 0}, {L1, M1}, {Λ1, K1}], 
 ClebschGordan[{P2, 0}, {L2, M2}, {Λ2, K2}],
 KroneckerDelta[Λ1, Λ2], 
 KroneckerDelta[-K1, K2]
]

is nonvanishing only if M1 == K1 == -K2 == -M2 and Λ1 == Λ2. That rids you of four(!) summation variables.
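Putting these constraints in by hand (Λ2 -> Λ1, K1 -> M1, M2 -> -M1, K2 -> -M1), a hedged sketch of the reduced sum could look as follows. It is untested at the full cutoffs and is only meant to show the remaining six-fold iterator structure; the Λ1 range is intersected with both triangle conditions so that no vanishing coefficients are visited.

```mathematica
(* Sketch: FUN1 with the four constrained indices eliminated.      *)
(* With K2 -> -M1 and M2 -> -M1, the prefactor (-1)^(K2 - M1 - M2) *)
(* simplifies to (-1)^M1, and the Sqrt denominator becomes          *)
(* (2 Λ1 + 1) because Λ2 == Λ1.                                     *)
FUN1reduced[X_?NumericQ] := Sum[
  (-1)^M1 I^(P1 + P2) (2 P1 + 1) (2 P2 + 1) *
   Sqrt[(2 L1 + 1) (2 L2 + 1)]/(2 Λ1 + 1) *
   SphericalBesselJ[P1, -M1*140*X] SphericalBesselJ[P2, M1*140*X] *
   SphericalHarmonicY[L1, -M1, Pi/2, 0] SphericalHarmonicY[L2, M1, Pi/2, 0] *
   ClebschGordan[{70, 70}, {L1, 0}, {70, 70}] *
   ClebschGordan[{70, 70}, {L2, 0}, {70, 70}] *
   ClebschGordan[{P1, 0}, {L1, 0}, {Λ1, 0}] *
   ClebschGordan[{P2, 0}, {L2, 0}, {Λ1, 0}] *
   ClebschGordan[{P1, 0}, {L1, M1}, {Λ1, M1}] *
   ClebschGordan[{P2, 0}, {L2, -M1}, {Λ1, -M1}],
  {L1, 0, 140}, {L2, 0, 140}, {M1, -Min[L1, L2], Min[L1, L2]},
  {P1, 0, 280}, {P2, 0, 280},
  {Λ1, Max[Abs[P1 - L1], Abs[P2 - L2], Abs[M1]], Min[P1 + L1, P2 + L2]}]
```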

All in all, this should give you plenty of ideas for how to compute the sum significantly faster.

Answered by Henrik Schumacher on April 27, 2021
