Mathematica — Asked on September 5, 2021
My code does massive summation and matrix multiplication. Compile[] has boosted it distinctly, but I have read some literature related to my program, and it seems there are approaches to make it even faster. Perhaps it can be improved by optimizing the Mathematica code or by changing the algorithm itself.
My code is below.
MomentComputing =
  Compile[{{Mmax, _Integer}, {k, _Integer}, {image, _Real, 2},
    {xLegendreP, _Real, 2}, {yLegendreP, _Real, 2}},
   Block[{m, n, width, momentMatrix},
    width = Length[image];
    momentMatrix = Table[0., {Mmax + 1}, {Mmax + 1}];
    (* fill the lower triangle: one weighted vector-matrix-vector product per (m, n) pair *)
    Do[
     momentMatrix[[m + 1, n + 1]] =
      ((2. (m - n) + 1.) (2. n + 1.)/(k*k*width*width))*
       xLegendreP[[m - n + 1]].image.yLegendreP[[n + 1]],
     {m, 0, Mmax}, {n, 0, m}];
    momentMatrix],
   CompilationTarget -> "C", RuntimeAttributes -> {Listable},
   Parallelization -> True, RuntimeOptions -> "Speed"]
It would presumably be faster if I avoided explicit loops, but I cannot figure out another approach. The repeated matrix-vector multiplications are probably time-consuming as well.
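As a point of reference before the answer: each iteration of the Do loop above is one vector-matrix-vector product, and all of those pairwise products can be obtained at once from a single matrix product. A toy sketch with small made-up arrays (xP, yP, and img are hypothetical stand-ins for xLegendreP, yLegendreP, and image):

(* small hypothetical stand-ins *)
xP = RandomReal[1, {3, 5}];
yP = RandomReal[1, {4, 5}];
img = RandomReal[1, {5, 5}];

(* one dot product per (i, j) pair, as in the Do loop *)
loop = Table[xP[[i]].img.yP[[j]], {i, 3}, {j, 4}];

(* the same numbers from a single matrix product *)
all = xP.img.Transpose[yP];

Max[Abs[loop - all]]  (* agreement up to rounding *)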
First we run the original with an example (which should have been in the post).
Mmax = 400;
W = 1024;
deltaX = .1;
SeedRandom[1111];
lambdaMatrix = RandomReal[1, {W, W}];
lPoly = Developer`ToPackedArray[RandomReal[{-10, 10}, {Mmax + 1, W}]];
AbsoluteTiming[
res = MomentComputing[Mmax, 5, lambdaMatrix, lPoly, lPoly];]
(* Out[52]= {12.1522, Null} *)
What I had in mind is this (I could combine some steps and maybe improve speed a bit further).
MomentComputing2 =
  Compile[{{Mmax, _Integer}, {k, _Integer}, {image, _Real, 2},
    {xLegendreP, _Real, 2}, {yLegendreP, _Real, 2}},
   Block[{m, n, width, mult, mat1, momentMatrix},
    width = Length[image];
    mult = 1./(k^2*width^2);
    (* table of the scalar prefactors (2(m - n) + 1)(2n + 1)/(k^2 width^2) *)
    mat1 =
     Table[(2. (m - n) + 1.) (2. n + 1.), {m, 0, Mmax}, {n, 0, Mmax}]*mult;
    (* all vector-matrix-vector products at once, as one matrix product *)
    momentMatrix = Transpose[xLegendreP.image.Transpose[yLegendreP]];
    (* shift column n + 1 down by n rows so entry (m + 1, n + 1) holds the (m - n, n) product *)
    momentMatrix =
     Transpose[
      MapIndexed[(PadLeft[Drop[#1, -(#2[[1]] - 1)], Mmax + 1, 0.]) &,
       momentMatrix]];
    mat1*momentMatrix],
   CompilationTarget -> "C", RuntimeAttributes -> {Listable},
   Parallelization -> True, RuntimeOptions -> "Speed"];
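A word on the MapIndexed/PadLeft step, since it is the least obvious part: it shifts column n + 1 of the full product matrix down by n rows, so that entry (m + 1, n + 1) ends up holding xLegendreP[[m - n + 1]].image.yLegendreP[[n + 1]], exactly as in the original loop. A small sketch with a hypothetical 3 x 3 matrix (the values simply encode row and column) makes the shift visible:

(* a[[i, j]] stands in for xLegendreP[[i]].image.yLegendreP[[j]] *)
a = {{11., 12., 13.}, {21., 22., 23.}, {31., 32., 33.}};
Transpose[
 MapIndexed[PadLeft[Drop[#1, -(#2[[1]] - 1)], 3, 0.] &, Transpose[a]]]
(* {{11., 0., 0.}, {21., 12., 0.}, {31., 22., 13.}} *)

Row m + 1, column n + 1 of the result is a[[m - n + 1, n + 1]]; the zeros above the diagonal mark the (m, n) pairs the original loop never computes.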
Repeat the example.
AbsoluteTiming[
res2 = MomentComputing2[Mmax, 5, lambdaMatrix, lPoly, lPoly];]
(* Out[54]= {0.023139, Null} *)
Check that the results are within numeric fuzz:
In[57]:= Max[Abs[res - res2]]
(* Out[57]= 3.69482*10^-13 *)
Upshot: machine-precision matrix products beat iterated vector products by a wide margin, to the point where it did not matter that I discard nearly half the computed elements.
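To see that effect in isolation (outside of Compile), one can compare the two formulations directly on a slice of the same stand-in data; this is only a rough sketch, the cutoff of 50 rows is arbitrary and absolute timings will vary by machine:

(* iterated vector-matrix-vector products, one per (i, j) pair *)
AbsoluteTiming[
 Table[lPoly[[i]].lambdaMatrix.lPoly[[j]], {i, 50}, {j, 50}];]

(* the same 50 x 50 block from a single matrix product *)
AbsoluteTiming[lPoly[[;; 50]].lambdaMatrix.Transpose[lPoly[[;; 50]]];]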
Correct answer by Daniel Lichtblau on September 5, 2021