How to select the fastest approach for large numerical data computations?

Question

I really love the flexibility of Mathematica: there are several ways to perform one task. However, to get the performance of the intense numeric calculation, it can cause some confuses. I wonder is it the real strength or the weakness of the language.
Example: Take a list of the first element in a matrix.
test1 = Transpose[{Range[10^8], Range[10^8]}];

The input list is Packed Array.
Developer`PackedArrayQ[test1]
True

For this simple task, there are many ways to do that. Now guess the performance of these commands:
(* test1 /. {a_, _} -> a; // Timing *) (* WARNING: May lock up your Mathematica! *)
First /@ test1; // Timing
test1[[All, 1]]; // Timing
Transpose[test1][[1]]; // Timing
First[Transpose[test1]]; // Timing
Take[Transpose[test1], 1]; // Timing

I think that, "Oh, the third one which uses only one function Part. This one should be the fastest". The rule of thumb, is:

Use lesser function will improve the speed
Treat the data as the whole
Use built-in function
Use packed array, etc
Avoid using Patterns for numerical calculation

So test1[[All, 1]] should be the fastest. But no, I'm wrong.

Timing results:
The slowest solution is:
test1 /. {a_, _} -> a; // Timing

Don't run this, because Mathematica will be stuck. (I need to Abort the Evaluation). It's obvious because Pattern searching and replacement are expensive. Luckily I didn't often use this type of programming.
The next slow solution is:
First /@ test1; // Timing

{2.90625, Null}

Surprisingly, Part is the next slow solution. I wonder why? This is the only case that uses one function Part.
test1[[All, 1]]; // Timing
{1.21875, Null}

And the combinations of 2 functions approaches are faster. Transpose and then Part, First and Transpose, Take and Transpose.
Transpose[test1][[1]]; // Timing
First[Transpose[test1]]; // Timing
Take[Transpose[test1], 1]; // Timing

{0.765625, Null}

{0.734375, Null}

{0.609375, Null}

The main question here is, there are too many approaches to perform the same operation. And normally, I didn't know which approach is the most optimal way in terms of efficiency.

Szabolcs · Accepted Answer

The main question here is, there are too many approaches to perform the same operation. And normally, I didn't know which approach is the most optimal way in terms of efficiency.

Mathematica's performance is hard to predict, even more so than that of other high-level languages. There is no simple guideline you can follow. There will always be surprises and the behaviour will change from one version to the next.

Some insight into why Transpose is faster here:
On my machine (macOS  / M12.1) Timing reports the lowest numbers for Part, not for Transpose. However, RepeatedTiming (which is based on AbsoluteTiming) reports a lower number for Transpose.
In[16]:= test1[[All, 1]]; // Timing
Out[16]= {1.32521, Null}

In[17]:= test1[[All, 1]]; // RepeatedTiming
Out[17]= {1.41, Null}

In[18]:= First[Transpose[test1]]; // Timing
Out[18]= {2.08334, Null}

In[19]:= First[Transpose[test1]]; // RepeatedTiming
Out[19]= {0.80, Null}

Typically, this is an indication that some operations are done in parallel. Timing measures the total time spent by each CPU core, while AbsoluteTiming measures wall time.
A quick look at the CPU monitor confirms that indeed, Part is single threaded (I see 100%) while Transpose is multi-threaded (I see ~250%).
This explains the difference.

Nam Nguyen · Answer

This is another observation, that sometimes in Mathematica, combining 2 functions is faster than using 1 function.
Jon McLoone " 10 Tips for Writing Fast Mathematica Code " has proposed that "Using fewer function will speed up". But not all the case, I think.
Do a simple test: Using a function inside a Table to generate list.
In[11]:= a1 = Table[Power[i, 2], {i, 10^7}]; // AbsoluteTiming

Out[11]= {0.238681, Null}

Using Range first, and then put it in a functions .
In[12]:= a2 = Power[Range[10^7], 2]; // AbsoluteTiming

Out[12]= {0.0703124, Null}

Both are PackedArray.
In[16]:= Developer`PackedArrayQ /@ {a1, a2}

Out[16]= {True, True}

Maybe, Part, and Table are the big function? So they need to check something before doing the computational code? And Range, and Transpose is faster, because they are just doing one simple thing with less overhead?
Conclusions

Don't use Table[f,{i,iMax}]
But use f[Range[iMax]]

here is the performance proof:
testTable[n_] := AbsoluteTiming[Table[Power[i, 2], {i, 10^n}];]
testRange[n_] := AbsoluteTiming[Power[Range[10^n]];]

nList = {4, 5, 6, 7, 8};

t1 = First@testTable[#] & /@ nList;
t2 = First@testRange[#] & /@ nList;

ListLinePlot[{Transpose[{nList, t1}], Transpose[{nList, t2}]}, 
 PlotLegends -> {"Table", "Range"}, Mesh -> All]

How to select the fastest approach for large numerical data computations?

2 Answers

Add your own answers!

Ask a Question