
Bizarre behaviour - Span will speed up 1000-fold when passed input containing bad indices

Mathematica Asked by RobertNathaniel on December 28, 2020

(Apologies for the length; you can jump to the statement of the essential problem below.)

UPDATE:
The data set is too large to paste. However, I have shown below how to generate pseudodata which reproduces the same delays in calculation.

pseudoTstampsRaw = 
  DateRange[{2011, 9, 25, 23, 50, 0.}, {2011, 9, 25 + 697, 0, 0, 0.}, 
   "Minute"];
pseudoTstamps = Drop[pseudoTstampsRaw, {1442, 1441 + 1440}];
pseudoTstamps = pseudoTstamps[[;; 1000000]];
pseudoData = RandomInteger[{0, 9}, {1000000, 5}];
ASXSPI1000000b = {pseudoTstamps, pseudoData} // Transpose;

The necessary context for my question:

I have 1 million minutes of Australian financial data in Zulu time.
Each one-minute record in the data has the format:
{{Y,M,D,h,m,s},{O,H,L,C,V}}
The first element is the usual timestamp in DateList format and the second element is a 5-vector of the actual data.

Objective: I wish to extract only that portion of the data which was generated during regular trading hours (RTH) in Sydney (09:50-16:30).

Solution:
Extract the timestamp.

tStamps6ASX = ASXSPI1000000b[[All, 1]]

Use this to generate a list of trading days over the period covered by the data.

tradingDaysASX = 
 DateList[#] & /@ 
  DayRange[tStamps6ASX[[1]], tStamps6ASX[[-1]], "BusinessDay",
    HolidayCalendar -> {"Australia", "ASX"}]

In[275]:= Length@tradingDaysASX

Out[275]= 1073

Using these dates, we can now construct lists of RTH opening and closing times for the market in question.

RTHstartSydney = 
 DateList@# & /@ (DateObject[
      Join[#[[;;3]], {9,50,0.} + {0, 1, 0.}], 
      TimeZone -> "Australia/Sydney"] & /@ tradingDaysASX)

RTHendSydney = 
 DateList@# & /@ (DateObject[Join[#[[;;3]], {16,30,0.}], 
      TimeZone -> "Australia/Sydney"] & /@ tradingDaysASX)

SydneyRTH = {RTHstartSydney, RTHendSydney} // Transpose

This produces a list of RTH opening and closing times over the period of the data, converted to Zulu time, e.g.:

In[282]:= SydneyRTH[[{1, -1}]]

Out[282]= {{{2011, 9, 25, 23, 51, 0.}, {2011, 9, 26, 6, 30, 
   0.}}, {{2015, 12, 17, 22, 51, 0.}, {2015, 12, 18, 5, 30, 0.}}}

Now let’s generate indices for each timestamp in our data set.

rulesTimestamp2Pos6ASX = First /@ PositionIndex@tStamps6ASX

Let’s look at the first few entries:

In[284]:= rulesTimestamp2Pos6ASX[[;; 5]]

Out[284]= <|{2011, 9, 25, 23, 50, 0.} -> 
  1, {2011, 9, 25, 23, 51, 0.} -> 2, {2011, 9, 25, 23, 52, 0.} -> 
  3, {2011, 9, 25, 23, 53, 0.} -> 4, {2011, 9, 25, 23, 54, 0.} -> 5|>

Apply the rules to our set of Zulu RTH timestamps to convert them into Zulu RTH indices.

iSydneyRTH = SydneyRTH /. rulesTimestamp2Pos6ASX

The final step: map the data to each pair of RTH indices to generate a list of subsets of the data. Each element of the list will be the RTH data on that day.

In[345]:= Off@Part::span

In[350]:= AbsoluteTiming[
 lstRTH6ASXdirty = ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTH;]

Out[350]= {0.0132997, Null}

Notice that I switched off an error message. This is because the list of trading days that I supplied contains some dates for which no match could be found in the data. As a result, there was no matching rule to convert that date to an index, and the ‘index’, instead of being an integer as expected, is a raw timestamp. (In this particular case, the first such bad index pair is
{{2011, 10, 20, 22, 51, 0.},{2011, 10, 21, 5, 30, 0.}}.)

This can be quickly cleaned.

iSydneyRTHCleaned = Cases[iSydneyRTH, {_Integer, _Integer}]

The Problem:

In[331]:= AbsoluteTiming[
 lstRTH6ASX = 
   ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTHCleaned;]

Out[331]= {10.0362, Null}

In[288]:= Length@iSydneyRTHCleaned

Out[288]= 1061

The cleaned calculation, which contains no errors in the supplied indices, is 3 orders of magnitude slower!

However, I can confirm that both calculations still produce the same results once I’ve removed the bad results from the dirty calculation.

lstRTH6ASXdirtyCleaned = Select[lstRTH6ASXdirty, Length@# != 2 &]

In[344]:= lstRTH6ASX == lstRTH6ASXdirtyCleaned

Out[344]= True

Now watch what happens if I restrict the calculation to just the first 100 pairs of indices, and then to the first 99:

In[353]:= AbsoluteTiming[
 ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ 
    iSydneyRTHCleaned[[;; 100]];]

Out[353]= {10.4128, Null}

In[354]:= AbsoluteTiming[
 ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ 
    iSydneyRTHCleaned[[;; 99]];]

Out[354]= {0.00150393, Null}

UPDATE: On@"Packing" generates no messages for the above two expressions.

For some reason, 100 is a magic number of index pairs. If I shift the start and end of the range of pairs, the code remains slow if it covers 100 or more pairs but is fast otherwise.

Thanks for reading this far and I would be grateful for your input.

One Answer

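The issue is that once the list reaches 100 elements, Map automatically tries to compile the function (the threshold is the "MapCompileLength" setting in "CompileOptions"), and your function is quite large.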

AbsoluteTiming[ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTHCleaned[[;; 100]];]

{9.37285, Null}

SetSystemOptions["CompileOptions"->"MapCompileLength"->101];
AbsoluteTiming[t1 = ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTHCleaned[[;; 100]];]

{0.00075, Null}
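
Since SetSystemOptions changes the threshold for the rest of the kernel session, it may be worth saving the previous setting and restoring it afterwards. The following is only a sketch on toy data: the names data, pairs, oldCompileOpts and result, and the value 10^6, are illustrative assumptions rather than anything from the question.

```mathematica
(* Toy stand-ins for the real data and index pairs (hypothetical names) *)
data = Range[1000];
pairs = Table[{i, i + 2}, {i, 1, 300}];

(* Save the current setting so it can be restored later *)
oldCompileOpts = SystemOptions["CompileOptions" -> "MapCompileLength"];

(* Raise the threshold well above the list length so Map does not try to auto-compile *)
SetSystemOptions["CompileOptions" -> "MapCompileLength" -> 10^6];
result = data[[#[[1]] ;; #[[2]]]] & /@ pairs;

(* Put the saved setting back *)
SetSystemOptions[oldCompileOpts];
```

Restoring the saved value keeps the change local to this one calculation, so other code that benefits from auto-compilation in Map is unaffected.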

An alternative is to use Table:

AbsoluteTiming[t2 = Table[ASXSPI1000000b[[i]], {i, Span @@@ iSydneyRTHCleaned[[;;100]]}];]

{0.002355, Null}

t1 === t2

True
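
To see what the Table version is doing, here is a small illustration on toy data (the names toyPairs and toyData are made up for this sketch). Span @@@ replaces the List head of each index pair with Span, and Table then iterates directly over the resulting spans.

```mathematica
toyPairs = {{1, 3}, {4, 5}};
toyData = Range[10];

(* Each {a, b} pair becomes the span a ;; b *)
Span @@@ toyPairs
(* {1 ;; 3, 4 ;; 5} *)

(* The iterator i takes each span in turn, so toyData[[i]] extracts that slice *)
Table[toyData[[i]], {i, Span @@@ toyPairs}]
(* {{1, 2, 3}, {4, 5}} *)
```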

Correct answer by Carl Woll on December 28, 2020
