Mathematica Asked by RobertNathaniel on December 28, 2020
(apologies for length but you can jump to a statement of the essential problem below)
UPDATE:
The data set is too large to paste. However, I have shown below how to generate pseudodata which reproduces the same delays in calculation.
pseudoTstampsRaw =
DateRange[{2011, 9, 25, 23, 50, 0.}, {2011, 9, 25 + 697, 0, 0, 0.},
"Minute"];
pseudoTstamps = Drop[pseudoTstampsRaw, {1442, 1441 + 1440}];
pseudoTstamps = pseudoTstamps[[;; 1000000]];
pseudoData = RandomInteger[{0, 9}, {1000000, 5}];
ASXSPI1000000b = {pseudoTstamps, pseudoData} // Transpose;
The necessary context for my question:
I have 1 million minutes of Australian financial data in Zulu time.
Each 1-minute record in the data has the format:
{{Y,M,D,h,m,s},{O,H,L,C,V}}
The first element is the usual timestamp in DateList format and the second element is a 5-vector of the actual data (open, high, low, close, volume).
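As a minimal illustration of this layout, a single record can be destructured as follows. The numeric values and the variable names are made up for the example and are not taken from the real data set.

```mathematica
(* One hypothetical record: {timestamp, {O, H, L, C, V}} *)
record = {{2011, 9, 25, 23, 50, 0.}, {4, 7, 1, 3, 9}};

(* Split the record into its timestamp and its data vector *)
{timestamp, ohlcv} = record;

(* Name the five components of the data vector *)
{open, high, low, close, volume} = ohlcv;
```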
Objective: I wish to extract only that portion of the data which was generated during regular trading hours (RTH) in Sydney (09:50-16:30).
Solution:
Extract the timestamp.
tStamps6ASX = ASXSPI1000000b[[All, 1]]
Use this to generate a list of trading days over the period covered by the data.
tradingDaysASX =
DateList[#] & /@
DayRange[tStamps6ASX[[1]], tStamps6ASX[[-1]], "BusinessDay",
HolidayCalendar -> {"Australia", "ASX"}]
In[275]:= Length@tradingDaysASX
Out[275]= 1073
Using these dates, we can now construct lists of RTH opening and closing times for the market in question.
RTHstartSydney =
DateList@# & /@ (DateObject[
Join[#[[;;3]], {9,50,0.} + {0, 1, 0.}],
TimeZone -> "Australia/Sydney"] & /@ tradingDaysASX)
RTHendSydney =
DateList@# & /@ (DateObject[Join[#[[;;3]], {16,30,0.}],
TimeZone -> "Australia/Sydney"] & /@ tradingDaysASX)
SydneyRTH = {RTHstartSydney, RTHendSydney} // Transpose
This produces a nice list of opening and closing times for RTH over the period of the data, converted to Zulu time, for example:
In[282]:= SydneyRTH[[{1, -1}]]
Out[282]= {{{2011, 9, 25, 23, 51, 0.}, {2011, 9, 26, 6, 30,
0.}}, {{2015, 12, 17, 22, 51, 0.}, {2015, 12, 18, 5, 30, 0.}}}
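As far as I can tell, the conversion above relies on the session's $TimeZone being set to UTC, because DateList reports the date in the session time zone by default. A minimal sketch of making the target zone explicit for a single opening time, assuming the TimeZone option of DateList behaves as documented (the variable names are illustrative only):

```mathematica
(* One Sydney opening time, built in the Sydney time zone *)
sydneyOpen = DateObject[{2011, 9, 26, 9, 51, 0.}, TimeZone -> "Australia/Sydney"];

(* Ask for the date list in UTC (offset 0) rather than in $TimeZone *)
utcOpen = DateList[sydneyOpen, TimeZone -> 0]
```

If this works as assumed, the result should agree with the first opening time shown in Out[282].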
Now let’s generate indices for each timestamp in our data set.
rulesTimestamp2Pos6ASX = First /@ PositionIndex@tStamps6ASX
Let’s look at the first few examples:
In[284]:= rulesTimestamp2Pos6ASX[[;; 5]]
Out[284]= <|{2011, 9, 25, 23, 50, 0.} ->
1, {2011, 9, 25, 23, 51, 0.} -> 2, {2011, 9, 25, 23, 52, 0.} ->
3, {2011, 9, 25, 23, 53, 0.} -> 4, {2011, 9, 25, 23, 54, 0.} -> 5|>
Apply the rules to our set of Zulu RTH timestamps to convert them into Zulu RTH indices.
iSydneyRTH = SydneyRTH /. rulesTimestamp2Pos6ASX
The final step: map the data to each pair of RTH indices to generate a list of subsets of the data. Each element of the list will be the RTH data on that day.
In[345]:= Off@Part::span
In[350]:= AbsoluteTiming[
lstRTH6ASXdirty = ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTH;]
Out[350]= {0.0132997, Null}
Notice that I switched off an error message. This is because the list of trading days that I supplied contains some dates for which no match could be found in the data. As a result, there was no matching rule to convert that date to an index, and the ‘index’, instead of being an integer as expected, is a raw timestamp. (In this particular case, the first such bad index pair is {{2011, 10, 20, 22, 51, 0.},{2011, 10, 21, 5, 30, 0.}}.)
This can be quickly cleaned.
iSydneyRTHCleaned = Cases[iSydneyRTH, {_Integer, _Integer}]
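The replace-then-clean step can be seen on a toy example. The timestamps and names below are invented for illustration; the second pair deliberately has no match in the toy data, so it survives the replacement as raw timestamps and is then dropped by Cases.

```mathematica
(* Three toy timestamps and their position lookup *)
toyStamps = {{2020, 1, 1, 0, 0, 0.}, {2020, 1, 1, 0, 1, 0.}, {2020, 1, 1, 0, 2, 0.}};
toyRules = First /@ PositionIndex[toyStamps];

(* First pair is present in the data, second pair is not *)
toyPairs = {
   {{2020, 1, 1, 0, 0, 0.}, {2020, 1, 1, 0, 2, 0.}},
   {{2020, 1, 2, 0, 0, 0.}, {2020, 1, 2, 0, 5, 0.}}};

(* Matched timestamps become integers, unmatched ones stay as lists *)
toyIdx = toyPairs /. toyRules;

(* Keep only the pairs where both ends were converted *)
toyCleaned = Cases[toyIdx, {_Integer, _Integer}]
(* {{1, 3}} *)
```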
The Problem:
In[331]:= AbsoluteTiming[
lstRTH6ASX =
ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTHCleaned;]
Out[331]= {10.0362, Null}
In[288]:= Length@iSydneyRTHCleaned
Out[288]= 1061
The cleaned calculation, which contains no errors in the supplied indices, is about 3 orders of magnitude slower!
However, I can confirm that both calculations still produce the same results once I’ve removed the bad results from the dirty calculation.
lstRTH6ASXdirtyCleaned = Select[lstRTH6ASXdirty, Length@# != 2 &]
In[344]:= lstRTH6ASX == lstRTH6ASXdirtyCleaned
Out[344]= True
Now watch what happens if I restrict the calculation to the first 100 pairs of indices, and then to the first 99:
In[353]:= AbsoluteTiming[
ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@
iSydneyRTHCleaned[[;; 100]];]
Out[353]= {10.4128, Null}
In[354]:= AbsoluteTiming[
ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@
iSydneyRTHCleaned[[;; 99]];]
Out[354]= {0.00150393, Null}
UPDATE: On@"Packing" generates no messages for the above two expressions.
For some reason, 100 is a magic Span number. If I shift the start and end indices, the code remains slow if the span is 100 or greater but fast otherwise.
Thanks for reading this far and I would be grateful for your input.
The issue is that once the list reaches length 100 (the default value of the "MapCompileLength" system option), Map automatically compiles the function, and your function is quite large.
AbsoluteTiming[ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTHCleaned[[;; 100]];]
SetSystemOptions["CompileOptions"->"MapCompileLength"->101];
AbsoluteTiming[t1 = ASXSPI1000000b[[#[[1]] ;; #[[2]]]] & /@ iSydneyRTHCleaned[[;; 100]];]
{9.37285, Null}
{0.00075, Null}
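Since SetSystemOptions changes the setting for the whole session, it may be worth restoring the previous value afterwards. A minimal sketch on toy data, assuming SetSystemOptions accepts the rule list returned by SystemOptions (the names toyData, toySpans and oldSetting are illustrative):

```mathematica
(* Toy data, not the data set from the question *)
toyData = Range[1000];
toySpans = Table[{i, i + 5}, {i, 100}];

(* Remember the current setting so it can be restored afterwards *)
oldSetting = SystemOptions["CompileOptions" -> "MapCompileLength"];

(* Raise the threshold just above the list length, as in the answer *)
SetSystemOptions["CompileOptions" -> "MapCompileLength" -> Length[toySpans] + 1];
toyResult = toyData[[#[[1]] ;; #[[2]]]] & /@ toySpans;

(* Put the previous setting back *)
SetSystemOptions[oldSetting];
```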
An alternative is to use Table:
AbsoluteTiming[t2 = Table[ASXSPI1000000b[[i]], {i, Span @@@ iSydneyRTHCleaned[[;;100]]}];]
t1 === t2
{0.002355, Null}
True
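To see what the Table form is doing, here is the same construction on a small made-up list. Span @@@ turns each {start, end} pair into a start ;; end span, and Table then iterates over those spans directly.

```mathematica
(* Toy data and two index pairs, invented for illustration *)
toyData = Range[10];
toyPairs = {{1, 3}, {5, 6}};

(* Each pair becomes a Span object: {1 ;; 3, 5 ;; 6} *)
toySpanList = Span @@@ toyPairs;

(* Table takes one span at a time as the iterator value *)
toyParts = Table[toyData[[i]], {i, toySpanList}]
(* {{1, 2, 3}, {5, 6}} *)
```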
Correct answer by Carl Woll on December 28, 2020