Data Science Asked by kjl on December 19, 2020
I was using the r.quantile method in SageMath to find the boundaries for a box plot, and the plot was taking a long time to draw.
r.quantile took more than 20 seconds to find the quartiles of a data set that could be sorted and plotted point by point in less than half a second on the same machine.
What is a faster alternative?
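The following (crude) code is at least 50 times faster than r.quantile. It expects a list that is already sorted and takes each quartile as the median of one half of the data (the overall median counts in both halves when the length is odd), so its values can differ slightly from r.quantile's default interpolation: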
def findFences(orderedList, outlierC=1.5, farOutlierC=3.0):
    """findFences: an ordered list of ints or floats -> tuple of 9 floats
    (aMin, outerLoFence, innerLoFence, Q1, Q2, Q3, innerHiFence, outerHiFence, aMax)
    keys: float for outlier constant [outlierC] and far outlier constant [farOutlierC]"""
    n = len(orderedList)
    lenMod4, half, quarter = n % 4, n // 2, n // 4
    aMin, aMax = orderedList[0], orderedList[-1]
    # find quartiles; each branch handles one value of len(orderedList) % 4
    if not lenMod4:
        Q1 = (orderedList[half-quarter] + orderedList[half-quarter-1])/2.0
        Q2 = (orderedList[half] + orderedList[half-1])/2.0
        Q3 = (orderedList[half+quarter] + orderedList[half+quarter-1])/2.0
    elif lenMod4 == 1:
        Q1 = float(orderedList[half-quarter])
        Q2 = float(orderedList[half])
        Q3 = float(orderedList[half+quarter])
    elif lenMod4 == 2:
        Q1 = float(orderedList[half-quarter-1])
        Q2 = (orderedList[half] + orderedList[half-1])/2.0
        Q3 = float(orderedList[half+quarter])
    else:
        Q1 = (orderedList[half-quarter] + orderedList[half-quarter-1])/2.0
        Q2 = float(orderedList[half])
        Q3 = (orderedList[half+quarter] + orderedList[half+quarter+1])/2.0
    IQR = Q3 - Q1
    outDist = IQR * outlierC
    farOutDist = IQR * farOutlierC
    innerLoFence, innerHiFence = Q1 - outDist, Q3 + outDist
    outerLoFence, outerHiFence = Q1 - farOutDist, Q3 + farOutDist
    return aMin, outerLoFence, innerLoFence, Q1, Q2, Q3, innerHiFence, outerHiFence, aMax
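For anyone who wants to cross-check the index arithmetic, here is a minimal sketch of what I believe is the same quartile rule, written with the standard library's statistics.median. The names tukey_hinges and classify are hypothetical helpers introduced only for this illustration, and the sketch assumes plain Python 3 with an already sorted, non-empty input. It re-sorts and copies slices internally, so it is likely slower than the function above and is meant as a readable check, not a replacement.

```python
from statistics import median


def tukey_hinges(ordered):
    """Return (Q1, Q2, Q3) for an already sorted, non-empty sequence.

    Hypothetical helper: each quartile is the median of one half of the
    data, and the overall median belongs to both halves when the length
    is odd.
    """
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one data point")
    half = (n + 1) // 2  # size of each half (rounds up for odd n)
    q1 = median(ordered[:half])
    q2 = median(ordered)
    q3 = median(ordered[n - half:])
    return float(q1), float(q2), float(q3)


def classify(ordered, outlier_c=1.5, far_outlier_c=3.0):
    """Split sorted data into (regular, outliers, far_outliers) using the fences."""
    q1, _, q3 = tukey_hinges(ordered)
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - outlier_c * iqr, q3 + outlier_c * iqr
    outer_lo, outer_hi = q1 - far_outlier_c * iqr, q3 + far_outlier_c * iqr
    regular, outliers, far_outliers = [], [], []
    for x in ordered:
        if x < outer_lo or x > outer_hi:
            far_outliers.append(x)   # beyond the outer fences
        elif x < inner_lo or x > inner_hi:
            outliers.append(x)       # between inner and outer fences
        else:
            regular.append(x)
    return regular, outliers, far_outliers


data = sorted([7, 1, 3, 4, 2, 5, 6, 8, 20, 40])
quartiles = tukey_hinges(data)
regular, outliers, far_outliers = classify(data)
print(quartiles)               # (3.0, 5.5, 8.0)
print(outliers, far_outliers)  # [20] [40]
```

With the default constants the inner fences for this sample are -4.5 and 15.5 and the outer fences are -12.0 and 23.0, which is why 20 is reported as an outlier and 40 as a far outlier.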
Answered by kjl on December 19, 2020