TransWikia.com

How to interpret percentile information from the describe function in Pandas?

Data Science Asked on January 4, 2021

I am a bit stumped on how to interpret the percentile information you see when you call the describe function on dataframes in Pandas.

I believe I have a basic understanding of what a percentile means. For example, if someone scores 40% on a test and that score ranks at the 75th percentile, this means that the score is higher than 75% of all the scores.

But I don’t know how to translate this knowledge to interpret what I see from the describe function.

To illustrate, given the following:

```python
import pandas as pd

test = pd.DataFrame([1, 2, 3, 4, 5, 1, 1, 1, 1, 9])
test.describe()
```

This prints out something similar to this:

|       | 0         |
|-------|-----------|
| count | 10.000000 |
| mean  | 2.800000  |
| std   | 2.616189  |
| min   | 1.000000  |
| 25%   | 1.000000  |
| 50%   | 1.500000  |
| 75%   | 3.750000  |
| max   | 9.000000  |

Now I do not know how to interpret the values assigned to 25%, 50% and 75%. For example, 5 out of the 10 values are 1, but the 50% row has a value of 1.500000. Clearly it is not saying that 1.5 has a value of 50%, because there isn't even a 1.5 in the data set.

Also, why is 25% set to 1.000000 and 75% set to 3.750000?

I know I am interpreting this wrong, hence this question! I would appreciate it if someone could help me understand this.

3 Answers

Pandas' describe function internally uses the quantile function. The interpolation parameter of the quantile function determines how a quantile is estimated when it falls between two data points. The output below shows how you can get either 3.75 or 3.5 as the 0.75 quantile, depending on the interpolation used; linear is the default setting. Please take a look at the source code of Pandas' quantile function for the details.

```python
import pandas as pd

test = pd.Series([1, 2, 3, 4, 5, 1, 1, 1, 1, 9])

# 'linear' is the default interpolation, the one describe() reports
quantile_linear = test.quantile(0.75, interpolation='linear')
print(f'quantile based on linear interpolation: {quantile_linear}')
# quantile based on linear interpolation: 3.75

quantile_midpoint = test.quantile(0.75, interpolation='midpoint')
print(f'quantile based on midpoint interpolation: {quantile_midpoint}')
# quantile based on midpoint interpolation: 3.5
```
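To see where 3.75 comes from, here is a minimal sketch of the linear rule, assuming it is the common definition (also NumPy's default): sort the values, compute the 0-based fractional position h = q * (n - 1), and interpolate between the two sorted values on either side of h. The helper `linear_quantile` is hypothetical and only for illustration; the standard library's `statistics.quantiles(..., method="inclusive")` serves as a cross-check and, for this data, reproduces the 25%, 50% and 75% rows of the describe() output in the question.

```python
import math
import statistics


def linear_quantile(values, q):
    """Hypothetical helper: q-quantile by linear interpolation between sorted values."""
    xs = sorted(values)
    h = q * (len(xs) - 1)          # fractional, 0-based position in the sorted list
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # clamp so q == 1.0 does not index past the end
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


data = [1, 2, 3, 4, 5, 1, 1, 1, 1, 9]  # sorted: [1, 1, 1, 1, 1, 2, 3, 4, 5, 9]

for q in (0.25, 0.5, 0.75):
    # positions 2.25, 4.5 and 6.75 respectively
    print(f"{q:.0%}: {linear_quantile(data, q)}")
# 25%: 1.0
# 50%: 1.5
# 75%: 3.75

# Cross-check with the standard library's 'inclusive' method
print(statistics.quantiles(data, n=4, method="inclusive"))
# [1.0, 1.5, 3.75]
```

At q = 0.75 the position 6.75 lies three quarters of the way from the sorted value 3 (index 6) to 4 (index 7), hence 3 + 0.75 * (4 - 3) = 3.75.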

Correct answer by cmn on January 4, 2021

Percentiles indicate the percentage of scores that fall below a particular value. They tell you where a score stands relative to other scores.

For example, if a person's height of 215 cm is at the 91st percentile, that person is taller than 91 percent of the other people measured.
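The definition above (the share of scores that fall below a value) is the percentile rank of a value, which runs in the opposite direction to describe(): describe() goes from a percentage to a value, while the percentile rank goes from a value to a percentage. A minimal sketch with a hypothetical helper `percentile_rank`, assuming a strict "below" comparison (other conventions count ties differently, so results can change when values repeat, as they do here):

```python
def percentile_rank(values, x):
    """Hypothetical helper: percentage of values strictly below x."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


data = [1, 2, 3, 4, 5, 1, 1, 1, 1, 9]

print(percentile_rank(data, 1.5))   # 50.0: the five 1s lie below 1.5
print(percentile_rank(data, 3.75))  # 70.0: 1, 1, 1, 1, 1, 2, 3 lie below 3.75
print(percentile_rank(data, 1))     # 0.0: nothing lies strictly below the minimum
```

With only 10 values the two directions do not invert exactly: 3.75 is the 75% value reported by describe(), yet 70% of the values lie below it.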

Percentiles are a great tool when you need to know the position of a value/score with respect to the population/data distribution you're considering: where does a value fall within a distribution of values? While the concept behind percentiles is straightforward, there are different mathematical methods for calculating them.
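pandas exposes several of these methods through the interpolation argument of Series.quantile: 'linear' (the default, used by describe()), 'lower', 'higher', 'midpoint' and 'nearest'. A short comparison on the question's data at q = 0.75, where the fractional position 0.75 * (10 - 1) = 6.75 falls between the sorted values 3 and 4:

```python
import pandas as pd

test = pd.Series([1, 2, 3, 4, 5, 1, 1, 1, 1, 9])

methods = ("linear", "lower", "higher", "midpoint", "nearest")
# float() so every method prints the same way; the discrete methods
# may otherwise return the Series' integer type
results = {m: float(test.quantile(0.75, interpolation=m)) for m in methods}

for method, value in results.items():
    print(f"{method}: {value}")
# linear: 3.75
# lower: 3.0
# higher: 4.0
# midpoint: 3.5
# nearest: 4.0
```

Note that 'lower' returns 3, the value you get by snapping the cut down to an existing data point instead of interpolating.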

In your example, 50% corresponds to the median of the sorted values. Because the number of values is even (10), the median has to be calculated between the fifth and sixth sorted values, 1 and 2, and it is their mean: 1.5.
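The "mean of the two middle values" rule can be checked directly. A minimal sketch using only the standard library; the helper name `manual_median` is hypothetical. For this data it agrees with the 50% row of describe(), and `statistics.median` applies the same rule:

```python
import statistics


def manual_median(values):
    """Hypothetical helper: median as the middle sorted value(s)."""
    xs = sorted(values)
    n = len(xs)
    if n % 2 == 1:
        return xs[n // 2]          # odd count: the single middle value
    # even count: average the (n/2)-th and (n/2 + 1)-th values, counting from 1
    return (xs[n // 2 - 1] + xs[n // 2]) / 2


data = [1, 2, 3, 4, 5, 1, 1, 1, 1, 9]
print(sorted(data))              # [1, 1, 1, 1, 1, 2, 3, 4, 5, 9]
print(manual_median(data))       # 1.5, the mean of the 5th (1) and 6th (2) values
print(statistics.median(data))   # 1.5
```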

Answered by pellerossa pelles on January 4, 2021

Since you have 10 elements (an even number), there is a slightly tricky point:

If you want the 50% value (= the median), you have to take the mean of the 5th and 6th elements (counting from 1), so that there are 5 elements on each side:

E E E E E1 | E2 E E E E

Which, for you, leads to:

1 1 1 1 1 | 2 3 4 5 9

In your case, E1 = 1 and E2 = 2 (the values are sorted, because the median and quartiles are computed on sorted data), so the median is 1.5.

The 25% value is easy to understand: the first 5 values of your sorted df are "1", so if you make a cut in the first quarter, you'll find a 1.

I still have an issue with the 75% value. To me, if you cut it right, the value taken by the 75% is E3:

E E E E E E | E3 | E E E

Which leads to

1 1 1 1 1 2 | 3 | 4 5 9

Which makes 75% = 3, not 3.75. I can't figure out why it's 3.75.

Answered by BeamsAdept on January 4, 2021
