Input a frequency distribution with unequal class widths and estimate the mean and median

Mathematica Asked by Autumn Lesoch on March 8, 2021

The websites I have visited and the texts available to me estimate the mean of a frequency distribution with unequal class widths using

mean = Sum(class marks x frequency) / Sum(frequency).

This is an example I extracted from the internet:

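[Image: frequency table of wickets with classes 21-60, 61-100, 101-150, 151-250, 251-350 and 351-450 and frequencies 7, 5, 16, 12, 2 and 3]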

The author computed the mean wickets as 152.889, which I recoded in Mathematica as

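midpts = {40, 80, 125, 200, 300, 400};   (* class marks *)
freq = {7, 5, 16, 12, 2, 3};             (* class frequencies *)
mean = midpts . freq/Total[freq] // N
(* grouped median: lower boundary + ((n/2 - cumulative frequency before the class)/class frequency) * class width *)
median = 100.5 + ((Total[freq]/2 - (7 + 5))/16) 50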

Is this a correct solution? Do the unequal class widths need any special treatment?

2 Answers

If it were up to me, I'd do something like this:

dist = MixtureDistribution[
   {7, 5, 16, 12, 2, 3},
   DiscreteUniformDistribution /@ {
     {21, 60}, {61, 100}, {101, 150}, 
     {151, 250}, {251, 350}, {351, 450}
   }
];
N @ Mean @ dist

153.389

This also generalizes easily for other statistics:

N @ StandardDeviation @ dist

94.3305

Quantile[dist, {0.1, 0.5, 0.9}]

{46, 133, 275}
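
For comparison, the 0.5 quantile can be checked against the textbook interpolation formula for a grouped median: lower boundary + ((n/2 - cumulative frequency before the median class) / class frequency) × class width. The sketch below is only illustrative: groupedMedian is a hypothetical helper name, and the class boundaries 20.5, 60.5, ... are an assumption (the integer class limits shifted by 0.5).

```mathematica
(* hypothetical helper: grouped median by linear interpolation inside the median class *)
groupedMedian[bounds_List, freq_List] := Module[
  {n = Total[freq], cum = Accumulate[freq], k, below},
  k = LengthWhile[cum, # < n/2 &] + 1;     (* index of the median class *)
  below = If[k == 1, 0, cum[[k - 1]]];     (* cumulative frequency before that class *)
  bounds[[k]] + ((n/2 - below)/freq[[k]])*(bounds[[k + 1]] - bounds[[k]])
]

(* assumed class boundaries: integer limits shifted by 0.5 *)
bounds = {20.5, 60.5, 100.5, 150.5, 250.5, 350.5, 450.5};
groupedMedian[bounds, {7, 5, 16, 12, 2, 3}]
```

Under these assumptions the result is about 133.31, in line with the 0.5 quantile of 133 obtained from the mixture distribution.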

It might be possible to use HistogramDistribution to do this, but I can't think of a way right now.

Edit and disclaimer

Anyone reading the comments on this answer will have noticed that the choice to use uniform distributions is only valid when you have absolutely no better way to assign probabilities within each class. In the case of cricket, it's quite likely that you can do better, since the very lowest and very highest values for "number of wickets" should be less likely than the ones in the middle (based on the general knowledge that the data should cluster roughly towards the middle). In general, I'd say that this is not a straightforward problem and for real applications you should probably ask someone with good knowledge of statistics how to deal with this exactly.
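
To illustrate how much the within-class assumption matters, here is a minimal sketch that swaps the uniform components for symmetric triangular ones. The choice of TriangularDistribution and the continuous class boundaries 20.5 to 450.5 are assumptions made purely for illustration, not a recommendation for this data set.

```mathematica
weights = {7, 5, 16, 12, 2, 3};
(* assumed continuous class boundaries *)
classes = {{20.5, 60.5}, {60.5, 100.5}, {100.5, 150.5},
   {150.5, 250.5}, {250.5, 350.5}, {350.5, 450.5}};
(* each class gets a symmetric triangular density peaked at its midpoint *)
triDist = MixtureDistribution[weights, TriangularDistribution /@ classes];
Mean[triDist]
StandardDeviation[triDist]
```

Because each triangular component is symmetric about the midpoint of its class, the mean stays at about 153.389. Only spread-related statistics, such as the standard deviation and the quantiles, respond to the change of shape.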

Answered by Sjoerd Smit on March 8, 2021

The example in question is of the form common in introductory college statistics. The raw data are assumed lost; otherwise there would be no ambiguity. The formula usually used to estimate the mean is:

mean = Sum(class marks x frequency) / Sum(frequency)

without regard to the uniformity of the class widths, like this:

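midpts = {40, 80, 125, 200, 300, 400};   (* class marks *)
freq = {7, 5, 16, 12, 2, 3};             (* class frequencies *)
Total[freq]/2 // N                       (* half the total frequency, 22.5; locates the median class *)
mean = midpts . freq/Total[freq] // N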

152.889

The website from which I took the example gives the same answer. Had we used DiscreteUniformDistribution, the answer would have been almost the same (153.389).
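
The small gap between 152.889 and 153.389 appears to come from the class marks: the midpoints of the integer classes 21-60, 61-100, ... are 40.5, 80.5, ..., not 40, 80, .... A quick sketch, assuming those class limits (the names limits and marks are introduced here for illustration):

```mathematica
(* assumed integer class limits *)
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};
marks = Mean /@ limits          (* {81/2, 161/2, 251/2, 401/2, 601/2, 801/2} *)
N[marks . freq/Total[freq]]     (* 153.389 *)
```

Each class mark shifts by 0.5, so the weighted mean shifts by exactly 0.5 as well.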

If the class widths are equal, the estimates are consistent and the formula is applicable. No problem here.

I'm uncertain what happens when the class widths are not uniform. Do we need to introduce a weighting to correct for the unequal widths?

My approach is this:

(1) Split every class into sub-classes of uniform width 1, so that each integer from 21 to 450 carries the frequency of its class. The solution would then be

midpts = Range[21, 450];
freq = Join[Table[7, {40}], Table[5, {40}], Table[16, {50}], 
   Table[12, {100}], Table[2, {100}], Table[3, {100}]];   
mean = midpts . freq /Total[freq] // N

184.124

(2) Use a probability distribution defined via Piecewise:

f[x_] := Piecewise[{{7, 21 <= x <= 60},
                {5, 61 <= x <= 100},
                {16, 101 <= x <= 150},
                {12, 151 <= x <= 250},
                {2, 251 <= x <= 350},               
                {3, 351 <= x <= 450}}, 0]
dist = ProbabilityDistribution[f[x], {x, 21, 450, 1}, Method -> "Normalize"];
Mean[dist] // N

184.124
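
One way to read both computations: every integer value receives the full frequency of its class, so a class contributes in proportion to frequency × width, and the total weight is 2980 rather than 45. A hedged sketch of the alternative, in which each class frequency is spread evenly over the integers the class contains (the names limits, widths, perValue and values are introduced here for illustration):

```mathematica
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};
widths = (#2 - #1 + 1) & @@@ limits;       (* {40, 40, 50, 100, 100, 100} *)
(* frequency per integer value = class frequency / class width *)
perValue = Join @@ MapThread[ConstantArray[#1/#2, #2] &, {freq, widths}];
values = Range[21, 450];
Total[perValue]                            (* 45 *)
N[values . perValue/Total[perValue]]       (* 153.389 *)
```

Under this assumption the estimate returns to about 153.389, the same value that the mixture of discrete uniform distributions gives.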

I'm unsure whether my approach is correct, and I'm hoping someone can guide me to the correct solution or point me to some references (website or text).

Thanks in anticipation.

Answered by Autumn Lesoch on March 8, 2021
