Input a frequency distribution with unequal class widths and estimate the mean and median

Question

The websites I have visited and the texts available to me compute the mean of a frequency distribution with unequal class widths using
mean = Sum(class marks x frequency) / Sum(frequency).
This is an example I extracted from the internet:

The author computed the mean wickets as 152.889, which I recoded in Mathematica as
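midpts = {40, 80, 125, 200, 300, 400};
freq = {7, 5, 16, 12, 2, 3};
mean = midpts . freq/Total[freq] // N
(* median class is 101-150: lower boundary 100.5, 12 observations below it, frequency 16, width 50 *)
median = 100.5 + ((Total[freq]/2 - (7 + 5))/16) 50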

Is this a correct solution? Does it still hold when the class widths are unequal?

Sjoerd Smit · Answer

If it were up to me, I'd do something like this:
dist = MixtureDistribution[
   {7, 5, 16, 12, 2, 3},
   DiscreteUniformDistribution /@ {
     {21, 60}, {61, 100}, {101, 150}, 
     {151, 250}, {251, 350}, {351, 450}
   }
];
N @ Mean @ dist

153.389
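The 0.5 gap between this and the 152.889 in the question seems to come from the class marks: the mean of the mixture is the frequency-weighted average of the exact class midpoints (40.5, 80.5, and so on), while the question rounds them down to 40, 80, and so on. A minimal check, where `limits` and `marks` are names introduced here for illustration:

```mathematica
(* class limits and frequencies, copied from the mixture above *)
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};

(* exact class marks: {81/2, 161/2, 251/2, 401/2, 601/2, 801/2} *)
marks = Mean /@ limits;

(* textbook formula with exact marks reproduces the mixture mean *)
marks . freq/Total[freq] // N
(* 153.389 *)
```

So, under the uniform-within-class assumption, the mixture and the textbook formula agree on the mean, whatever the class widths are.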

This also generalizes easily for other statistics:
N @ StandardDeviation @ dist

94.3305
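This figure includes the spread inside each class. A rough sketch of how it decomposes, assuming the usual grouped-data standard deviation (all mass of a class placed on its class mark) as the comparison; the names `limits`, `marks`, `between` and `within` are mine:

```mathematica
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};
n = Total[freq];
marks = Mean /@ limits;
widths = limits[[All, 2]] - limits[[All, 1]] + 1;   (* {40, 40, 50, 100, 100, 100} *)
mean = marks . freq/n;

(* between-class variance: every observation sits on its class mark *)
between = freq . (marks - mean)^2/n;
Sqrt[between] // N          (* about 92.05 *)

(* within-class variance: a discrete uniform on w consecutive integers has variance (w^2 - 1)/12 *)
within = freq . ((widths^2 - 1)/12)/n;
Sqrt[between + within] // N (* 94.3305 *)
```

The second result matches the StandardDeviation of the mixture; the first is what the class-mark formula alone would give.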

Quantile[dist, {0.1, 0.5, 0.9}]

{46, 133, 275}
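As a cross-check on the median, here is a sketch of the usual textbook interpolation formula for grouped data, L + ((n/2 - F)/f) w, with L the lower boundary of the median class, F the cumulative frequency below it, f its frequency and w its width. It assumes the integer classes are extended by half a unit on each side (so 101-150 starts at 100.5); the variable names are introduced here for illustration.

```mathematica
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};

n = Total[freq];                                (* 45 observations *)
cum = Accumulate[freq];                         (* {7, 12, 28, 40, 42, 45} *)
k = LengthWhile[cum, # < n/2 &] + 1;            (* index of the median class: 3 *)

lower = limits[[k, 1]] - 1/2;                   (* lower class boundary: 100.5 *)
width = limits[[k, 2]] - limits[[k, 1]] + 1;    (* class width: 50 *)
before = If[k > 1, cum[[k - 1]], 0];            (* cumulative frequency below the class: 12 *)

median = lower + (n/2 - before)/freq[[k]]*width // N
(* about 133.31 *)
```

This agrees with the 133 returned by Quantile, which can only return integers because the mixture is discrete.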

It might be possible to use HistogramDistribution to do this, but I can't think of a way right now.
Edit and disclaimer
Anyone reading the comments on this answer will have noticed that the choice to use uniform distributions is only valid when you have absolutely no better way to assign probabilities within each class. In the case of cricket, it's quite likely that you can do better, since the very lowest and very highest values for "number of wickets" should be less likely than the ones in the middle (based on the general knowledge that the data should cluster roughly towards the middle). In general, I'd say that this is not a straightforward problem and for real applications you should probably ask someone with good knowledge of statistics how to deal with this exactly.

Autumn Lesoch · Answer

The example in question is of a form common in introductory college statistics. The raw data are assumed lost; otherwise there would be no ambiguity. The formula usually used to estimate the mean is
mean = Sum(class marks x frequency) / Sum(frequency)
without regard to the uniformity of the class widths, like this:
midpts = {40, 80, 125, 200, 300, 400};
freq = {7, 5, 16, 12, 2, 3};
Total[freq]/2 // N
mean = midpts . freq /Total[freq] // N

152.889

The website from which I took the example gives the same answer. Using DiscreteUniformDistribution gives almost the same answer (153.389).
If the class widths are all equal, the estimates are consistent and the formula is applicable. No problem there.
I'm uncertain about the case where the class widths are not uniform. Do we need to introduce a weighting to correct for the unequal widths?
My approach is this:
(1) Split every class into unit-width classes, each carrying the frequency of its parent class. The solution would then be
midpts = Range[21, 450];
freq = Join[Table[7, {40}], Table[5, {40}], Table[16, {50}], 
   Table[12, {100}], Table[2, {100}], Table[3, {100}]];   
mean = midpts . freq /Total[freq] // N

184.124

(2) Use a probability distribution via Piecewise definition:
f[x_] := Piecewise[{
    {7, 21 <= x <= 60},
    {5, 61 <= x <= 100},
    {16, 101 <= x <= 150},
    {12, 151 <= x <= 250},
    {2, 251 <= x <= 350},
    {3, 351 <= x <= 450}}, 0]
dist = ProbabilityDistribution[f[x], {x, 21, 450, 1}, Method -> "Normalize"];
Mean[dist] // N

184.124
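One possible reading of why both approaches land on 184.124 rather than near 153: giving every unit-width cell the full class frequency multiplies each class's weight by its width, so the wide classes at the top count far more than their observed frequencies. A hedged sketch of the alternative, where each class frequency is spread evenly over its cells (frequency divided by width); `limits`, `widths`, `perCell` and `density` are names introduced here:

```mathematica
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};
widths = limits[[All, 2]] - limits[[All, 1]] + 1;   (* {40, 40, 50, 100, 100, 100} *)

(* approach (1) as posted: every unit cell carries the full class frequency *)
perCell = Join @@ MapThread[ConstantArray, {freq, widths}];
Total[perCell]                                   (* 2980, not 45 *)

(* alternative: spread each class frequency evenly over its cells *)
density = Join @@ MapThread[ConstantArray, {freq/widths, widths}];
Total[density]                                   (* 45 *)
Range[21, 450] . density/Total[density] // N     (* 153.389 *)
```

With the frequencies divided by the class widths, the total stays at 45 observations and the mean returns to the value given by the class-mark formula with exact midpoints.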

I'm unsure whether my approach is correct and am hoping someone can guide me to the correct solution or point me to some references (website or text).
Thanks in anticipation.
