Mathematica Asked by Autumn Lesoch on March 8, 2021
The websites I have visited and the texts available to me compute the mean of a frequency distribution with unequal class widths using
mean = Sum(class marks x frequency) / Sum(frequency).
This is an example I extracted from the internet:
The author computed the mean wickets as 152.889, which I recoded in Mathematica as
midpts = {40, 80, 125, 200, 300, 400};
freq = {7, 5, 16, 12, 2, 3};
mean = midpts . freq /Total[freq] // N
median = 100.5 + ((Total[freq]/2 - (7 + 5))/16) 50
Is this a correct solution? Does it matter that the class widths are unequal?
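For reference, a minimal sketch of the usual interpolation formula for a grouped median, median = L + ((N/2 - cf)/f) w, where L is the lower boundary of the median class, cf the cumulative frequency below it, f its frequency and w its width. The integer class limits 21-60, 61-100, 101-150, 151-250, 251-350, 351-450 and the half-unit boundary adjustment are assumptions read off the numbers in this thread; the variable names are only illustrative.

```mathematica
(* Assumed class limits (integer-valued classes) and the frequencies from above *)
lower = {21, 61, 101, 151, 251, 351};
upper = {60, 100, 150, 250, 350, 450};
freq = {7, 5, 16, 12, 2, 3};

n = Total[freq];                          (* 45 observations *)
cum = Accumulate[freq];                   (* {7, 12, 28, 40, 42, 45} *)
k = LengthWhile[cum, # < n/2 &] + 1;      (* index of the median class: 3 *)

boundary = lower[[k]] - 1/2;              (* lower class boundary: 201/2 *)
width = upper[[k]] - lower[[k]] + 1;      (* class width: 50 *)
cfBefore = If[k > 1, cum[[k - 1]], 0];    (* cumulative frequency below the class: 12 *)

median = boundary + ((n/2 - cfBefore)/freq[[k]]) width   (* 2133/16, about 133.31 *)
```

Note that the frequency of the median class goes in the denominator and the class width multiplies the fraction.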
If it were up to me, I'd do something like this:
dist = MixtureDistribution[
  {7, 5, 16, 12, 2, 3},
  DiscreteUniformDistribution /@ {
    {21, 60}, {61, 100}, {101, 150},
    {151, 250}, {251, 350}, {351, 450}
  }
];
N @ Mean @ dist
153.389
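The small gap between this value and the 152.889 in the question appears to come from the class marks alone: the integer classes used here have midpoints 40.5, 80.5, and so on, rather than 40, 80, and so on. A quick sketch of that check, where the names limits and marks are only illustrative:

```mathematica
(* Class limits as used in the mixture, with the frequencies from the question *)
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};

marks = Mean /@ limits;          (* {81/2, 161/2, 251/2, 401/2, 601/2, 801/2} *)
N[marks . freq/Total[freq]]      (* 153.389 *)
```

So for the mean, the uniform mixture reduces to the class-mark formula with exact midpoints.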
This also generalizes easily for other statistics:
N @ StandardDeviation @ dist
94.3305
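As a cross-check, and only as a sketch, this value can be rebuilt by hand from the variance of a mixture: the spread of the class midpoints around the overall mean, plus the average spread inside each class. The within-class term uses the variance (n^2 - 1)/12 of a discrete uniform distribution on n consecutive integers; the names between and within are illustrative.

```mathematica
limits = {{21, 60}, {61, 100}, {101, 150}, {151, 250}, {251, 350}, {351, 450}};
freq = {7, 5, 16, 12, 2, 3};

marks = Mean /@ limits;                     (* class midpoints *)
widths = (#2 - #1 + 1) & @@@ limits;        (* integers per class: {40, 40, 50, 100, 100, 100} *)
mean = marks . freq/Total[freq];

between = freq . (marks - mean)^2/Total[freq];     (* variance of the midpoints *)
within = freq . ((widths^2 - 1)/12)/Total[freq];   (* average within-class variance *)
N @ Sqrt[between + within]                         (* 94.3305 *)
```

A standard deviation computed from the class marks alone would omit the within term and therefore come out somewhat smaller.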
Quantile[dist, {0.1, 0.5, 0.9}]
{46, 133, 275}
It might be possible to use HistogramDistribution to do this, but I can't think of a way right now.
Anyone reading the comments on this answer will have noticed that the choice to use uniform distributions is only valid when you have absolutely no better way to assign probabilities within each class. In the case of cricket, it's quite likely that you can do better, since the very lowest and very highest values for "number of wickets" should be less likely than the ones in the middle (based on the general knowledge that the data should cluster roughly towards the middle). In general, I'd say that this is not a straightforward problem and for real applications you should probably ask someone with good knowledge of statistics how to deal with this exactly.
Answered by Sjoerd Smit on March 8, 2021
The example in question is of a form common in introductory college statistics. The raw data are assumed lost; otherwise there would be no ambiguity. The formula usually used to estimate the mean is:
mean = Sum(class marks x frequency) / Sum(frequency)
without regard to the uniformity of the class widths, like this:
midpts = {40, 80, 125, 200, 300, 400};
freq = {7, 5, 16, 12, 2, 3};
Total[freq]/2 // N
mean = midpts . freq /Total[freq] // N
152.889
The website from which I took the example gives the same answer. Had we used DiscreteUniformDistribution, the answer would be nearly the same (153.389).
If the class widths are equal, the estimates are consistent and the formula is applicable. No problem there.
I'm uncertain about the case where the class widths are not uniform. Do we need to introduce a weighting to correct for the unequal widths?
My approach is this:
(1) Split every class into sub-classes of width 1 and give each sub-class the full frequency of its class; the solution would then be
midpts = Range[21, 450];
freq = Join[Table[7, {40}], Table[5, {40}], Table[16, {50}],
Table[12, {100}], Table[2, {100}], Table[3, {100}]];
mean = midpts . freq /Total[freq] // N
184.124
(2) Use a probability distribution via Piecewise definition:
f[x_] := Piecewise[{{7, 21 <= x <= 60},
    {5, 61 <= x <= 100},
    {16, 101 <= x <= 150},
    {12, 151 <= x <= 250},
    {2, 251 <= x <= 350},
    {3, 351 <= x <= 450}}, 0]
dist = ProbabilityDistribution[f[x], {x, 21, 450, 1}, Method -> "Normalize"];
Mean[dist] // N
184.124
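As a comparison, and only as a sketch: both approaches above give every integer in a class the full class frequency, so a wide class carries more total weight than its count. If each class frequency is instead spread evenly over the integers in its class (frequency divided by width, which is an assumption on my part and not something taken from the source website), the same dot-product calculation becomes:

```mathematica
values = Range[21, 450];

(* Assumption: per-value weight = class frequency / class width *)
density = Join[ConstantArray[7/40, 40], ConstantArray[5/40, 40],
   ConstantArray[16/50, 50], ConstantArray[12/100, 100],
   ConstantArray[2/100, 100], ConstantArray[3/100, 100]];

Total[density]                          (* 45, the original number of observations *)
values . density/Total[density] // N    (* 153.389 *)
```

This agrees with the MixtureDistribution mean of 153.389 in the other answer, because the total weight of each class is again its frequency.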
I'm unsure whether my approach is correct and am hoping someone can guide me to the correct solution or point me to some references (website or text).
Thanks in anticipation.
Answered by Autumn Lesoch on March 8, 2021