Cross Validated Asked by Gintas_ on December 21, 2021
I’m taking an introductory statistics course. At first, the textbook discusses real limits in the context of continuous variables and frequency distribution tables; that part is clear. But under what conditions are real limits used when calculating a z-score? Only when we are using the normal approximation to the binomial distribution, because the categories of scores are discrete (e.g., a coin flip with p = 1/2: you can’t get 2.5 heads in 4 tries)? In such a case, when we look for p(X > 3), we take X = 2.5, compute its deviation from the mean, and divide by the standard deviation to convert it to a z-score.
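For the coin example, here is a minimal R sketch of the standardization step. It assumes 4 fair tosses, so $X \sim \mathsf{Binom}(4, 1/2)$, and the event "3 or more heads" ($X \ge 3$), whose lower real limit is 2.5. The variable names are illustrative only.

```r
# Coin example: X ~ Binom(n = 4, p = 1/2)
n <- 4
p <- 0.5
mu <- n * p                     # mean = 2
sigma <- sqrt(n * p * (1 - p))  # standard deviation = 1

# "3 or more heads" starts at the lower real limit of the score 3, i.e. 2.5
z <- (2.5 - mu) / sigma         # z = 0.5

approx <- pnorm(z, lower.tail = FALSE)         # normal approximation of P(X >= 3)
exact  <- pbinom(2, n, p, lower.tail = FALSE)  # exact P(X > 2) = P(X >= 3) = 5/16
c(z = z, approx = approx, exact = exact)
```

Here the approximation (about 0.309) is already close to the exact 0.3125, even though $np = 2$ is small.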
A second, related question: suppose we have data on household income. When we look for p(X > 50000), we take X = 50000, find the z-score of 50000, and the proportion in the tail from the unit normal table is the answer (e.g., if the mean is 40k and the standard deviation is 10k, then z = (50k - 40k) / 10k = 1). What about when we look for p(X >= 50000)? Do we find the z-score of X = 49999.5 (e.g., z = (49999.5 - 40k) / 10k = 0.99995)?
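The two z-scores in the income example can be checked with a short R sketch. It assumes, as in the question, that income is roughly normal with mean 40,000 and standard deviation 10,000, and (for the 49999.5 cutoff) that income is recorded in whole units; both are assumptions for illustration.

```r
mu <- 40000
sigma <- 10000

z_gt <- (50000 - mu) / sigma     # cutoff at 50000: z = 1
z_ge <- (49999.5 - mu) / sigma   # lower real limit of 50000: z = 0.99995

# Upper-tail proportions from the standard normal distribution
p_gt <- pnorm(z_gt, lower.tail = FALSE)
p_ge <- pnorm(z_ge, lower.tail = FALSE)
c(z_gt = z_gt, z_ge = z_ge, p_gt = p_gt, p_ge = p_ge, diff = p_ge - p_gt)
```

With a standard deviation of 10,000, the half-unit shift changes the tail proportion by only about 1.2e-5, far below what a two-place z table can resolve.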
EDIT:
Definition of real limits from the textbook, tl;dr: a continuous (not discrete) variable has possible values in between, e.g., between 1 and 2 there are values such as 1.2 and 1.3, so a measured score such as X = 2 represents the interval from 1.5 to 2.5.
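The textbook definition can be written as a tiny R function. `real_limits` is a hypothetical helper, and the `unit` argument (the measurement precision, 1 by default) is an assumption added for illustration.

```r
# Hypothetical helper: real limits of a score measured to the nearest `unit`.
# The score x stands for the interval from x - unit/2 to x + unit/2.
real_limits <- function(x, unit = 1) {
  c(lower = x - unit / 2, upper = x + unit / 2)
}

real_limits(2)          # lower 1.5, upper 2.5
real_limits(2.3, 0.1)   # score recorded to one decimal place
```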
Suppose $X \sim \mathsf{Binom}(n=75,\, p=0.2)$ and you want to find $P(X \le 12).$ You have several choices.
Binomial formula: Use the formula for the binomial PDF (or PMF) to evaluate each of 13 individual terms. A somewhat tedious task. $$P(X \le 12) = P(X = 0) + P(X = 1) + \cdots + P(X = 12) = \sum_{k=0}^{12} {75 \choose k}(.2)^k(.8)^{75-k}.$$
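The 13 terms of the sum can be evaluated directly in R. This sketch writes the formula out with `choose` and then repeats it with the built-in PMF `dbinom`; the variable names are illustrative.

```r
n <- 75
p <- 0.2
k <- 0:12

# The 13 terms choose(75, k) * 0.2^k * 0.8^(75 - k), written out explicitly
terms <- choose(n, k) * p^k * (1 - p)^(n - k)
sum(terms)

# The same sum via the built-in binomial PMF
sum(dbinom(k, n, p))
```

Both sums should agree with the `pbinom` value shown in the Software paragraph.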
Tables: Some books have tables for binomial probabilities. Your textbook may have a few such tables in an appendix for various (smallish) values of $n$ and a few values of $p.$ Typically, $n = 75$ and $p=0.2$ will not be found there. Before the computer era, there were entire books of binomial probabilities, but those have mainly disappeared now.
Software: Maybe you have a statistical calculator that will do such computations. But nowadays there is good software that will do the job easily. One type of software that will handle this problem is R. In R, the function `pbinom` is a binomial CDF function. The CDF consists of probabilities of the form $P(X \le k).$ The R code below shows that, for $X \sim \mathsf{Binom}(n=75,\, p=0.2),$ we have $P(X \le 12) = 0.2397,$ correct to four places, with more places of accuracy available if needed.
```r
pbinom(12, 75, .2)
## [1] 0.2396826
```
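The question is about upper tails such as $P(X > 3)$. With the same $n = 75$ and $p = 0.2$, a small R sketch shows two ways to get $P(X > 12) = P(X \ge 13)$ from `pbinom`: as a complement, or with `lower.tail = FALSE`, which returns $P(X > q)$.

```r
# P(X >= 13) = P(X > 12) is the complement of P(X <= 12)
upper_complement <- 1 - pbinom(12, 75, .2)

# Same probability without subtracting: lower.tail = FALSE gives P(X > q)
upper_direct <- pbinom(12, 75, .2, lower.tail = FALSE)

c(upper_complement, upper_direct)
```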
Normal approximation: For $n$ sufficiently large and $p$ not too near $0$ or $1,$ the distribution $\mathsf{Binom}(n,p)$ can be approximated by using a normal distribution with matching mean and variance: $\mathsf{Norm}(\mu, \sigma),$ where $\mu = np$ and $\sigma = \sqrt{np(1-p)}.$
An often-useful rule of thumb is that the normal approximation gives about two-place accuracy if both $np > 5$ and $n(1-p) > 5.$ Both are satisfied for $n=75$ and $p = 0.2.$ So we find $\mu = np = 15$ and $\sigma = \sqrt{np(1-p)} = 3.4641.$
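The mean, standard deviation, and rule-of-thumb check can be computed in a few lines of R. This is a sketch; `rule_ok` is an illustrative name, not a built-in.

```r
n <- 75
p <- 0.2

mu <- n * p                     # 15
sigma <- sqrt(n * p * (1 - p))  # sqrt(12), about 3.4641

# Rule of thumb from the text: both np and n(1 - p) should exceed 5
rule_ok <- (n * p > 5) && (n * (1 - p) > 5)
c(mu = mu, sigma = sigma, np = n * p, nq = n * (1 - p))
rule_ok
```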
Because the binomial distribution is discrete and the normal distribution is continuous, we have to be a little careful in order to get the best results from the normal approximation. For $\mathsf{Binom}(75,\,0.2)$, we have $$P(X \le 12) = P(X < 12.5) = P(X < 13).$$ But for $\mathsf{Norm}(\mu = 15,\, \sigma=3.4641),$ these are three different probabilities. Briefly put, the 'continuity correction' uses the second of the three because it (usually) gives the best approximation.
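The claim about the three candidate cutoffs can be checked numerically. This R sketch evaluates the normal CDF at 12, 12.5, and 13 and compares each value with the exact binomial probability.

```r
mu <- 15
sigma <- sqrt(12)   # sqrt(75 * 0.2 * 0.8)

exact <- pbinom(12, 75, 0.2)
cutoffs <- c(12, 12.5, 13)
approx <- pnorm(cutoffs, mean = mu, sd = sigma)

# Absolute error of each normal-approximation cutoff against the exact value
data.frame(cutoff = cutoffs, approx = approx, error = abs(approx - exact))
```

For this example, the cutoff 12.5 has by far the smallest error.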
$$\begin{aligned} P(X \le 12.5) &= P\left(\frac{X-\mu}{\sigma} \le \frac{12.5-15}{3.4641}\right) \approx P(Z \le -0.7217) \\ &\approx P(Z \le -0.72) = 0.24, \end{aligned}$$ where $Z \sim \mathsf{Norm}(0,1).$ The second approximation is necessary if you are using printed tables of the standard normal distribution because (without interpolation) these tables provide z-values to only two places.
```r
pnorm(-0.7217); pnorm(-0.72)
## [1] 0.2352395
## [1] 0.2357625
```
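The same steps can be packaged as a general function. `norm_approx_binom` is a hypothetical helper written for this sketch: for $P(X \le k)$ it cuts at the upper real limit $k + 0.5$, and for $P(X \ge k)$ at the lower real limit $k - 0.5$. A strict inequality $P(X > k)$ would be handled as $P(X \ge k + 1)$.

```r
# Hypothetical helper: normal approximation to a binomial tail probability,
# using the continuity correction (real limits k - 0.5 and k + 0.5).
#   tail = "le": P(X <= k), cut at k + 0.5
#   tail = "ge": P(X >= k), cut at k - 0.5
norm_approx_binom <- function(k, n, p, tail = c("le", "ge")) {
  tail <- match.arg(tail)
  mu <- n * p
  sigma <- sqrt(n * p * (1 - p))
  if (tail == "le") {
    pnorm((k + 0.5 - mu) / sigma)
  } else {
    pnorm((k - 0.5 - mu) / sigma, lower.tail = FALSE)
  }
}

norm_approx_binom(12, 75, 0.2, "le")  # compare pbinom(12, 75, 0.2)
norm_approx_binom(13, 75, 0.2, "ge")  # compare 1 - pbinom(12, 75, 0.2)
```

Because $P(X \le 12)$ and $P(X \ge 13)$ share the cutoff 12.5, the two approximations add to 1, just as the exact probabilities do.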
In the figure below, the exact binomial probability $P(X le 12)$ is the sum of the heights of the vertical black bars to the left of the broken red line. The normal approximation is the area under the blue normal density curve to the left of the red line.
Notice that, according to the normal curve, the probability $P(X = 12)$ is represented by the area under the curve and above the interval $(11.5, 12.5].$ By cutting at 12.5 in the normal approximation, we have included this entire probability, instead of just part of it.
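The area over $(11.5, 12.5]$ can be compared with the exact probability of exactly 12 successes. This R sketch uses the same $\mu = 15$ and $\sigma = \sqrt{12}$ as above.

```r
mu <- 15
sigma <- sqrt(12)

# Normal area over (11.5, 12.5], the real limits of the score 12
area_12 <- pnorm(12.5, mu, sigma) - pnorm(11.5, mu, sigma)

# Exact binomial probability of exactly 12 successes
exact_12 <- dbinom(12, 75, 0.2)

c(normal_area = area_12, exact = exact_12)
```

The two values are close but not identical, which is the kind of small discrepancy the normal approximation introduces at each bar.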
Answered by BruceET on December 21, 2021