Calculating the variance of dice rolls?

Question

I am having trouble understanding how to find the variance for the proportion of times we see a 6 when we roll a die. The question is below:
Suppose we are interested in the proportion of times we see a 6 when
rolling n=100 dice. This is a random variable which we can simulate with
x=sample(1:6, n, replace=TRUE)

and the proportion we are interested in can be expressed as an average:
mean(x==6)

Because the die rolls are independent, the CLT applies. We want to roll n dice 10,000 times and keep these proportions. This
random variable (proportion of 6s) has mean p=1/6 and variance p*(1-p)/n.  So according to the CLT, z = (mean(x==6) - p) / sqrt(p*(1-p)/n) should  be normal with mean 0 and SD 1.
According to the problem, the mean proportion is p = 1/6, and I understand why the proportion of 6's should average out to 1/6.
But the variance confuses me. The question says variance is p*(1-p)/n. But the formula for variance for a sample is the sum of the difference between a value and the mean divided by the sample size minus one.  Why do they do differently here?

BruceET · Answer

You are correct to say that your experiment to roll a fair die $n=100$ times can be simulated in R using:
set.seed(2020)
n = 100; x=sample(1:6, n, replace=TRUE)
sum(x);  mean(x);  var(x)
[1] 347
[1] 3.47
[1] 2.635455

For one roll of a fair die, the mean number rolled is
$$\mu = E(X) = \sum_{i=1}^6 iP(X=i) = \sum_{i=1}^6 i(1/6) = 3.5,$$
x = 1:6;  pr=rep(1/6,6)
sum(x*pr)
[1] 3.5

The variance of the result is $Var(X) = E[(X - \mu)^2] = E(X^2) - \mu^2.$
$$E(X^2) = \sum_{i=1}^6 i^2P(X = i) = \sum_{i=1}^6 i^2(1/6) = 91/6 = 15.16667.$$
sum(x^2*pr)
[1] 15.16667

$$Var(X) = 91/6 - (7/2)^2 = 35/12 = 2.916667.$$
sum(x^2*pr) - 3.5^2
[1] 2.916667
sum((x-3.5)^2*pr)
[1] 2.916667

Then, for 100 rolls of the die, the total is $T = \sum_{j=1}^{100} X_j$ with
$$E(T) = E(X_1 + X_2 + \cdots + X_{100}) = 100(3.5) = 350$$
and (by independence)
$$Var(T) = Var(X_1 + X_2 + \cdots + X_{100}) = 100(35/12) = 291.6667.$$
So, for the average $A = \bar X = T/100$, we have $E(A) = E(\bar X) = E(T/100) = E(T)/100 = 3.50$ and
$Var(A) = Var(\bar X) = Var(T/100) = \frac{1}{100^2}Var(T) = 0.02916667.$
Equivalently, $Var(A) = Var(\bar X) = Var(X_j)/100 = 2.916667/100 = Var(T)/100^2 = 0.02916667.$
If we simulate a million 100-roll experiments, we can closely approximate
these theoretical results:
set.seed(723)
m = 10^6;  n = 100
t = replicate(m, sum(sample(1:6, n, replace=TRUE)))
mean(t)
[1] 349.995       # approx. E(T) = 350
var(t)
[1] 291.7679      # approx. Var(T) = 291.67
a = t/n
mean(a)
[1] 3.49995       # approx. E(A) = 3.5
var(a)
[1] 0.02917679    # approx. Var(A) = 0.029
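
The same argument carries over to the proportion of sixes in the question if each roll is replaced by an indicator that is 1 for a six and 0 otherwise. The following is a minimal sketch under that assumption, not part of the computation above. The names `p`, `prop6`, and `z` are chosen here for illustration, and only 10,000 replications (as in the question) are used to keep it fast.

```r
set.seed(2020)                      # arbitrary seed, for reproducibility only
p <- 1/6                            # P(six) for a fair die
m <- 10^4; n <- 100                 # 10,000 experiments of 100 rolls each

# Proportion of sixes in each 100-roll experiment
prop6 <- replicate(m, mean(sample(1:6, n, replace = TRUE) == 6))

mean(prop6)                         # should be close to p = 1/6
var(prop6)                          # should be close to p*(1-p)/n
p * (1 - p) / n                     # theoretical variance, 5/3600

# Standardized proportions should have mean near 0 and SD near 1
z <- (prop6 - p) / sqrt(p * (1 - p) / n)
c(mean(z), sd(z))
```

Here `var(prop6)` is a sample variance taken across the 10,000 simulated proportions, which is why it approximates the theoretical value p*(1-p)/n rather than replacing it.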

gunes · Answer

But the variance confuses me. The question says variance is p*(1-p)/n.
But the formula for variance for a sample is the sum of the difference
between a value and the mean divided by the sample size minus one. Why
do they do differently here?

That is the sample variance, i.e.
$$\hat\sigma^2=\frac{1}{n-1}\sum_{i=1}^n (x_i-\bar x)^2$$
for a random sample of $x_i$.
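
One way to connect the two formulas, sketched here as an illustration rather than as part of the answer itself: apply the sample variance above to the 0/1 indicators `x == 6` from a single experiment. That estimates the per-roll variance p(1-p), and dividing by n then gives an estimate of the variance of the proportion. The variable names (`six`, `phat`, `s2`) are assumptions made for this sketch.

```r
set.seed(1)                          # arbitrary seed
n <- 100
x <- sample(1:6, n, replace = TRUE)  # one experiment of n rolls
six <- as.numeric(x == 6)            # indicator: 1 if the roll is a six

phat <- mean(six)                    # estimated proportion of sixes
s2 <- var(six)                       # sample variance, divides by n - 1

# For 0/1 data the sample variance equals phat*(1 - phat)*n/(n - 1)
all.equal(s2, phat * (1 - phat) * n / (n - 1))

s2 / n                               # estimated variance of the proportion
(1/6) * (5/6) / n                    # theoretical value p*(1-p)/n
```

The estimate `s2 / n` comes from one sample and will vary from run to run, while p*(1-p)/n is the fixed theoretical quantity it is estimating.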

kurtosis · Answer

Let's call $x$ the number of 6's in $n$ die rolls, and $p$ the probability of a 6 on one roll. The theoretical variance for the number of 6's in $N$ die rolls is then $\operatorname{var}(x|N=n)=np(1-p)$.
Now let's call $\pi$ the proportion of die rolls which are 6's. Then $\pi=\frac{x}{n}$ and $E(\pi|N=n)=p$. The variance for the proportion of 6's is $\operatorname{var}(\pi|N=n)=\operatorname{var}(\frac{x}{n}|N=n)=\frac{1}{n^2}\operatorname{var}(x|N=n)=\frac{p(1-p)}{n}$.
That is fine for theoretical values; however, now let's say you want to gather some data (or simulate) and estimate $\operatorname{var}(\frac{x}{n}|N=n)$ from your data. In that case, you need to account for also estimating the mean. While you could assume the mean is 1/6, perhaps this die is biased and so $P(6)\neq 1/6$.
Since you have to estimate the mean, you effectively use up one of your data points: if you gave me $n-1$ observations and the mean, I know the $n$-th observation. (Thus that $n$-th observation is not independent after using the estimated mean.) We say that the degrees of freedom is $n-1$. For this reason, when you estimate your sample variance you divide the sum of squared differences from the mean by $n-1$.
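
The effect of the $n-1$ divisor can be checked by simulation. The sketch below is an illustration under stated assumptions, not a definitive treatment: it draws the six/not-six indicators directly with `rbinom` (equivalent to `sample(1:6, ...) == 6` for a fair die), uses a deliberately small sample size so the bias is visible, and all variable names are chosen here.

```r
set.seed(42)                         # arbitrary seed
p <- 1/6; n <- 5; m <- 10^5          # small n makes the bias easy to see

# m samples of n indicator values (1 = six), one sample per column
six <- matrix(rbinom(m * n, size = 1, prob = p), nrow = n)

# Sum of squared deviations from each sample's own mean,
# using sum((v - mean(v))^2) = sum(v^2) - n*mean(v)^2
ss <- colSums(six^2) - n * colMeans(six)^2

mean(ss / (n - 1))                   # close to p*(1-p): unbiased
mean(ss / n)                         # close to p*(1-p)*(n-1)/n: too small
p * (1 - p)                          # true per-roll variance, 5/36
```

Dividing by `n` underestimates the variance on average by the factor (n-1)/n, which is the cost of estimating the mean from the same data.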
