# Question about DataCamp's hypothesis testing problem concerning p-values

Stack Overflow Asked by PhysicsPerson on December 30, 2020

I am taking a course on an online data science teaching website called DataCamp, and I have a question about one of their problems (which has to do with hypothesis testing). I will state the problem, their solution, and then my question about the solution. The statement of the problem is here:

The Civil Rights Act of 1964 was one of the most important pieces of legislation ever passed in the USA. Excluding "present" and "abstain" votes, 153 House Democrats and 136 Republicans voted yea. However, 91 Democrats and 35 Republicans voted nay. Did party affiliation make a difference in the vote?
To answer this question, you will evaluate the hypothesis that the party of a House member has no bearing on his or her vote. You will use the fraction of Democrats voting in favor as your test statistic and evaluate the probability of observing a fraction of Democrats voting in favor at least as small as the observed fraction of 153/244. (That’s right, at least as small as. In 1964, it was the Democrats who were less progressive on civil rights issues.) To do this, permute the party labels of the House voters and then arbitrarily divide them into "Democrats" and "Republicans" and compute the fraction of Democrats voting yea.

Their Python solution is here:

```python
import numpy as np
import matplotlib.pyplot as plt

# Construct arrays of data: dems, reps
dems = np.array([True] * 153 + [False] * 91)
reps = np.array([True] * 136 + [False] * 35)

def frac_yea_dems(dems, reps):
    """Compute fraction of Democrat yea votes."""
    frac = np.sum(dems) / len(dems)
    return frac

# Acquire permutation samples: perm_replicates
perm_replicates = draw_perm_reps(dems, reps, frac_yea_dems, 10000)

# Compute and print p-value: p
p = np.sum(perm_replicates <= 153/244) / len(perm_replicates)
print('p-value =', p)
plt.hist(perm_replicates, bins=50)
plt.show()
```



where the helper functions `draw_perm_reps` and `permutation_sample` are defined as:


```python
def draw_perm_reps(data_1, data_2, func, size=1):
    """Generate multiple permutation replicates."""
    # Initialize array of replicates: perm_replicates
    perm_replicates = np.empty(size)

    for i in range(size):
        # Generate permutation sample
        perm_sample_1, perm_sample_2 = permutation_sample(data_1, data_2)

        # Compute the test statistic
        perm_replicates[i] = func(perm_sample_1, perm_sample_2)

    return perm_replicates

def permutation_sample(data1, data2):
    """Generate a permutation sample from two data sets."""
    # Concatenate the data sets: data
    data = np.concatenate((data1, data2))

    # Permute the concatenated array: permuted_data
    permuted_data = np.random.permutation(data)

    # Split the permuted array into two: perm_sample_1, perm_sample_2
    perm_sample_1 = permuted_data[:len(data1)]
    perm_sample_2 = permuted_data[len(data1):]

    return perm_sample_1, perm_sample_2
```


When you run all of the above code, p comes out really small, implying that the null hypothesis should be rejected.

What I do not understand is why they count the replicates less than or equal to 153/244. If the hypothesis they are testing is that the party of a House member has no bearing on his or her vote, shouldn’t they be doing something like p = np.sum((153/244 - delta <= perm_replicates) & (perm_replicates <= 153/244 + delta)) / len(perm_replicates), where delta is some parameter determining the neighborhood around the empirical fraction that sets our threshold for similarity between the permuted-data fractions and the empirical one?

Another way of saying it: suppose we replace reps with dems, so that instead of perm_replicates = draw_perm_reps(dems, reps, frac_yea_dems, 10000) we run perm_replicates = draw_perm_reps(dems, dems, frac_yea_dems, 10000). Then the test should surely show that party affiliation does not make a difference in the vote (which is the null hypothesis). If you run this code, you will get a p-value of about 50%, which makes sense: the permutations all give similar answers. But as the reps data set deviates from dems, the p-value drifts away from 50% in one direction or the other. So shouldn't p = np.sum(perm_replicates <= 153/244) / len(perm_replicates) being either very small (<< 1) or very large (near 1) cause you to reject the null hypothesis, and a p-value near 1/2 cause you to accept it?
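(A minimal, self-contained sketch of that dems-vs-dems check, with the DataCamp helpers inlined and a fixed seed for reproducibility; the exact value varies with the random draws, but it lands near 0.5:)

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

dems = np.array([True] * 153 + [False] * 91)

# Permutation test of dems against an identical copy of itself
data = np.concatenate((dems, dems))
observed = np.sum(dems) / len(dems)  # 153/244

size = 10_000
perm_replicates = np.empty(size)
for i in range(size):
    permuted = rng.permutation(data)
    perm_replicates[i] = np.sum(permuted[:len(dems)]) / len(dems)

p = np.sum(perm_replicates <= observed) / size
print(p)  # close to 0.5, as described above
```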

There must be something very basic I’m missing here. Maybe someone could address my questions and qualms directly, or just explain the answer independently, to the point where my qualms become absurd to me. I appreciate any help though =D

Good thinking: you have just described the difference between a one-sided and a two-sided test. To dig into it: each time you do a simulation, you get a simulated fraction of Democrats who vote yea under the hypothesis that party affiliation plays no role.

These simulated fractions build up a distribution, called the null distribution, and we then compare our given fraction of 153/244 to it. We reject the null hypothesis if our given fraction is unusually small compared to the distribution of simulated ones. What I think you are saying is: shouldn't we also reject if the true fraction is unusually large compared to the null distribution? Wouldn't an unusually large fraction also be evidence of a party-affiliation effect?

The answer is yes! Either an unusually small or an unusually large fraction could be evidence against the null hypothesis, and you could set up the test to account for both. That is called a two-sided test.

The question is where you want your power to be. More precisely: the alternative hypothesis is a big place. Under the alternative, party affiliation does play some role, but it could be in either direction. The DataCamp test is specifically designed to reject when a Democrat is less likely to vote yea than a Republican. You could set up a test in the opposite direction that rejects when you see an unusually large fraction of Democrat yeas, and it would also be a valid test of this null hypothesis (that party doesn't make a difference). However, it wouldn't reject in this case, because the empirical fraction we saw is on the low side rather than the high side.

Your two-sided test would also be a valid test. But the most powerful test for cases where Democrats are in truth less likely to vote yea than Republicans would be DataCamp's test. And if you only care about rejecting the null hypothesis when it's Democrats who are less likely, rather than Republicans, then that is what you should use. But if you care about both possibilities, you should go for the two-sided test.
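As a sketch of what the two-sided version might look like (the replicates are regenerated here so the snippet is self-contained; one common convention, rather than your delta-neighborhood idea, is to double the smaller of the two tail probabilities):

```python
import numpy as np

rng = np.random.default_rng(42)

# All 415 votes pooled, as in permutation_sample's concatenation step
data = np.array([True] * (153 + 136) + [False] * (91 + 35))
n_dems = 244
observed = 153 / 244

# Replicates: fraction of yeas among a random "Democrat" relabeling
perm_replicates = np.array([
    np.sum(rng.permutation(data)[:n_dems]) / n_dems
    for _ in range(10_000)
])

# One-sided p-values in each direction
p_low = np.sum(perm_replicates <= observed) / len(perm_replicates)
p_high = np.sum(perm_replicates >= observed) / len(perm_replicates)

# Two-sided convention: double the smaller tail, capped at 1
p_two_sided = min(1.0, 2 * min(p_low, p_high))
print('two-sided p-value =', p_two_sided)
```

Here the observed fraction sits far in the lower tail, so both the one-sided and the doubled two-sided p-values come out very small and you reject either way; the two tests would only disagree near the rejection threshold.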

I assume DataCamp used a one-sided test because it's simpler to code, not because it's more appropriate to the situation.