When you do a random permutation F test (by permuting group membership) is inference made on the samples or the populations?

Question

I am trying to understand different approaches that use randomization procedures. One thing I cannot come to a clear conclusion on is this:

Say we have randomly sampled individuals from 2 populations. We were able to sample 10 individuals from the first population, and 24 individuals from the second population. For each individual we measured some continuous response variable, and we want to know if the groups have different concentrations of this variable. Let group represent the samples from our populations:

set.seed(123)
data<- data.frame(group = rep(c("G1","G2"),c(10,24)),
                  resp = c(
                  rnorm(10, 1.34, 0.17),
                  rnorm(24, 1.14, 0.11)))

Here is a boxplot of the raw values:

library(ggplot2)
  ggplot(data, aes(x=group, y=resp))+
  geom_boxplot(aes(col=group))

[![enter image description here][1]][1]

Lets do a regular anova so that we can check ourselves:

summary(aov(resp ~ group, data = data))
            Df Sum Sq Mean Sq F value   Pr(>F)    
group        1 0.3228  0.3228   20.76 7.18e-05 ***
Residuals   32 0.4976  0.0156

Now lets do a permutation F-test by permuting group membership 999 times. Note that I will use the jmuOutlier package to do this, just for demonstration. I THINK THAT….. this function is permuting group membership using the raw data, not residuals, but im not 100% sure (R packages with built in functions to do these tests seem to never specify what KIND of permutation they are doing, nor what EXACTLY they are permuting, if someone wants to clear up the reason for this for me, I would appreciate it). However, this specific question is about the logic behind permuting group membership with the raw data, not the residuals.

library(jmuOutlier)
set.seed(123)
perm.f.test(response=data$resp,treatment= data$group, num.sim = 999)
[[1]]
[1] "One-way ANOVA"

[[2]]
            Df  Sum Sq Mean Sq F value    Pr(>F)    
treatment    1 0.32282 0.32282  20.759 7.182e-05 ***
Residuals   32 0.49764 0.01555                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

[[3]]
[1] "The p-value from the permutation F-test based on"

[[4]]
[1] "999 simulations is 0"

Okay so the observed F-value occurred less than 95% of the time under the null model (p < 0.05). My question is regarding the interpretation of this procedure. From my understanding, we have evidence that the two groups have different variances. Assuming this is correct, what I am not sure of is if this is specifically saying that the two sample distributions have different variance, or that the distribution of the populations that the samples were generated from have different variances? What does this say about the sample, or the population means?

anova distributions hypothesis testing permutation test r

BruceET · Answer

I'm not sure why you're using an ANOVA to try to distinguish between means of two normal populations. Especially, when you seem to have different population variances.
Your data. You have simulated data as follows:
set.seed(123)
x1 = rnorm(10, 1.34, 0.17)
x2 = rnorm(24, 1.14, 0.11)

Data summaries and description. Now, in a real-world situation, you will not know population means and standard deviations. You may not even know for sure that the data are normal.
Data summaries of your observations are as follows:
summary(x1); sd(x1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  1.125   1.250   1.326   1.353   1.404   1.632 
[1] 0.1621433  # SD x1
summary(x2); sd(x2)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.9237  1.0684  1.1545  1.1388  1.2209  1.3366 
[1] 0.1065314  # SD x2

stripchart(list(x1,x2), ylim=c(.6,2.4), pch="|")

From the stripchart it we see no marked skewness or far outliers, so it seems safe
to assume data are sufficiently close to normal that a t test is OK. Shapiro-Wilk tests show no evidence of non-normality.
shapiro.test(x1)$p.val
[1] 0.3890032
shapiro.test(x2)$p.val
[1] 0.906877

Welch 2-sample t test. We want to test $H_0: mu_1 = mu_2$ against $H_a: mu_1 ne mu_2$ at the 5% level.
Because tests of equal variances have poor power and because we have no reason to
believe that population variances are equal, we will use a Welch 2-sample t test,
which does not assume equal variances.
x = c(x1, x2);  gp = rep(1:2, c(10, 24))
t.test(x ~ gp)

Welch Two Sample t-test

data:  x by gp
t = 3.8397, df = 12.372, p-value = 0.002231
alternative hypothesis: 
  true difference in means is not equal to 0
95 percent confidence interval:
 0.0929087 0.3347989
sample estimates:
mean in group 1 mean in group 2 
       1.352686        1.138833

Thus, the Welch 2-sample t test finds a significant difference (P-value 0.0022) between sample
means $bar X_1 = 1.353$ and $bar X_2 = 1.239;$ we reject $H_0,$ concluding
that population means $mu_1$ and $mu_2$ differ. Notice that the DF of the t statistic
is about 12, which is noticeably smaller than the DF = 32 in a pooled t test, so
the Welch procedure has made a noticeable accommodation to heteroscedasticity of the
data.
Permutation test. If you were not asking about permutation tests, that could be the end of the
story. But suppose you have some doubts about the appropriateness of the Welch
t test. Then you could use the Welch t statistic as the 'metric' of a permutation
test. You would regard the Welch test as a valid way of measuring the difference
between  $bar X_1$ and $bar X_2,$ but you would not necessarily believe that
it has a t distribution under $H_0$ or that the P-value is correct.
At each iteration, a permutation randomly assigns the 10 + 24 = 34 observations
10 to group 1 and 24 to group 2, and then evaluates the Welch t statistic for
that iteration. Upon doing many iterations (say, $m = 10,000$ of them) you have
a good idea what the permutation distribution of Welch's t statistic is (even
though it may not be t distributed). You can use the observed t statistic t = 3.8397
from above to find the P-value according to the (approximate) simulation
distribution.
Here is R code for simulating the permutation distribution of the metric.
[In R, sample(gp) randomly permutes the elements of gp.]
t.obs = t.test(x~gp)$stat;  t.obs
       t 
3.839741 
set.seed(2020)
t.prm = replicate(10^4, t.test(x~sample(gp))$stat)
mean(abs(t.prm) > abs(t.obs)) # P-value of perm test
[1] 2e-04

The P-value of the permutation test is smaller than for the Welch test, again leading to rejection of $H_0.$
The P-value of the permutation test is an approximate result: There are ${34choose 10} = 131,128,140$ ways to permute the data, of which we have seen enough to get a useful approximation. [With a different seed and $m = 10^5$ iterations, the P-value was about $0.001.]$
I always like to take a look at the permutation distribution. In particular, it should
have very many uniquely different values. [Some metrics, especially ones using medians,
may not produce enough different values to give a useful permutation distribution.]
summary(t.prm)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
-3.98182 -0.74011 -0.01279 -0.05879  0.66292  3.89183 
length(unique(t.prm))
[1] 9998

The following histogram shows the permutation distribution; areas in the tails
outside the vertical lines sum to the P-value of the permutation test.
hist(t.prm, prob=T, col="skyblue2", main="Permutation Dist'n of Welch t")
 abline(v=c(t.obs, -t.obs), col="red")

Notes: With three or more groups, one would use an ANOVA to test
for significant differences among group means. With no good reason to believe
group variances are equal, I would probably use oneway.test in R, which
uses a Satterthwaite approximation similar to the one in Welch's t test to
avoid the assumption of equal variances. The F-statistic from that test could
be used as a metric for a permutation test.
Relevant code:
set.seed(123)
y1 = rnorm(10, 1.34, 0.17)
y2 = rnorm(24, 1.14, 0.11)
y3 = rnorm(15, 1.20, 0.15)
y = c(y1,y2,y3)
g = rep(1:3, c(10,24,15))
oneway.test(y~g)

One-way analysis of means 
      (not assuming equal variances)

data:  y and g
F = 7.6278, num df = 2.000, denom df = 19.924, 
 p-value = 0.003469

F.obs = oneway.test(y~g)$stat; F.obs
       F 
7.627752 
set.seed(1234)
F.prm = replicate(10^4, oneway.test(y~sample(g))$stat)
mean(F.prm >= F.obs)
[1] 0.0025

length(unique(F.prm))
[1] 10000

When you do a random permutation F test (by permuting group membership) is inference made on the samples or the populations?

One Answer

Add your own answers!

Ask a Question