I have trouble imagining realistic research questions that can be answered by a one-way ANOVA. For designs with two or more factors, interaction terms, and perhaps mixed factors (within- and between-subjects), sure, you will need an ANOVA among other tools. But one-way? I am speaking of one-way ANOVAs with more than two groups; if there are only two groups, that’s a straightforward t-test or Wilcoxon variant.
A way to put my question is this: Are there really research questions where all you want to know is whether at least any one of those groups is different from any other group, but where you don’t care which ones they are?
Another way to put my question is to state that the heavy lifting is not done by the ANOVA but by the planned contrasts or exhaustive post-hoc tests you do along with it. But do those really rely on the ANOVA? Being variants of t-tests, they only require i.i.d. data on interval variables that need to be normally distributed unless $n$ is large enough. No ANOVA needed so far. Since you will be doing more than one comparison, you should control your type I error rate. If you are doing all pairwise comparisons and controlling the FWER, the consensus seems to be that you don’t need the ANOVA (and that you should do a power analysis instead of just taking 30 sample points per group for the CLT). Yet in this frequent scenario, I often see the ANOVA done anyway. Is this just a historic relic? I think this addition can be harmful, as it reduces the power of the overall procedure by requiring that the ANOVA be significant before doing the post-hoc tests (discussed in more detail below).
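As a minimal sketch of that scenario (the group data are fabricated, and Holm is just one choice of FWER-controlling correction), the pairwise comparisons can be run and corrected with no omnibus test anywhere in sight:

```python
# Sketch: all pairwise Welch t-tests with a Holm step-down correction,
# controlling the FWER without any ANOVA gate. Data are made up.
from itertools import combinations

from scipy import stats

groups = {
    "A": [5.1, 4.9, 5.3, 5.0, 5.2, 4.8],
    "B": [5.6, 5.4, 5.8, 5.5, 5.7, 5.3],
    "C": [5.0, 5.2, 4.9, 5.1, 5.0, 5.3],
}

# Raw p-values for every pairwise comparison (Welch's t-test).
pairs = list(combinations(groups, 2))
pvals = [stats.ttest_ind(groups[a], groups[b], equal_var=False).pvalue
         for a, b in pairs]

# Holm correction: sort p-values ascending, multiply the k-th smallest by
# (m - k + 1), and enforce monotonicity of the adjusted values.
m = len(pvals)
order = sorted(range(m), key=lambda i: pvals[i])
adjusted = [1.0] * m
running_max = 0.0
for rank, i in enumerate(order):
    running_max = max(running_max, (m - rank) * pvals[i])
    adjusted[i] = min(1.0, running_max)

for (a, b), p in zip(pairs, adjusted):
    print(f"{a} vs {b}: Holm-adjusted p = {p:.4f}")
```

Each adjusted p-value can be compared directly against the nominal $\alpha$; the family here is simply the set of all pairwise comparisons.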
Then there are the planned contrasts, where you don’t compare all pairs but perhaps only some pre-selected ones and/or some linear combinations of groups (are those two treatments on average better than the average of the other three treatments, etc.). It is alleged, for example by Howard Seltman (p. 325), that those require first rejecting the null hypothesis of the ANOVA:
The same kind of argument applies to looking at your planned comparisons
without first “screening” with the overall p-value of the ANOVA. Screening protects your Type 1 experiment-wise error rate, while lack of screening raises it.
The scare-quotes around “screening” seem telling to me. Sure, if you add one more hurdle, your type I error rate cannot go up because of it. All other things being equal, it can at most stay the same (when the tests are completely redundant) or decrease (when the tests measure something slightly different). But if your only concern were the type I error rate, why not just decrease the $\alpha$ level? That certainly seems a cleaner solution than adding some other test that doesn’t really address your research question. (I also don’t agree with the implicit logic that the type I error rate should always be your foremost concern, at least not unless you also do a power analysis.)
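The power cost of the screening hurdle can be made concrete with a small simulation. This is only an illustrative sketch: the effect size, group size, and Bonferroni correction below are my own assumptions, not anything from the sources above. It counts how often the one true pairwise difference is flagged, with and without requiring a significant omnibus F test first:

```python
# Monte Carlo sketch: gating the pairwise tests on a significant one-way
# ANOVA can only lower the chance of flagging a real difference, because
# the gated detections are a subset of the ungated ones.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, alpha, n_sim = 15, 0.05, 2000
m = 3  # number of pairwise comparisons among 3 groups

hits_gated = hits_ungated = 0
for _ in range(n_sim):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(0.0, 1.0, n)
    c = rng.normal(0.8, 1.0, n)  # only group C is shifted
    # Bonferroni-corrected pairwise test of the one true difference (A vs C).
    if stats.ttest_ind(a, c).pvalue < alpha / m:
        hits_ungated += 1
        # The gated procedure additionally requires a significant F test.
        if stats.f_oneway(a, b, c).pvalue < alpha:
            hits_gated += 1

print(f"power without ANOVA gate: {hits_ungated / n_sim:.3f}")
print(f"power with ANOVA gate:    {hits_gated / n_sim:.3f}")
```

By construction the gated count can never exceed the ungated one; how large the gap is depends on the effect size and sample size assumed.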
Am I overlooking something? Is the condition of a significant ANOVA result perhaps only imposed because, for pre-selected contrasts, no correction for multiple comparisons is done at all? If so, how do we know that this is not a case where we should control the FWER? (The family size would be determined by the number of pre-selected contrasts, if we really select them upfront.) How would we know that the prerequisite ANOVA, rather than such a correction, is the right way to deal with multiple pre-selected contrasts for each of which we leave $\alpha$ at the nominal level?
Edit: And since some of the more complex ANOVA designs can be equivalent to a one-way design with more groups (one group per combination of levels in the two- or more-way design), perhaps my question is even more general and applies to many other ANOVAs. I don’t want to overstate my case, though. I’m not sure which designs can be made equivalent to a one-way design and which cannot.
An interesting historical use-case is RA Fisher's explanation of ANOVA in the chapter 'Intraclass Correlations and the Analysis of Variance' in 'Statistical Methods for Research Workers' (there are several online versions which can be found for instance via the Wikipedia article). There he introduces ANOVA with intraclass correlation as a special case (which is an example of one-way ANOVA).
(An earlier example of ANOVA occurred in 1923, comparing the yields of different types of potatoes and manure, and is an example of two-way ANOVA; but the one-way ANOVA relating to intraclass correlation is just as much a realistic research question / use case. I actually do not see why two-way versus one-way makes a difference to the suitability of applying ANOVA.)
The question answered by ANOVA is just this: Where does the variation come from? Is it due to within-class variation or between-class variation? One-way, two-way, multi-way: the dimension does not matter for such a question. The question answered by ANOVA, in whatever dimension, is just whether samples of the same class correlate, whether the class plays a role in the variation.
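That variance-decomposition view can be written out directly. The sketch below (on made-up data) computes the between-class and within-class sums of squares by hand and checks that the resulting F statistic matches `scipy.stats.f_oneway`:

```python
# Sketch of one-way ANOVA as a variance decomposition: how much of the
# total variation is between classes versus within classes? Data are
# fabricated for illustration.
import numpy as np
from scipy import stats

classes = [np.array([4.0, 4.2, 3.9, 4.1]),
           np.array([5.0, 5.3, 4.8, 5.1]),
           np.array([4.5, 4.4, 4.7, 4.6])]

all_obs = np.concatenate(classes)
grand_mean = all_obs.mean()

# Between-class variation: class means around the grand mean.
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in classes)
# Within-class variation: observations around their own class mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in classes)

k, N = len(classes), len(all_obs)
F = (ss_between / (k - 1)) / (ss_within / (N - k))

# The hand computation agrees with scipy's one-way ANOVA.
print(F, stats.f_oneway(*classes).statistic)
```

A large F says that the class means vary more than the within-class noise would predict, which is exactly the "does the class play a role?" question, with no particular contrast singled out.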
Another way to put my question is to state that the heavy lifting is not done by the ANOVA but by the planned contrasts or exhaustive post-hoc tests you do along with it. But do those really rely on the ANOVA?
Many research questions do not care about specific contrasts. For instance, in Fisher's example the research question could be: do the heights of siblings correlate? You do not care about any specific group of siblings being different; you look at the overall variance between the groups (is the variance between groups just the random variance of individuals, or is there some principle that causes variation between groups?).
Yet in this frequent scenario, I often see the ANOVA done anyway. Is this just a historic relic?
ANOVA can sometimes be done before individual tests of contrasts as a form of control for the multiple comparisons problem.
Methods which rely on an omnibus test before proceeding to multiple comparisons. Typically these methods require a significant ANOVA, MANOVA, or Tukey's range test. These methods generally provide only "weak" control of Type I error, except for certain numbers of hypotheses.
Answered by Sextus Empiricus on December 20, 2020
You pose the following question: Are there really research questions where all you want to know is whether at least any one of those groups is different from any other group, but where you don't care which ones they are?
Yes, here is one such example.
Research Question: Do students randomly assigned to different teaching assistants' recitation sections do comparably well on key course assessment indicators (say, the final exam)?
I think the issue with the way you've presented your query is that it seems to suggest that only ANOVAs with statistically significant results can serve a research question. However, the RQ here is still reasonable (something someone might want to know), and it so happens that the hope is most likely NOT to find a statistically significant result.
That said, if your query is specifically whether there are methods other than a one-way ANOVA for cases where you are expecting a difference, then I would agree... it might be harder to find an authentic RQ example.
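A minimal sketch of the TA-section analysis (the scores and section labels below are fabricated) shows why no follow-up contrasts are needed here; a non-significant omnibus F is itself the hoped-for answer:

```python
# One-way ANOVA on final-exam scores grouped by TA section.
# The research question is answered by the omnibus test alone:
# we hope NOT to reject the null of equal section means.
from scipy import stats

sections = {
    "TA1": [78, 85, 90, 72, 88, 81],
    "TA2": [80, 83, 87, 75, 84, 79],
    "TA3": [77, 86, 82, 74, 89, 80],
}

result = stats.f_oneway(*sections.values())
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.3f}")
if result.pvalue >= 0.05:
    print("No evidence that section assignment matters.")
```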
To address the second query posed: Yet in this frequent scenario, I often see the ANOVA done anyway. Is this just a historic relic?
I would argue that starting with the planned comparisons without first confirming that a difference is actually present (i.e., an omnibus test) is a quasi-failure to confirm the assumptions of the test. I would argue, thus, that it is not just a historic relic, but a process that should be encouraged (even in the more pedantic examples like a one-way ANOVA). As a reviewer who has encountered more than one manuscript in which subsequent significant findings (even with MCP adjustments) were reported when the MANOVA failed to detect a difference... I think there is something to be said for maintaining the omnibus protocol for one-way ANOVA and subsequent MCPs.
Answered by Gregg H on December 20, 2020