Cross Validated: Asked by M. Smith on August 4, 2020
When fitting a linear regression in R, I am trying to understand how certain p-values are calculated and what they represent. So far, this is my understanding:
The p-values from summary() correspond to t-tests of the marginal contribution of each variable, given that all of the other variables are already in the model. This corresponds to Type III sums of squares.
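For concreteness, here is a minimal sketch of what I mean (assuming a hypothetical data frame dat with the columns soma, ht2, wt2, and ht9 that appear in the tables below); for one-degree-of-freedom terms, each drop1() F statistic equals the corresponding summary() t statistic squared, so the p-values should match:

fit <- lm(soma ~ ht2 + wt2 + ht9, data = dat)

# t-test for each coefficient, with all other terms already in the model
summary(fit)

# marginal ("Type III"-style) F-tests: each term is dropped last;
# for 1-df terms these p-values equal the t-test p-values above
drop1(fit, test = "F")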
The anova() function instead uses F-tests that test the terms sequentially, using Type I sums of squares. For example, if we have the following output:
Analysis of Variance Table

Response: soma
          Df Sum Sq Mean Sq F value  Pr(>F)
ht2        1  0.071  0.0710  0.1289 0.72073
wt2        1  4.635  4.6349  8.4196 0.00504 **
ht9        1  3.779  3.7792  6.8651 0.01090 *
Residuals 66 36.333  0.5505
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-values are testing the significance of ht2 in the presence of the intercept only, of wt2 in the presence of only the intercept and ht2, and of ht9 in the presence of the intercept, ht2, and wt2.
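This sequential structure can be checked piecewise (same hypothetical dat as above): the sum of squares attributed to each term should be the extra sum of squares gained by adding it to the terms listed above it.

# sequential (Type I) ANOVA: terms enter in formula order
anova(lm(soma ~ ht2 + wt2 + ht9, data = dat))

# the Sum Sq for ht2 should match the regression sum of squares
# from a model containing ht2 alone
anova(lm(soma ~ ht2, data = dat))

# and the Sum Sq for wt2 should match the extra sum of squares
# from adding wt2 to a model that already contains ht2
anova(lm(soma ~ ht2, data = dat),
      lm(soma ~ ht2 + wt2, data = dat))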
Is this understanding correct? And if it is, then why do the p-values change when we add additional variables? For example:
Analysis of Variance Table

Response: soma
          Df  Sum Sq Mean Sq F value    Pr(>F)
ht2        1  0.0710  0.0710  0.2072 0.6504835
wt2        1  4.6349  4.6349 13.5353 0.0004772 ***
ht9        1  3.7792  3.7792 11.0363 0.0014695 **
wt9        1 14.0746 14.0746 41.1018 1.878e-08 ***
Residuals 65 22.2581  0.3424
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Adding the wt9 variable decreased the p-value for ht2, even though the sum of squares for ht2 (0.071) is identical in both tables. But if this row is just testing the significance of ht2 in the presence of nothing but the intercept, shouldn't the p-value be identical too?
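For reference, the two tables come from nested fits like these (again with the hypothetical dat); the Sum Sq entry for ht2 is the same in both outputs, yet its F value and p-value differ:

fit3 <- lm(soma ~ ht2 + wt2 + ht9, data = dat)
fit4 <- lm(soma ~ ht2 + wt2 + ht9 + wt9, data = dat)

# identical sequential Sum Sq for ht2 in both tables,
# but different F values and p-values once wt9 enters the model
anova(fit3)
anova(fit4)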
Thanks in advance for any clarifications!