Cross Validated Asked by Mayank Kumar on December 25, 2021
I just came across a post that said that, in the case of multicollinearity, you can apply PCA to the collinear features and fit the model on the resulting variables. My question is: suppose I have 20 features, of which 3 are multicollinear. Can I apply PCA to these 3 features, which will result in, say, 2 features, and then use the total of 19 features (including the 2 PCA features) to fit my model? Is this the correct approach?
Suppose you have 20 predictors $X_1, X_2, \dots, X_{20}$ and that $X_1, X_2, X_3$ are collinear.
From those three variables, PCA can be used to take the first two principal components, say $PC_1$ and $PC_2$.
Then you use $PC_1, PC_2, X_4, X_5, ..., X_{20}$ as your predictors.
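The three steps above can be sketched in a few lines of NumPy. This is only an illustration on made-up data: the synthetic predictors, the sample size, and the variable names (`X_coll`, `X_rest`, `W`, `PCs`) are all assumptions, and PCA is done by hand with an SVD of the centered columns rather than through any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic data (assumption): X1..X3 share a common factor z and are
# therefore nearly collinear; X4..X20 are independent noise.
z = rng.normal(size=n)
X_coll = np.column_stack([
    z + 0.05 * rng.normal(size=n),
    2.0 * z + 0.05 * rng.normal(size=n),
    -z + 0.05 * rng.normal(size=n),
])
X_rest = rng.normal(size=(n, 17))

# PCA on the three collinear columns only: center them, then take the SVD.
mean_coll = X_coll.mean(axis=0)
_, S, Vt = np.linalg.svd(X_coll - mean_coll, full_matrices=False)
W = Vt[:2].T                       # 3 x 2 loadings of the first two components
PCs = (X_coll - mean_coll) @ W     # n x 2 scores: PC1 and PC2

# Share of the three columns' variance kept by the first two components.
explained = (S[:2] ** 2).sum() / (S ** 2).sum()

# New design matrix: PC1, PC2, X4, ..., X20 (19 predictors in total).
X_new = np.column_stack([PCs, X_rest])
print(X_new.shape, round(explained, 4))
```

The two score columns are uncorrelated with each other by construction, which is what removes the collinearity among the first three predictors.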
As far as I know, there is nothing wrong with this approach. The information that the three features carry is still retained (to some degree): the first two principal components keep as much of their variance as any two linear combinations can.
However, keep in mind that your model no longer uses the three original features directly, but rather their first two principal components. If you need to interpret the relationship between the response and one of the three predictors, you will need to transform the principal components back to $X_1, X_2, X_3$ to get a meaningful interpretation.
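One common way to read "transforming back", assuming a linear regression on the components, is to multiply the fitted component coefficients by the loading matrix, which gives implied coefficients on $X_1, X_2, X_3$. The sketch below uses made-up data and hypothetical names (`gamma`, `beta_coll`, `W`); it is one possible interpretation, not necessarily what the answer had in mind.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Synthetic data (assumption): three nearly collinear predictors plus 17 others.
z = rng.normal(size=n)
X_coll = np.column_stack([z, 2.0 * z, -z]) + 0.05 * rng.normal(size=(n, 3))
X_rest = rng.normal(size=(n, 17))
y = (1.0 + X_coll @ np.array([1.0, 0.5, -1.0])
     + X_rest[:, 0] + 0.1 * rng.normal(size=n))

# First two principal components of the collinear block (SVD of centered data).
mean_coll = X_coll.mean(axis=0)
_, _, Vt = np.linalg.svd(X_coll - mean_coll, full_matrices=False)
W = Vt[:2].T                        # 3 x 2 loadings
PCs = (X_coll - mean_coll) @ W      # n x 2 scores

# Ordinary least squares on [1, PC1, PC2, X4, ..., X20].
A = np.column_stack([np.ones(n), PCs, X_rest])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
gamma = coef[1:3]                   # coefficients on PC1, PC2

# Back-transform: PCs = (X_coll - mean) @ W, so PCs @ gamma = (X_coll - mean) @ (W @ gamma).
beta_coll = W @ gamma               # implied coefficients on X1, X2, X3
intercept = coef[0] - mean_coll @ beta_coll

# Both parameterisations give identical fitted values.
pred_pc = A @ coef
pred_orig = intercept + X_coll @ beta_coll + X_rest @ coef[3:]
print(np.allclose(pred_pc, pred_orig))
```

Note that `beta_coll` is constrained to lie in the span of the two retained components, so it generally differs from the coefficients an unrestricted regression on $X_1, X_2, X_3$ would give.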
Answered by Nuclear03020704 on December 25, 2021