Cross Validated Asked on January 7, 2022
I calculated principal components (PCs) for my samples; the data frame shown below has samples as rows and PCs as columns. In order to decide how many PCs to keep for my regression analysis, is the following a valid approach?
> head(a)
PC1 PC2 PC3 PC4 PC5 PC6 PC7
1 -13.0692 3.825460 -2.8089500 -0.120865 -9.53690 2.2582600 0.975514
2 -13.0419 4.076040 -2.3597900 2.326170 -0.73101 -1.5689400 1.642810
3 -9.5570 4.270540 -0.9153700 -0.160893 -2.27807 -1.0854500 -0.551797
4 -11.4407 0.716765 -0.0932982 -1.229210 2.56851 -0.0708945 2.841000
5 -15.0062 6.971110 -2.9324700 -3.033660 -3.73211 1.8029200 0.712720
6 -13.8156 1.667130 -1.2647800 3.929120 4.12255 0.2541560 1.119040
PC8 PC9 PC10
1 -2.220460 1.15324 3.677270
2 -2.552010 -2.57720 0.111892
3 0.360637 0.30142 -1.288880
4 1.391550 -5.13552 -1.975630
5 1.937330 -1.83419 -1.462170
6 -0.637011 -3.15796 -1.238350
...
# covariance matrix of the PC scores
a.cov <- cov(a)
# eigen decomposition; the eigenvalues are the variances along each PC
a.eigen <- eigen(a.cov)
# proportion of variance explained (PVE) by each PC
PVE <- a.eigen$values / sum(a.eigen$values)
> PVE
[1] 0.49967626 0.22981763 0.07138644 0.04307668 0.03680999 0.02830493
[7] 0.02526709 0.02384502 0.02135397 0.02046199
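A quick way to check the cumulative total (a minimal sketch, using the PVE vector computed above):
# cumulative proportion of variance explained
cumsum(PVE)
# with the values above this gives roughly 0.500, 0.729, 0.801, 0.844, ...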
So it seems that the first 4 PCs explain about 85% of my variance. Is this a valid way to decide on the number of PCs to keep?
Yes, typically this is a good way to select how many principal components to include in your model.
It could also help to visualize the eigenvalues. Plot them from highest to lowest (a scree plot) and look for the point where the curve flattens out, since eigenvalues beyond that point contribute comparatively little to the information content.
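For example, a minimal sketch in base R, assuming the PVE vector from the question is in the workspace:
# scree plot: proportion of variance explained, largest to smallest
plot(PVE, type = "b", xlab = "Principal component",
     ylab = "Proportion of variance explained")
# cumulative version, handy if you want enough PCs to reach e.g. 85%
plot(cumsum(PVE), type = "b", xlab = "Principal component",
     ylab = "Cumulative proportion of variance explained")
abline(h = 0.85, lty = 2)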
Answered by phil on January 7, 2022