How do I determine which variables contribute to the 1st PC in PCA?

Question

Given the coefficients of PC1 for each variable (0.30, 0.31, 0.42, 0.37, 0.13, -0.43, 0.29, -0.42, -0.11), which variables contribute most to this PC? Does the sign (+/-) matter, or is considering the absolute value enough?

mnm · Accepted Answer

Welcome to the site. PCA is an unsupervised dimensionality reduction technique. It transforms the original features into new variables, the principal components, each of which is a linear combination of all the original features along an eigenvector of the data's covariance matrix, so the components are not easy to map back to individual features. The first principal component (PC) is the direction of maximum variance in the data. Each subsequent PC captures the largest remaining variance while being orthogonal to the previous ones.
With this background, I invite you to read this Q on SO. It has the solution to programmatically determine the features deemed most important by PCA.
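As a minimal sketch of how such a ranking could look in R for the numbers in the question: the coefficients below are copied from the question, while the variable names V1 to V9 are placeholders I made up, since the real names were not given. Ranking by absolute value assumes the variables are on comparable scales.

```r
# PC1 coefficients from the question; the names V1..V9 are hypothetical
pc1 <- c(0.30, 0.31, 0.42, 0.37, 0.13, -0.43, 0.29, -0.42, -0.11)
names(pc1) <- paste0("V", seq_along(pc1))

# Rank variables by the absolute size of their coefficient (sign ignored)
ranked <- pc1[order(abs(pc1), decreasing = TRUE)]
print(ranked)

# Squared coefficients, rescaled to sum to 1, as a rough share per variable
contrib <- pc1^2 / sum(pc1^2)
print(round(sort(contrib, decreasing = TRUE), 3))
```

With these numbers, the sixth, third and eighth variables come out on top, and two of those three have negative coefficients.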
[edited]
Regarding the sign of the components: even if you change it, you do not change the variance contained in the first component. Moreover, when you change the sign of a component, the weights (prcomp( ... )$rotation) also change sign, so the interpretation stays exactly the same:
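set.seed( 2020 )                    # make the random column reproducible
df <- data.frame(1:10,rnorm(10))    # two variables: a sequence and noise
pca1 <- prcomp( df )                # PCA via singular value decomposition
pca2 <- princomp( df )              # PCA via eigendecomposition
pca1$rotation                       # weights of each variable on each PC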

gives
                PC1        PC2
X1.10     0.9876877  0.1564384
rnorm.10. 0.1564384 -0.9876877

and pca2$loadings gives
               Comp.1 Comp.2
SS loadings       1.0    1.0
Proportion Var    0.5    0.5
Cumulative Var    0.5    1.0

The question then arises: why does the interpretation remain the same?
Suppose you do a PCA regression of y on component 1. In the first version (prcomp), say the coefficient is positive: the larger component 1, the larger y. What does that mean in terms of the original variables? Since the weight of variable 1 (1:10 in df) is positive, the larger variable 1, the larger y.
Now use the second version (princomp). Since the component has its sign changed, the larger y, the smaller component 1: the coefficient of y on PC1 is now negative. But so is the loading of variable 1. That means the larger variable 1, the smaller component 1, and the larger y. The interpretation is the same.
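The regression argument can be checked with a small sketch. The response y below is simulated and purely hypothetical, and instead of relying on prcomp and princomp disagreeing on the sign, the flip is done by hand so the effect is easy to see.

```r
set.seed(2020)
df <- data.frame(x1 = 1:10, x2 = rnorm(10))
y  <- 2 * df$x1 + rnorm(10)          # hypothetical response, for illustration only

pca <- prcomp(df)
pc1 <- pca$x[, 1]                    # scores on component 1
w1  <- pca$rotation["x1", 1]         # weight of x1 on component 1

# Regression of y on PC1, then on the sign-flipped PC1
b_orig   <- coef(lm(y ~ pc1))[["pc1"]]
pc1_flip <- -pc1
b_flip   <- coef(lm(y ~ pc1_flip))[["pc1_flip"]]

# The slope changes sign, but slope times weight (the implied effect of x1) does not
c(b_orig = b_orig, b_flip = b_flip)
c(effect_orig = b_orig * w1, effect_flip = b_flip * (-w1))
```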
The conclusion is that for each PCA component, the sign of its scores and of its loadings is arbitrary and meaningless. It can be flipped, but only if the sign of both scores and loadings is reversed at the same time.
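One way to convince yourself of this is to flip the sign of PC1 in both the scores and the loadings and confirm that the data they describe are unchanged. This is only a sketch on the same kind of toy data as above, with column names x1 and x2 chosen here for readability.

```r
set.seed(2020)
df  <- data.frame(x1 = 1:10, x2 = rnorm(10))
pca <- prcomp(df)

# Centered data rebuilt from scores and loadings
recon <- pca$x %*% t(pca$rotation)

# Flip the sign of PC1 in the scores AND in the loadings
scores_flipped <- pca$x
scores_flipped[, 1] <- -scores_flipped[, 1]
rot_flipped <- pca$rotation
rot_flipped[, 1] <- -rot_flipped[, 1]
recon_flipped <- scores_flipped %*% t(rot_flipped)

# Same reconstruction, same variance along PC1
max(abs(recon - recon_flipped))
all.equal(var(pca$x[, 1]), var(scores_flipped[, 1]))
```

Flipping only one of the two (scores or loadings) would instead mirror the reconstructed data along that component.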
Furthermore, the directions along which the principal components act correspond to the eigenvectors of the data's covariance matrix. If you are getting a positive or negative PC, it just means that you are projecting onto an eigenvector that points in one direction or 180° away in the other direction. Regardless, the interpretation remains the same! It should also be added that the variance along each principal component is simply the corresponding eigenvalue.
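The link to eigenvectors and eigenvalues can be checked numerically. The sketch below assumes prcomp is run on unscaled data, so the comparison is against the sample covariance matrix. The eigenvectors are compared in absolute value because their sign is arbitrary.

```r
set.seed(2020)
df  <- data.frame(x1 = 1:10, x2 = rnorm(10))
pca <- prcomp(df)
eig <- eigen(cov(df))                # eigendecomposition of the covariance matrix

# Squared standard deviations of the PCs equal the eigenvalues
all.equal(pca$sdev^2, eig$values)

# Loadings equal the eigenvectors up to a sign flip per column
all.equal(abs(unname(pca$rotation)), abs(eig$vectors))
```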
