Data Science Asked by Che Tou on May 19, 2021
Currently, I am confused about PCA and regularisation.
I wonder what the difference is between PCA and regularisation, namely lasso (L1) regression?
It seems both of them can do feature selection. Actually, I am not quite familiar with the difference between dimensionality reduction and feature selection.
Lasso does feature selection by adding a penalty to the OLS loss function; it minimises $sum_{i=1}^{n}big(y_i - beta_0 - sum_{j=1}^{p}beta_j x_{ij}big)^2 + lambda sum_{j=1}^{p}|beta_j|$. So you can say that features with low "impact" are "shrunken" by the penalty term (you "regulate" the features). Because of the L1 penalty, the $beta_j$ can also become exactly zero (which is not the case with Ridge, L2). In the Lasso case you "eliminate" a feature when its coefficient is shrunken to zero, and you could call this feature selection. Lasso can be used in "high dimensions", i.e. when you have many features ("columns") but not so many observations ("rows").
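Here is a minimal sketch of this behaviour in Python with scikit-learn; the simulated dataset and the penalty strength `alpha` are just illustrative assumptions, not values from the question:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

# Simulated data: 100 observations, 20 features, only 5 of which are informative.
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
X = StandardScaler().fit_transform(X)  # standardise so the penalty treats features equally

lasso = Lasso(alpha=5.0)  # alpha is the penalty strength (lambda above)
lasso.fit(X, y)

# Many coefficients are shrunk exactly to zero -> implicit feature selection.
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))
print("selected feature indices:", np.flatnonzero(lasso.coef_))
```

Increasing `alpha` shrinks more coefficients to zero (fewer features selected); setting it to zero recovers plain OLS.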
Principal components work in quite a different way. The first principal component is the normalised linear combination of the original features that has the largest variance. So you kind of "transform" the original features into a principal component (which is a "new feature" derived from the original ones), where you try to capture as much variance as possible in one principal component.
Principal components are uncorrelated (orthogonal). This can be very helpful when you do linear regression, in which (high) correlation between features can be a real problem. I see PCA as a tool for dimensionality reduction (not so much feature selection), since you can express many features in a (smaller) number of principal components.
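A minimal sketch of this with scikit-learn's PCA, again on made-up data (the hidden-factor construction below is only there to produce correlated features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 3))                  # 3 hidden factors
W = rng.normal(size=(3, 10))
X = latent @ W + 0.1 * rng.normal(size=(100, 10))   # 10 correlated observed features
X = StandardScaler().fit_transform(X)               # PCA is variance-based, so scale first

pca = PCA(n_components=3)   # keep the 3 components with the largest variance
Z = pca.fit_transform(X)    # Z holds the new, uncorrelated "features"

print("explained variance ratio:", pca.explained_variance_ratio_)
# The components are uncorrelated: off-diagonal correlations are ~0.
print(np.round(np.corrcoef(Z, rowvar=False), 3))
```

Note that each component mixes all original features, so no original feature is dropped; the dimensionality is reduced, but no feature selection takes place.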
So maybe a (little too) brief summary:
- Lasso: feature selection, by shrinking coefficients, some of them exactly to zero.
- PCA: dimensionality reduction, by transforming the original features into a smaller set of uncorrelated components.
For more details, refer to "Introduction to Statistical Learning" (available for free online). Ch. 6.2.2 covers the Lasso, Ch. 10.2.1 covers PCA.
Answered by Peter on May 19, 2021