Data Science Asked by Srishti M on January 10, 2021
I have seen researchers use Pearson's correlation coefficient for feature selection: they keep the features that have a high correlation value with the target. The implication is that highly correlated features contribute more information for predicting the target in classification problems, whereas features with a negligible correlation value are treated as redundant and removed.
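For concreteness, this filtering step is often implemented along the following lines (a minimal sketch: the DataFrame, column names, and threshold are illustrative assumptions, not taken from any particular study):

```python
import pandas as pd

# Minimal sketch of filter-style feature selection by target correlation.
# The DataFrame, column names and threshold below are illustrative.
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [3, 1, 4, 1, 5],
    "x3": [5, 4, 3, 2, 1],
    "target": [0, 0, 1, 1, 1],
})

# Absolute Pearson correlation of each feature with the target
correlations = df.drop(columns="target").corrwith(df["target"]).abs()

# Keep only the features whose correlation exceeds an (arbitrary) cutoff
threshold = 0.5
selected = correlations[correlations > threshold].index.tolist()
print(selected)  # ['x1', 'x3'] -- x2 falls below the cutoff
```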
Q1) Should features that are highly correlated with the target variable be included in or removed from classification problems? Is there a better or more elegant explanation for this step?
Q2) How do we know that a dataset is linear when multiple variables are involved? What does it mean for a dataset to be linear?
Q3) How do we check feature importance in the non-linear case?
Q1) Should features that are highly correlated with the target variable be included in or removed from classification and regression problems? Is there a better or more elegant explanation for this step?
Actually there's no strong reason either to keep or remove features which have a low correlation with the target response, other than reducing the number of features if necessary:
However, features which are highly correlated with each other (i.e. between features, not with the target response) should usually be removed, because they are redundant and some algorithms don't deal very well with them. This is rarely done systematically though, because it involves computing the correlation for every pair of features. One common approach is sketched below.
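A minimal sketch of removing one feature from each highly correlated pair (the DataFrame and the 0.9 threshold are illustrative assumptions):

```python
import numpy as np
import pandas as pd

# Illustrative feature matrix; "b" is perfectly correlated with "a"
X = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 3, 4, 1, 2],
})

corr = X.corr().abs()
# Look only at the upper triangle so each pair is checked once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]

X_reduced = X.drop(columns=to_drop)
print(to_drop)                      # ['b']
print(X_reduced.columns.tolist())   # ['a', 'c']
```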
Q2) How do we know that a dataset is linear when multiple variables are involved? What does it mean for a dataset to be linear?
It's true that correlation measures are based on linearity assumptions, but that's rarely the main issue: as mentioned above it's used as an easy indicator of "amount of information" and it's known to be imperfect anyway, so the linearity assumption is not so crucial here.
A dataset would be linear if the response variable can be expressed as a linear equation of the features (i.e. in theory one would obtain near-perfect performance with a linear regression).
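As a rough illustration of this idea (synthetic data; the R² comparison is a heuristic check, not a formal test of linearity):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# A linear dataset: y is a linear combination of the features plus small noise
y_linear = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.01, size=200)

# A non-linear dataset built from the same features
y_nonlinear = np.sin(X[:, 0]) + X[:, 1] ** 2

# Near-perfect R^2 on the linear data, much lower on the non-linear data
print(LinearRegression().fit(X, y_linear).score(X, y_linear))        # ~1.0
print(LinearRegression().fit(X, y_nonlinear).score(X, y_nonlinear))  # clearly < 1
```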
Q3) How do we check feature importance in the non-linear case?
Information gain, KL divergence, and probably a few other measures can be used. But using these to select features individually is also imperfect, since univariate measures ignore interactions between features.
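For instance, scikit-learn provides a mutual information estimator, an information-gain-style measure that makes no linearity assumption. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# Class labels depend non-linearly on the first feature only
y = (X[:, 0] ** 2 > 1).astype(int)

# Mutual information picks up the non-linear dependence that
# Pearson correlation would miss (the relationship is symmetric,
# so the linear correlation is near zero)
scores = mutual_info_classif(X, y, random_state=0)
print(scores)  # the first feature scores clearly highest
```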
Correct answer by Erwan on January 10, 2021
For feature selection there are different methods.
Pearson correlation comes under filter methods. Filter methods give a high-level intuition and can be the first step of feature selection. In this process:
Features having a high correlation with the target should be kept.
Features having a high correlation among themselves should be removed: they are two independent variables doing the same work, so why keep both.
After the correlation-based approaches you can also dig into wrapper methods, which are more robust for feature selection but add the computational burden of repeatedly training models; a small sketch follows the reference below.
Refer to this for an introduction to the different approaches.
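As an illustration of a wrapper method, here is a minimal sketch using scikit-learn's recursive feature elimination on synthetic data (the estimator and parameter choices are illustrative, not a recommendation from the answer):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which only 3 are informative
X, y = make_classification(n_samples=300, n_features=10,
                           n_informative=3, random_state=0)

# Wrapper method: repeatedly fit a model and prune the weakest features
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=3)
selector.fit(X, y)

print(selector.support_)  # boolean mask of the selected features
```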
Answered by Desmond on January 10, 2021