
Feature selection is not that useful?

Data Science Asked by Blenz on January 29, 2021

I’ve been doing a few data science competitions now, and I’m noticing something quite odd and frustrating. Why is it frustrating? Because, in theory, everything you read about data science is about features: the careful selection, extraction, and engineering needed to squeeze the maximum information out of raw variables. Yet so far, throwing every variable into the mix as-is, with the right encodings, seems to work fine. Even removing a variable that is 80% null (which in theory should contribute to overfitting) slightly decreases the performance of my regression model.

For a practical case: I have longitude/latitude for a pickup point and a destination point. I did the logical thing and computed the distance between them (several kinds of distance), then dropped the long/lat columns. The model performs much better when both the coordinates and the distance are in the feature list. Any explanation? And any general thoughts on my dilemma about the real utility of feature selection/engineering/extraction?
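For reference, here is a minimal sketch of the kind of derived distance feature described above (the haversine great-circle distance). The DataFrame and column names are hypothetical, not from the actual competition data:

```python
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

# Hypothetical trip data with raw pickup/dropoff coordinates.
df = pd.DataFrame({
    "pickup_lat": [40.75], "pickup_lon": [-73.99],
    "dropoff_lat": [40.65], "dropoff_lon": [-73.78],
})

# Keep the raw coordinates *and* add the derived distance as a new feature.
df["haversine_km"] = haversine_km(
    df["pickup_lat"], df["pickup_lon"], df["dropoff_lat"], df["dropoff_lon"]
)
```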

EDIT: Could it be that the coordinates carry more information than the distance alone? Is it possible to extract features that are more beneficial to my model than plain long/lat?

2 Answers

My experience is the same. I think, in my case at least, it's largely down to the algorithms I generally use, all of which can ignore features or down-weight them to insignificance when they aren't particularly useful to the model. For example, a random forest will simply not select particular features to split on. A neural network will weight features so that they have no effect on the output, and so on. My experience is that algorithms which must take every feature into account (like a vanilla linear regression model) generally suffer far more.
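To illustrate that down-weighting, here is a minimal sketch using scikit-learn; the synthetic data and parameters are assumptions for demonstration, not from the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 2000
signal = rng.normal(size=(n, 3))   # three informative features
noise = rng.normal(size=(n, 5))    # five pure-noise features
X = np.hstack([signal, noise])
y = signal @ np.array([3.0, -2.0, 1.0]) + rng.normal(scale=0.1, size=n)

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# The noise columns typically end up with near-zero importance: the forest
# rarely chooses them as split features, so leaving them in barely hurts.
print(np.round(model.feature_importances_, 3))
```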

Additionally, in a "production" rather than competitive environment, I found that feature selection became much more important. This is generally due to covariate shift: the distribution of values for certain features changes over time, and where that change is significant between your training dataset and the live predictions you're making day-to-day, it can completely trash your model's outputs. This kind of problem seems to be curated out of the datasets used for competitions, so I never experienced it until starting to use ML at work.
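One simple way to spot this kind of drift per feature is a two-sample Kolmogorov-Smirnov test between the training data and recent live data; a rough sketch on synthetic data (the shift magnitude here is invented for illustration):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5000)  # training distribution
live_feature = rng.normal(loc=0.4, scale=1.0, size=5000)   # shifted live distribution

# A large KS statistic / tiny p-value suggests the feature's distribution
# has drifted since training, and the model may need retraining.
stat, p_value = ks_2samp(train_feature, live_feature)
print(f"KS statistic = {stat:.3f}, p-value = {p_value:.2e}")
```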

Answered by Dan Scally on January 29, 2021

If you want to perform linear regression with feature selection, you can formulate the problem as a mixed-integer optimization (MIO) problem and solve it to optimality.

Then you can check whether the feature selection was worth it; see the sketch below.
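For concreteness, best-subset selection can be written as a big-M mixed-integer program. This is a sketch assuming cvxpy plus a mixed-integer-capable solver (e.g. SCIP or ECOS_BB) is installed; the data, the bound M, and the subset size k are all assumptions:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 100, 10, 3                 # samples, features, subset size
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]     # only the first 3 features matter
y = X @ beta_true + rng.normal(scale=0.1, size=n)

M = 10.0                              # big-M bound on coefficient magnitude
beta = cp.Variable(p)
z = cp.Variable(p, boolean=True)      # z[j] = 1 iff feature j is selected

# |beta_j| <= M * z_j forces beta_j = 0 whenever z_j = 0,
# and sum(z) <= k limits the model to at most k features.
constraints = [beta <= M * z, beta >= -M * z, cp.sum(z) <= k]
problem = cp.Problem(cp.Minimize(cp.sum_squares(X @ beta - y)), constraints)
problem.solve()                       # requires a mixed-integer QP/SOCP solver

print("selected features:", np.flatnonzero(z.value > 0.5))
```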

Answered by Graph4Me Consultant on January 29, 2021
