TransWikia.com

Which feature to use in feature selection?

Data Science Asked by rmdcunha on March 22, 2021

Objective: Multiclass classification with supervised learning, small dataset (25h)

Context: My dataset consists of mobile network data collected with a smartphone. The labels correspond to the user's activity (Stationary, Walk, Subway, Train, Car). My features are computed from 3 fields: timestamp, ID, and signal strength (SS). They are all computed over overlapping windows of different sizes: 15s, 30s, 60s, 90s, 120s. So, for each window size I have 3 features based on ID and 16 statistical features based on SS, for a total of 95 features.

My Question: Which feature selection method should I use? Am I correct in saying that the features are not independent?

(I’m using Python).

One Answer

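My guess would be that the features are highly correlated, since they are computed from the same fields over overlapping windows. Check this, e.g. with a correlation matrix. Regarding feature selection, I suggest starting with logistic regression (Logit) with a Lasso (L1 regularization) penalty. The L1 penalty can "shrink" the coefficients of features to exactly zero (so it basically kicks those features out). This happens automatically during fitting: the features that add little predictive value beyond the others are the ones that get dropped.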
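Both steps (checking the correlation, then fitting an L1-penalised logistic regression) could be sketched roughly like this with scikit-learn. The synthetic data, the 0.9 correlation threshold, `C=0.1` and the helper names `correlated_pairs` and `l1_selected_features` are assumptions for illustration only, they are not from the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def correlated_pairs(X, threshold=0.9):
    """Return (i, j, r) for feature pairs whose |Pearson r| exceeds threshold."""
    corr = np.corrcoef(X, rowvar=False)
    # Only look at the upper triangle so each pair is reported once.
    i_idx, j_idx = np.triu_indices_from(corr, k=1)
    mask = np.abs(corr[i_idx, j_idx]) > threshold
    return [(int(i), int(j), float(corr[i, j]))
            for i, j in zip(i_idx[mask], j_idx[mask])]


def l1_selected_features(X, y, C=0.1):
    """Fit an L1-penalised logistic regression and return the indices of
    features with a non-zero coefficient for at least one class."""
    model = make_pipeline(
        StandardScaler(),  # the L1 penalty is scale-sensitive
        LogisticRegression(penalty="l1", solver="saga", C=C, max_iter=5000),
    )
    model.fit(X, y)
    coef = model[-1].coef_  # shape (n_classes, n_features) for multiclass
    return np.flatnonzero(np.any(coef != 0, axis=0))


# Synthetic stand-in for the real data (assumption: 5 classes, 20 features).
rng = np.random.default_rng(0)
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=5,
    n_classes=5, n_clusters_per_class=1, random_state=0,
)
# Append a near-duplicate of column 0 to mimic features from overlapping windows.
X = np.column_stack([X, X[:, 0] + 0.01 * rng.normal(size=len(X))])

print("Highly correlated pairs:", correlated_pairs(X))
print("Features kept by L1:", l1_selected_features(X, y))
```

A smaller `C` means a stronger penalty and fewer surviving features, so in practice you would tune `C` by cross-validation rather than fix it.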

Many methods (e.g. boosting, neural nets) also allow you to use L1 regularization, so there may be no need to "manually" select features.

If you want to select features manually, you may use stepwise feature selection, as described e.g. in An Introduction to Statistical Learning, Chapter 6.
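As a rough sketch of forward stepwise selection in Python, scikit-learn's `SequentialFeatureSelector` greedily adds one feature at a time based on the cross-validated score. This is not the code from the book. The synthetic data, the choice of 4 features to keep and the logistic regression estimator are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption: 3 classes, 10 features).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=4, n_redundant=2,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

estimator = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Forward selection: start with no features, then repeatedly add the feature
# that improves the 5-fold cross-validated score the most.
selector = SequentialFeatureSelector(
    estimator, n_features_to_select=4, direction="forward", cv=5
)
selector.fit(X, y)

selected = selector.get_support(indices=True)
print("Selected feature indices:", selected)

X_reduced = selector.transform(X)  # keeps only the selected columns
```

Setting `direction="backward"` instead starts from all features and removes one at a time, which is slower with 95 features but follows the same idea.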

Here is some Python code from the chapter: https://github.com/JWarmenhoven/ISLR-python

You may also have a quick look at this post related to Lasso: https://datascience.stackexchange.com/a/55702/71442

Answered by Peter on March 22, 2021
