TransWikia.com

How to perform feature selection with Categorical Variables and Continuous Target, provided that data is not normally distributed?

Data Science Asked by Ahmed Jyad on January 17, 2021

Basically, I am trying to use a multiple linear regression model to predict the salaries of employees. I have a total of 88 independent features, of which 19 are categorical and the rest are continuous. I have managed to reduce the number of continuous features from 69 to 41. Now I am trying to reduce the number of categorical features, but since my data is not normally distributed I can't use a t-test or ANOVA. Which other tests can I use to check whether the features are significant for predicting the target?

One Answer

If I understand your question correctly, you are asking how to reduce the number of categorical features in a dataset. If so, here are a few approaches I can think of:

1> Iterative process - Build a model with all numerical features and one categorical feature, then evaluate how much the model improves on whatever metric you are using, then add the other categorical features one at a time, and so on. So if you have N categorical features, you will be building N+1 models (counting a baseline model that uses only the numerical features).

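2> Chi-square test between the predictor and target variables. Note that the chi-square test of independence compares two categorical variables, so a continuous target such as salary would first have to be binned into categories.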

3> (what I use) Build a model with all the available features and measure its performance, then use the feature importance functionality of that model to determine which features are important. In the case of linear regression, the larger the absolute value of a coefficient, the more important the feature, provided the features are on a comparable scale (for example, standardized). Alternatively, you can use L1 regularization and check which features keep non-zero coefficients. Do check for multicollinearity before interpreting feature importance in linear regression.
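
As a rough illustration of approach 3, here is a minimal sketch of the L1 regularization route using scikit-learn's `LassoCV`. The data is synthetic and every column name (`experience`, `department`, `badge_color`) is made up for the example, so treat it as a starting point rather than a recipe for your dataset. Each categorical feature is one-hot encoded with `drop_first=True` (which also avoids perfect collinearity among the dummy columns), and the absolute coefficients of its dummy columns are summed to get one rough score per original feature.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: all column names and effect sizes are invented.
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "experience": rng.uniform(0, 20, n),
    "department": rng.choice(["eng", "hr", "sales"], n),     # affects salary
    "badge_color": rng.choice(["blue", "green", "red"], n),  # pure noise
})
dept_effect = df["department"].map({"eng": 20000, "hr": 0, "sales": 5000})
# Skewed (exponential) noise, so the target is not normally distributed.
salary = 30000 + 2000 * df["experience"] + dept_effect + rng.exponential(3000, n)

# One-hot encode the categorical columns; drop_first avoids the dummy trap.
categorical = ["department", "badge_color"]
X = pd.get_dummies(df, columns=categorical, drop_first=True, dtype=float)

# Standardize so that coefficient magnitudes are comparable across columns.
X_scaled = StandardScaler().fit_transform(X)

# Lasso with the penalty strength chosen by 5-fold cross-validation.
model = LassoCV(cv=5, random_state=0).fit(X_scaled, salary)
coefs = pd.Series(model.coef_, index=X.columns)

# Sum absolute dummy coefficients to get one score per categorical feature.
importance = {
    feat: coefs[coefs.index.str.startswith(feat + "_")].abs().sum()
    for feat in categorical
}
print(importance)
```

In this toy setup `department` should come out with a much larger score than `badge_color`, whose dummy coefficients are shrunk to zero or close to it. On real data the scores depend on the encoding and on the penalty strength, so it is worth confirming any feature you drop by refitting the model without it and comparing the metric, as in approach 1.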

Answered by Akash Singh on January 17, 2021
