How to do feature selection on a sparse matrix?

Question

Say I want to do feature selection on a sparse matrix, e.g., 10,000 rows x 1500 features, where most entries are zero. Let's say the features are all numeric and the target is binary (discrete).
What's the correct and efficient way to apply feature selection? In particular, I'm interested in applying mutual information to it.

SrJ · Answer

Since your matrix is sparse, you can apply dimensionality reduction. I would suggest PCA, which projects your 1500 input features onto k dimensions of your choice while retaining as much of the variance as possible. Here k is a hyperparameter that you need to tune to find the best value. Note that PCA builds new features as linear combinations of the original ones rather than selecting a subset of them.
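One caveat when trying this on sparse data: classic PCA centers the columns, which turns a sparse matrix into a dense one. A minimal sketch, assuming scikit-learn and SciPy are installed, uses `TruncatedSVD` instead. This is a swapped-in, closely related technique that skips centering and accepts sparse input directly. The synthetic matrix and `k = 50` are placeholders, not values taken from the question.

```python
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-in for the 10,000 x 1500 matrix (about 1% non-zero entries).
X = sparse.random(10_000, 1_500, density=0.01, format="csr", random_state=0)

k = 50  # hypothetical value; tune it, e.g. by cross-validation
svd = TruncatedSVD(n_components=k, random_state=0)
X_reduced = svd.fit_transform(X)  # dense array with k columns

# Fraction of the total variance captured by the k components.
explained = svd.explained_variance_ratio_.sum()
print(X_reduced.shape, f"variance retained: {explained:.3f}")
```

Plotting the retained variance against several values of `k` is one simple way to pick a cut-off before tuning it against the downstream classifier.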
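Another approach is a LASSO classifier, i.e. a linear model with L1 regularization. This model performs automatic feature selection by driving the weights of unneeded features to zero. It works best when your input columns are not strongly correlated: among a group of correlated features, L1 tends to keep one somewhat arbitrarily and zero out the rest.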
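As a rough sketch of the L1 idea, assuming scikit-learn and SciPy are installed, an L1-penalized logistic regression can be wrapped in `SelectFromModel`, which keeps the columns with non-zero weights. The data here is synthetic, and `C=1.0` and the `1e-5` threshold are illustrative choices, not recommendations.

```python
import numpy as np
from scipy import sparse
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

# Synthetic sparse data: 2,000 rows x 300 features, about 5% non-zero.
X = sparse.random(2_000, 300, density=0.05, format="csr", random_state=0)

# Synthetic binary target that depends only on the first five columns.
signal = np.asarray(X[:, :5].sum(axis=1)).ravel()
y = (signal > np.median(signal)).astype(int)

# Smaller C means stronger regularization and fewer surviving features.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)

# Keep features whose absolute coefficient exceeds the threshold.
selector = SelectFromModel(clf, threshold=1e-5).fit(X, y)
X_selected = selector.transform(X)          # stays sparse
kept = selector.get_support(indices=True)   # indices of retained columns

print(f"kept {len(kept)} of {X.shape[1]} features")
```

Unlike the PCA route, the output columns here are a subset of the original features, so they remain interpretable. `C` plays the role of the hyperparameter to tune.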
