TransWikia.com

Dimensionality reduction and prediction when all columns have approximately same variance

Data Science Asked by Spiros Fot on February 19, 2021

I have a dataset with 25 columns, and the goal is to predict the value of the 25th column from the previous 24 columns.

The dataset is quite big, which is why I initially thought of applying PCA before doing any prediction. The problem is that PCA did not produce good results: each principal component explains only about 4% of the variance.
I suspect this is because the variance of all columns is approximately the same (roughly 90-91% for each column).
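One way to see where a flat ~4% per component can come from: with 24 columns, 1/24 ≈ 4.2%, which is exactly what PCA reports when the columns are close to uncorrelated. The sketch below assumes synthetic, independent Gaussian columns as a stand-in for the real data; it only illustrates the effect and says nothing about the actual dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the 24 predictor columns: independent Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 24))

# Standardize so every column has unit variance, then keep all components.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)

ratios = pca.explained_variance_ratio_
print(ratios.round(3))   # every entry is close to 1/24 ≈ 0.042
print(ratios[:4].sum())  # the top four components capture under 20% of the variance
```

If the real columns show the same flat profile, PCA has little correlation structure to exploit, regardless of the per-column variances.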

I am wondering what can be done in such a case to reduce dimensionality, and which algorithms are best suited to this problem.

I have already tried OLS, Random Forests, SVR and Gradient Boosting regression, but their scores are quite disappointing so far, leaving aside the fact that the computation time is quite long.
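A minimal sketch for comparing these four model families on equal footing, with scaling and cross-validation inside one pipeline. The data come from `make_regression` as a stand-in for the real 24-feature dataset (an assumption), and all hyperparameters are scikit-learn defaults rather than tuned values; SVR in particular usually needs `C` and `epsilon` tuned to the scale of the target.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical stand-in data: 24 features predicting a 25th column.
X, y = make_regression(n_samples=1_000, n_features=24, noise=10.0, random_state=0)

models = {
    "OLS": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "SVR": SVR(),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

scores = {}
for name, model in models.items():
    # The scaler is refit on each training fold, so no information leaks from the test fold.
    pipe = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipe, X, y, cv=3, scoring="r2").mean()
    print(f"{name:>17}: mean R^2 = {scores[name]:.3f}")
```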

One Answer

Concerning dimensionality reduction

If PCA does not give satisfying results, I encourage you to try non-linear dimensionality reduction. A low-dimensional manifold can sometimes hide behind a much larger number of features. scikit-learn's excellent user guide explains manifold learning in detail.
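A small illustration of a manifold hiding behind the raw features, using scikit-learn's `Isomap` on the toy "swiss roll" dataset (3-D points lying on a rolled-up 2-D sheet) rather than the asker's data. Both reducers keep two components and feed a linear model that predicts the position `t` along the roll; the choice of `Isomap` and `n_neighbors=10` is an assumption, and other methods from the manifold learning guide may suit real data better.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy data: t is each point's position along the rolled-up sheet.
X, t = make_swiss_roll(n_samples=1_500, random_state=0)

def cv_r2(reducer):
    """Mean 3-fold R^2 of a linear model fitted on the 2-D reduced features."""
    pipe = make_pipeline(reducer, LinearRegression())
    return cross_val_score(pipe, X, t, cv=3, scoring="r2").mean()

pca_r2 = cv_r2(PCA(n_components=2))
iso_r2 = cv_r2(Isomap(n_neighbors=10, n_components=2))
print(f"PCA -> linear model:    R^2 = {pca_r2:.3f}")
print(f"Isomap -> linear model: R^2 = {iso_r2:.3f}")  # Isomap "unrolls" the sheet
```

PCA can only rotate and project, so it cannot flatten the roll; Isomap preserves distances measured along the sheet, which is why its coordinates track `t` much more closely.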

Concerning the algorithm

Almost any algorithm can handle 24 features with similar variance. In fact, that is always the case once you standardize your data before feeding it to the algorithm.
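To make the standardization point concrete: after scikit-learn's `StandardScaler`, every column has mean 0 and variance 1 by construction, so equal variances are the normal state of preprocessed data, not a symptom of a problem. The raw scales below are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical raw features on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 24)) * rng.uniform(0.1, 100.0, size=24)

X_std = StandardScaler().fit_transform(X)
print(X.var(axis=0).round(1))      # widely different per-column variances
print(X_std.var(axis=0).round(6))  # every column now has variance 1
```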

You might want to train your models on a sample of the data and check the results. This can greatly reduce computation time without hurting performance much, because for a given model complexity (features plus the model's parameters), performance saturates as the amount of data grows.
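One way to check where that saturation sets in is scikit-learn's `learning_curve`, which scores the same model on growing fractions of the training data. A minimal sketch, assuming synthetic `make_regression` data and a random forest as placeholders; if the validation score flattens well before 100% of the rows, training on a sample is likely enough.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import learning_curve

# Hypothetical stand-in for the full dataset (24 features, 1 target).
X, y = make_regression(n_samples=3_000, n_features=24, noise=10.0, random_state=0)

# Score the same model on 10%, 32.5%, ..., 100% of each training fold.
sizes, _, test_scores = learning_curve(
    RandomForestRegressor(n_estimators=50, random_state=0),
    X, y, train_sizes=np.linspace(0.1, 1.0, 5), cv=3, scoring="r2",
)
for n, s in zip(sizes, test_scores.mean(axis=1)):
    print(f"{n:>5} training rows -> mean validation R^2 = {s:.3f}")
```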

Importance of feature engineering

When performance seems to plateau no matter which model you use, you should consider improving your feature engineering (products of features, polynomial features, ...).
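A sketch of why products of features can matter, using scikit-learn's `PolynomialFeatures`. The data are hypothetical: the target depends only on the product of two of the 24 features, which a plain linear model cannot express. Adding degree-2 terms (squares and pairwise products) turns that product into an ordinary column; `Ridge` with `alpha=1.0` is an assumed choice to keep the 324 expanded features in check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical data: the target is the product of features 0 and 1, plus small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 24))
y = X[:, 0] * X[:, 1] + 0.1 * rng.normal(size=2_000)

linear = make_pipeline(StandardScaler(), LinearRegression())
poly = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=2, include_bias=False),  # adds squares and pairwise products
    Ridge(alpha=1.0),
)
lin_r2 = cross_val_score(linear, X, y, cv=3, scoring="r2").mean()
poly_r2 = cross_val_score(poly, X, y, cv=3, scoring="r2").mean()
print(f"raw features:        R^2 = {lin_r2:.3f}")   # near 0: no linear signal
print(f"+ degree-2 features: R^2 = {poly_r2:.3f}")  # near 1: the product is now a column
```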

Correct answer by Rusoiba on February 19, 2021
