TransWikia.com

High correlation between the independent and dependent variables but low performance of regression model

Data Science Asked on August 8, 2021

I have a dataset with 4900 rows and 2060 features. I computed the Kendall correlation between each independent feature and the dependent variable, and found that 5 of these features are very highly correlated with the output (dependent) variable: the most correlated feature has a correlation of 0.836, and the fifth has a correlation of 0.736. So I had high hopes that my regression model would fit well.
I split the data into 80% training data and 20% test data. However, the model overfits, and even the training absolute error is not very good.
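To make the correlation step concrete, here is a minimal sketch of ranking features by Kendall correlation with the target. It assumes NumPy and SciPy are available (`scipy.stats.kendalltau`); the helper name `top_kendall_features` and the synthetic data are hypothetical stand-ins, not the asker's actual code or dataset.

```python
import numpy as np
from scipy.stats import kendalltau


def top_kendall_features(X, y, k=5):
    """Return indices and Kendall tau of the k features most correlated with y.

    X: (n_rows, n_features) array; y: (n_rows,) target.
    Features are ranked by |tau|, so strong negative correlations count too.
    A constant column yields tau = nan, which argsort places last.
    """
    taus = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        tau, _pvalue = kendalltau(X[:, j], y)
        taus[j] = tau
    order = np.argsort(-np.abs(taus))[:k]
    return order, taus[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_rows, n_features = 500, 50  # small synthetic stand-in for 4900 x 2060
    X = rng.normal(size=(n_rows, n_features))
    # Hypothetical target: depends on features 3 and 7 plus noise
    y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(scale=0.5, size=n_rows)
    idx, taus = top_kendall_features(X, y, k=5)
    for j, t in zip(idx, taus):
        print(f"feature {j}: tau = {t:.3f}")
```

Because Kendall's tau depends only on the ranks of the values, a target such as `y = x**3` gives tau = 1 with `x` even though the relationship is not linear.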
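A sketch of the 80/20 split and of comparing training versus test mean absolute error follows. The question does not say which regressor was used, so ordinary least squares via `np.linalg.lstsq` is an assumed stand-in; all helper names and the synthetic data (300 features, 2 of them informative) are hypothetical.

```python
import numpy as np


def train_test_split_80_20(X, y, seed=0):
    """Shuffle rows and split them 80% / 20% into train and test sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    cut = int(0.8 * len(y))
    train, test = idx[:cut], idx[cut:]
    return X[train], X[test], y[train], y[test]


def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns coefficients."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef, X):
    # coef[0] is the intercept, coef[1:] the per-feature weights
    return coef[0] + X @ coef[1:]


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def train_test_mae(X, y, seed=0):
    """Fit on the 80% split and return (train MAE, test MAE)."""
    X_tr, X_te, y_tr, y_te = train_test_split_80_20(X, y, seed)
    coef = fit_ols(X_tr, y_tr)
    return mae(y_tr, predict(coef, X_tr)), mae(y_te, predict(coef, X_te))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n_rows, n_features = 500, 300  # hypothetical: many features relative to rows
    X = rng.normal(size=(n_rows, n_features))
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_rows)
    print("all features  (train, test) MAE:", train_test_mae(X, y))
    print("2 informative (train, test) MAE:", train_test_mae(X[:, :2], y))
```

A large gap between the two MAE values is the usual sign of overfitting; running the same comparison on a small subset of features shows how much of that gap comes from the number of features relative to the number of rows.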

Is there a reason for such a poor fit despite the very high correlation values? Is it because these are only 5 features out of 2066? Or is it because of the small number of rows?
