Data Science Asked on October 5, 2020
This was a question I saw in an interview for a data scientist position:
"Here is the correlation heatmap that I got from my attributes. Looking at the correlation of each feature with the dependent variable (target/class), it is noticeable that the correlations are not very strong.
Still, I would like to know whether I can expect good results from a classification model using this dataset. Also, what further investigation can I do (if I shouldn't look at correlation only)?"
It's a general question, so there are more than a few things you can do.
Still, what is stopping you from training a basic classifier and investigating the results?
Some ideas:
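As a rough illustration of training a basic classifier and looking at the results, here is a minimal sketch. The interview dataset is not available, so the data comes from scikit-learn's `make_classification` as a stand-in; the sample size, feature counts and the choice of logistic regression are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the interview dataset (an assumption, not the real data).
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=4,
    n_clusters_per_class=1,
    random_state=0,
)

# Reference point: always predict the most frequent class.
baseline_acc = cross_val_score(
    DummyClassifier(strategy="most_frequent"), X, y, cv=5
).mean()

# A basic classifier, scored with 5-fold cross-validation.
model_acc = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5
).mean()

print(f"baseline accuracy: {baseline_acc:.3f}")
print(f"model accuracy:    {model_acc:.3f}")
```

If the model does not clearly beat the dummy baseline, that is a hint the features carry little usable signal for that model family; if it does, the weak correlations in the heatmap were not the whole story.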
Correct answer by Sahar Milis on October 5, 2020
Correlation does not affect your model if you use decision trees for a classification problem.
In the theory of decision tree models, you don't need to check correlation or multicollinearity, because the splits in a decision tree are chosen by entropy/information gain. Correlation only measures linear dependence, so a feature with a low correlation to the target can still be informative. The same holds when the features are highly correlated with each other: decision trees can still give good results, because you don't need to delete correlated features or do dimension reduction (unless you have to for other reasons).
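To make the point about linear dependence concrete, here is a small sketch on made-up data. The target is fully determined by a single feature, but in a symmetric, non-linear way, so the Pearson correlation comes out close to zero while a shallow decision tree still separates the classes. The data-generating rule and the tree depth are assumptions chosen for the illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# One feature, uniform on [-1, 1]; the class is 1 when |x| > 0.5.
# The relationship is deterministic but not linear.
x = rng.uniform(-1, 1, size=2000)
y = (np.abs(x) > 0.5).astype(int)

# Pearson correlation between the feature and the target.
pearson = np.corrcoef(x, y)[0, 1]

# A shallow tree only needs two thresholds (near -0.5 and 0.5).
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree_acc = cross_val_score(tree, x.reshape(-1, 1), y, cv=5).mean()

print(f"Pearson correlation: {pearson:.3f}")
print(f"decision tree accuracy: {tree_acc:.3f}")
```

A heatmap built from this feature would show an almost blank cell against the target, even though the feature alone is enough to classify nearly every sample.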
You may not get very good results when you use models built on linear combinations of the features, such as multiclass logistic regression or a multiclass neural network. There you will see that dimension reduction and similar preprocessing can have a large influence on accuracy.
I had a similar question, but with highly correlated features: "Decision-tree regression to avoid multicollinearity for regression model?"
In your case I would say that, if we use decision trees, the weak correlation is not a problem. However, we should verify this by checking the permutation importance of the features and by checking for polynomial dependencies. Of course, you should also ask the interviewer more questions about the question and its goal, to get more background information. This is very important in interviews.
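A minimal sketch of the permutation importance check, using `sklearn.inspection.permutation_importance` on made-up data. The dataset is an assumption: the target depends quadratically on the first two features and the third feature is pure noise, so it also stands in for a polynomial dependency that a correlation heatmap would miss. The random forest is just one possible tree-based model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Three features; the class depends on x0^2 + x1^2, feature 2 is noise.
X = rng.uniform(-1, 1, size=(1000, 3))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Shuffle each feature on held-out data and measure the drop in accuracy.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)
importances = result.importances_mean

for i, imp in enumerate(importances):
    print(f"feature {i}: mean importance {imp:.3f}")
```

Features whose shuffling barely changes the score (here the noise feature) are candidates to drop; features with a large drop matter to the model even if their linear correlation with the target is weak.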
Answered by martin on October 5, 2020