TransWikia.com

How do I measure the correlation between object-type (categorical) columns and the target column for modelling?

Data Science Asked by Xayka Xayka on August 18, 2021

I am analyzing a dataset in Python. After cleaning the data, I am now finding which columns correlate with the target column (price in this example), so that I can use those columns as features in a prediction model.

I found the correlations between the integer-type columns and the target using the Pandas method .corr(), and then calculated the Pearson coefficient together with its p-value to confirm them. Next, I used the Pandas .describe() method on the object-type columns, which produced the first part of the image (output from cmd). To measure the relationship between each object-type column and the target column, I ran an ANOVA for every object-type column and examined its F-test value and p-value.
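A minimal sketch of the two steps described above, assuming a cleaned pandas DataFrame with no missing values and SciPy installed: scipy.stats.pearsonr for the numeric columns and scipy.stats.f_oneway (one-way ANOVA) for the non-numeric columns. The helper names and the small DataFrame (including its horsepower column) are hypothetical, made up for illustration.

```python
import pandas as pd
from scipy import stats

# Hypothetical, made-up data for illustration only.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd"],
    "horsepower":   [70, 68, 88, 150, 160, 110, 95, 85],
    "price":        [7000, 6500, 8500, 18000, 21000, 16000, 9000, 8200],
})

def numeric_correlations(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Pearson r and two-sided p-value of each numeric column against the target."""
    rows = []
    for col in df.select_dtypes(include="number").columns:
        if col == target:
            continue
        r, p = stats.pearsonr(df[col], df[target])
        rows.append({"column": col, "pearson_r": r, "p_value": p})
    return pd.DataFrame(rows).set_index("column")

def anova_per_category_column(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """One-way ANOVA of the target across the levels of each non-numeric column."""
    rows = []
    for col in df.select_dtypes(exclude="number").columns:
        # One array of target values per category level.
        groups = [g[target].to_numpy() for _, g in df.groupby(col)]
        if len(groups) < 2:
            continue  # ANOVA needs at least two levels
        f, p = stats.f_oneway(*groups)
        rows.append({"column": col, "F": f, "p_value": p})
    return pd.DataFrame(rows).set_index("column").sort_values("F", ascending=False)

print(numeric_correlations(df, "price"))
print(anova_per_category_column(df, "price"))
```

A high F with a small p-value only says that the mean price differs for at least one pair of levels in that column; it does not say which levels differ.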

As the picture shows, only two columns (drive-wheels and num-of-cylinders) have high F values (68 and 55); I removed the columns with low values.
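One way the surviving columns could be carried into a model, sketched under assumptions: a column is kept if its one-way ANOVA p-value is below a chosen alpha (0.05 here, an arbitrary choice), and the kept columns are one-hot encoded with pd.get_dummies so that a regression model can use them. The function name, the threshold, and the data (including the color column) are hypothetical.

```python
import pandas as pd
from scipy import stats

# Hypothetical, made-up data: "color" is meant to be unrelated to price.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd"],
    "color":        ["red", "blue", "red", "blue", "red", "blue", "red", "blue"],
    "price":        [7000, 6500, 8500, 18000, 21000, 16000, 9000, 8200],
})

def select_categorical_features(df, target, alpha=0.05):
    """Keep non-numeric columns whose ANOVA p-value is below alpha, then one-hot encode them."""
    keep = []
    for col in df.select_dtypes(exclude="number").columns:
        groups = [g[target].to_numpy() for _, g in df.groupby(col)]
        if len(groups) >= 2 and stats.f_oneway(*groups).pvalue < alpha:
            keep.append(col)
    # One indicator column per level, dropping the first level to avoid redundancy.
    X = pd.get_dummies(df[keep], drop_first=True)
    return keep, X

keep, X = select_categorical_features(df, "price")
print(keep)
print(X.head())
```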

As far as I know, ANOVA can also be used to find which combinations of categories within a column differ significantly in the target, but I don't know how to do this, why we do it, or how it is used when building a prediction model. Could someone explain?
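A common tool for finding which categories differ after a significant ANOVA is a post-hoc test such as Tukey's HSD, which compares every pair of levels while controlling the family-wise error rate. A minimal sketch using scipy.stats.tukey_hsd (available in SciPy 1.8 and later), assuming every level has at least two rows; the helper name and data are hypothetical.

```python
import pandas as pd
from scipy import stats

# Hypothetical, made-up data for illustration only.
df = pd.DataFrame({
    "drive-wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd", "4wd", "4wd"],
    "price":        [7000, 6500, 8500, 18000, 21000, 16000, 9000, 8200],
})

def tukey_pairwise(df, column, target):
    """Tukey HSD on the target for every pair of levels in one categorical column."""
    levels = sorted(df[column].dropna().unique())
    groups = [df.loc[df[column] == lvl, target].to_numpy() for lvl in levels]
    result = stats.tukey_hsd(*groups)
    rows = []
    for i in range(len(levels)):
        for j in range(i + 1, len(levels)):
            rows.append({
                "group_a": levels[i],
                "group_b": levels[j],
                # statistic[i, j] is mean(group i) - mean(group j)
                "mean_diff": result.statistic[i, j],
                "p_value": result.pvalue[i, j],
            })
    return pd.DataFrame(rows)

print(tukey_pairwise(df, "drive-wheels", "price"))
```

A pair with a small p-value suggests those two levels have different mean prices. Levels that cannot be told apart could, for example, be merged into one category before encoding, which is one way such results can feed into model building.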

I also ran 21 ANOVA tests within the num-of-cylinders column to find which categories are associated with the target column. Is this the right approach, and what does it tell us? I printed the results that showed a strong association, but I don't understand what they mean.
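Running one two-group ANOVA per pair of categories (which would give 21 tests if the column has 7 levels, since 7 × 6 / 2 = 21) is a reasonable idea, but each test carries its own chance of a false positive, so the p-values are usually adjusted for the number of comparisons. A minimal sketch with a Bonferroni correction, one common and conservative choice; the function name, alpha, and data are hypothetical.

```python
from itertools import combinations

import pandas as pd
from scipy import stats

# Hypothetical, made-up data for illustration only.
df = pd.DataFrame({
    "num-of-cylinders": ["four", "four", "four", "six", "six", "six", "eight", "eight"],
    "price":            [7000, 6500, 8500, 18000, 21000, 16000, 30000, 34000],
})

def pairwise_anova(df, column, target, alpha=0.05):
    """Two-group ANOVA for every pair of levels, with Bonferroni-adjusted p-values."""
    levels = sorted(df[column].dropna().unique())
    pairs = list(combinations(levels, 2))
    rows = []
    for a, b in pairs:
        f, p = stats.f_oneway(df.loc[df[column] == a, target],
                              df.loc[df[column] == b, target])
        rows.append({"group_a": a, "group_b": b, "F": f, "p_value": p})
    out = pd.DataFrame(rows)
    # Bonferroni: multiply each p-value by the number of comparisons, capped at 1.
    out["p_bonferroni"] = (out["p_value"] * len(pairs)).clip(upper=1.0)
    out["significant"] = out["p_bonferroni"] < alpha
    return out

print(pairwise_anova(df, "num-of-cylinders", "price"))
```

With only two groups, the ANOVA F statistic equals the square of the equal-variance two-sample t statistic, so scipy.stats.ttest_ind would give the same unadjusted p-values.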

[Image not available: .describe() output for the object-type columns and the ANOVA F-test values and p-values]

