Data Science Asked by Imran.Ali.PhD on December 31, 2020
[I encountered this question in an interview a few weeks ago, and I am still not clear on the answer.]
If all the values in a categorical column fuel_mileage come from the set {poor, good, very_good}, then we can make the column ordinal because of the universal, ordered relationship among {poor, good, very_good}. This case is fairly obvious.
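As an illustration that is not part of the original question, the fuel_mileage case needs nothing but domain knowledge. The order can be fixed by hand, independently of any label. This is a minimal Python sketch, and FUEL_MILEAGE_ORDER, FUEL_MILEAGE_RANK and encode_fuel_mileage are hypothetical names:

```python
# Order fixed by domain knowledge alone (lowest to highest), not learned from data.
FUEL_MILEAGE_ORDER = ["poor", "good", "very_good"]
FUEL_MILEAGE_RANK = {level: rank for rank, level in enumerate(FUEL_MILEAGE_ORDER)}


def encode_fuel_mileage(values):
    """Map each fuel_mileage value to its integer rank; unknown levels raise KeyError."""
    return [FUEL_MILEAGE_RANK[v] for v in values]


print(encode_fuel_mileage(["good", "poor", "very_good", "good"]))  # [1, 0, 2, 1]
```

Unknown levels raise a KeyError instead of silently receiving a rank. That is usually the desired behavior for a fixed, closed set of levels.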
However, imagine the label column in this same dataset is engine_longevity, so that we are studying all other variables in terms of their relationship with it. During data exploration, it turns out that another categorical column, manufacturer, whose values all come from the set {H, S, J, K}, has a very strong correlation with the label engine_longevity, so much so that the choice of H, S, J, or K in a given sample essentially dictates the label. Therefore, as far as this dataset is concerned, H, S, J, and K have an ordered relationship with respect to the label engine_longevity. The question is:
1. Should we make manufacturer ordinal? If yes, how strong should the relationship between manufacturer and the label engine_longevity be? And what metric would you use to measure it?
2. … the manufacturer column ordinal, and why?
If there is no hard-and-fast rule, I would like to know how the community here would approach this situation.
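One way to make the data-derived order described in the question concrete is to sort the manufacturer levels by the mean engine_longevity observed for each level. The sketch below assumes engine_longevity is numeric (for example, years of service). The function name label_derived_order and the toy data are hypothetical, and ordering by the per-category mean is one illustrative choice that the question does not specify:

```python
from collections import defaultdict
from statistics import mean


def label_derived_order(categories, labels):
    """Return the distinct categories sorted by the mean label value observed for each."""
    groups = defaultdict(list)
    for cat, y in zip(categories, labels):
        groups[cat].append(y)
    # Sort by mean label; ties are broken by category name so the order is deterministic.
    return sorted(groups, key=lambda c: (mean(groups[c]), c))


# Hypothetical toy data: engine_longevity assumed numeric (years of service).
manufacturer = ["H", "S", "J", "K", "H", "S", "J", "K"]
engine_longevity = [12.0, 8.0, 15.0, 5.0, 11.0, 9.0, 14.0, 6.0]

order = label_derived_order(manufacturer, engine_longevity)
rank = {cat: i for i, cat in enumerate(order)}
print(order)                            # ['K', 'S', 'H', 'J']
print([rank[m] for m in manufacturer])  # [2, 1, 3, 0, 2, 1, 3, 0]
```

Unlike the fuel_mileage order, this order is a property of the sample, not of the manufacturers. It can change with new data. Because it is computed from the label, it should be fitted on training data only, so that engine_longevity does not leak into the features.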
You are describing highly correlated features. The most common way to measure the correlation between two variables measured on at least an ordinal scale is the Spearman rank-order correlation coefficient.
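For reference, Spearman's rho is the Pearson correlation computed on ranks, where tied values receive the average of their ranks. Below is a minimal pure-Python sketch. In practice scipy.stats.spearmanr computes the same statistic, and the helper names and sample data here are illustrative assumptions. manufacturer has no intrinsic order, so it would first need a numeric encoding. If that encoding was itself derived from engine_longevity, a rho measured on the same data will tend to be optimistic:

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last index of the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ValueError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rank-order correlation: the Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical sample: fuel_mileage encoded as 0/1/2 vs. engine_longevity in years.
fuel_mileage_rank = [0, 0, 1, 1, 2, 2]
engine_longevity = [5.0, 6.0, 8.0, 9.0, 14.0, 15.0]
print(round(spearman(fuel_mileage_rank, engine_longevity), 3))  # 0.956
```

Because only ranks enter the formula, any monotonic relationship (for example, y = x squared for positive x) yields rho = 1, which is why Spearman suits ordinal data.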
Generally, if two features are nearly perfectly correlated, one of them can be dropped from the analysis.
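Here is a minimal sketch of that rule of thumb, assuming the features are numeric columns held in a dict. drop_redundant, the 0.95 threshold, and the greedy keep-the-first-seen policy are hypothetical choices, not a standard procedure. Pearson is the default only to keep the block short, and a Spearman function can be passed as corr instead:

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; a stand-in for any pairwise correlation measure."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(vx * vy)


def drop_redundant(features, threshold=0.95, corr=pearson):
    """Walk the features in order, skipping any whose |corr| with an already kept
    feature is at or above threshold. Returns the names of the kept features."""
    kept = []
    for name, values in features.items():
        if all(abs(corr(values, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical feature columns (the label is deliberately not included).
features = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],  # nearly a linear copy of "a"
    "c": [5, 3, 4, 1, 2],
}
print(drop_redundant(features))  # ['a', 'c']
```

Only feature-to-feature pairs are checked here, so the label engine_longevity is never a candidate for dropping. Which member of a redundant pair to keep is a modeling decision; keeping the first one seen is just the simplest rule.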
Answered by Brian Spiering on December 31, 2020