Data Science Asked by etiennedm on January 16, 2021
In the case I want to predict only ranges from a continuous value, is there any reason to use regression instead of classification ? Could it depend on the type of model I am using (neural network, decision tree, bayesian, …) ?
Example
Let say I have a dataset with images. Each image has one human on it and is labeled with his/her height. Now I am only interested in predicting height ranges, for instance these four classes [ A, B, C, D ] = [ <150, 150-170, 170-190, >190 ] (in cm). Is there any reason why one of the two following approaches should lead to better performances ?
Note: I am wondering if there is a general answer to this question, not only to this example
EDIT
As @n1tk pointed out, in the post Performance of CNN based deep models with number of classes, the question is answered if we think about increasing the number of classes. In my question, I am wondering about regression vs classification. So try to fit a continuous value vs ranges from this value.
The general answer is how the model will be used deal. Either way may be optimal for the case.
For example - If the model groups applicants into good credit risk and bad credit risk, that might be fine to say model score > x = good risk and model score <= x = bad risk. But maybe there will be differential action based on the model score - like giving a different interest rate or a bigger loan.
In the original example, in regression actual = 191, predicted = 189 you can calculate the loss.
In classification, if actual = 191 and P(>190)=0.35, P(170-190)=0.40, P(150-170)=0.25, then you just know the wrong class. Is that enough for the model usage?
There is also the assumption that a "closer" class will be chosen but that might not be true, e.g actual=191, P(>190)=0.25, P(170-190)=0.25, P(150-170)=0.5. The regression could come up with 160 also but you can measure that loss if the model usage requires it. Many classification algorithms do not know if classes are "close" - Confusion matrix. "How close I am to the diagonal?". Is there such metric?
You can also look at Ordinal Regression https://en.wikipedia.org/wiki/Ordinal_regression. In this case there is an implicit ranking in the "class".
Choose based on how the model will be used. Always important to know the usage and the problem being solved, then work backwards to the model.
Hope that helps.
Answered by Craig on January 16, 2021
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP