Data Science Asked by Tasos on August 31, 2021
In a classification problem where a cost matrix is used to optimize model performance, it is common to apply a rebalancing technique.
Let’s say for example that I have the following costs for the two classes.
C(a,a) = 0, C(b,b) = 0, C(a,b) = 2, C(b,a) = 1.
Then, with a rebalancing technique, I would need twice as many examples of class b as of class a.
But what should my rebalancing strategy be when there is a cost for (a,a) or (b,b)?
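To make the ratio concrete, here is a minimal sketch of cost-proportional oversampling. It assumes the convention that C(i,j) is the cost of predicting i when the true class is j, so each true example of a class is replicated in proportion to the cost of misclassifying it. The names `class_weights` and `rebalance` and the toy data are hypothetical, and the sketch only applies when the diagonal costs are zero.

```python
# Assumed convention: COST[(predicted, actual)]
COST = {("a", "a"): 0, ("b", "b"): 0, ("a", "b"): 2, ("b", "a"): 1}


def class_weights(cost):
    """Weight of a class = cost of misclassifying a true example of it."""
    return {"a": cost[("b", "a")], "b": cost[("a", "b")]}


def rebalance(examples, weights):
    """Replicate each (features, label) pair an integer number of times."""
    rebalanced = []
    for features, label in examples:
        rebalanced.extend([(features, label)] * weights[label])
    return rebalanced


# Hypothetical toy data: two examples of class a, one of class b.
examples = [(0, "a"), (1, "a"), (2, "b")]
weights = class_weights(COST)
rebalanced = rebalance(examples, weights)
count_a = sum(1 for _, label in rebalanced if label == "a")
count_b = sum(1 for _, label in rebalanced if label == "b")
```

Each class b example now appears twice, so b carries twice the weight per original example that a does.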
For instance,
C(a,a) = 0, C(b,b) = 2, C(a,b) = -2, C(b,a) = -10
How should I handle those cases?
It is not very common to find cost functions where there is a cost associated with a correct answer, such as C(b,b) in your example.
But supposing there is, I think the solution to the classification problem could be trivial: I could say "All my predictions are 'b'", and that way I would incur the cost of -10 many times, giving me a negative total cost (depending on the class balance, of course).
I did not know the technique you mention for applying costs (rebalancing accordingly), but to me it would be more natural to change the objective function to take the costs into account.
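As a rough illustration of that point, the sketch below totals the cost of a constant prediction and also picks, for a given probability of class b, the prediction with the lower expected cost. It assumes C(i,j) means the cost of predicting i when the true class is j. The label counts are made up, and the function names are hypothetical.

```python
# Assumed convention: COST[(predicted, actual)], values from the question.
COST = {("a", "a"): 0, ("b", "b"): 2, ("a", "b"): -2, ("b", "a"): -10}


def total_cost(predictions, labels, cost=COST):
    """Sum the cost matrix entries over all (prediction, label) pairs."""
    return sum(cost[(p, y)] for p, y in zip(predictions, labels))


def constant_prediction_cost(label, labels, cost=COST):
    """Total cost of predicting the same label for every example."""
    return total_cost([label] * len(labels), labels, cost)


def min_expected_cost_prediction(p_b, cost=COST):
    """Pick the prediction with the lower expected cost, given P(actual = b)."""
    expected_a = (1 - p_b) * cost[("a", "a")] + p_b * cost[("a", "b")]
    expected_b = (1 - p_b) * cost[("b", "a")] + p_b * cost[("b", "b")]
    return "a" if expected_a <= expected_b else "b"


# Hypothetical data: 3 examples of class a, 7 of class b.
labels = ["a"] * 3 + ["b"] * 7
always_b = constant_prediction_cost("b", labels)  # 3 * (-10) + 7 * 2
always_a = constant_prediction_cost("a", labels)  # 3 * 0 + 7 * (-2)
```

With this particular matrix and convention, the lowest expected cost comes from predicting b when the example is certainly a and a when it is certainly b, which suggests the matrix rewards errors and is worth double-checking before any rebalancing.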
The following article discusses one possibility for addressing this (instead of rebalancing, we should measure the cost-sensitive matrix), and it does so with XGBoost!
As far as I know, the objective function of an XGBoost classifier can be customized.
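To give an idea of what a customized objective might compute, here is a sketch of the per-example gradient and Hessian of a cost-weighted logistic loss, written in plain Python. This is my own illustration, not the method from the article: it assumes class b is encoded as 1 and class a as 0, uses the zero-diagonal costs C(a,b) = 2 and C(b,a) = 1 as error weights, and the function name and parameters are hypothetical. Wiring it into XGBoost's custom objective callback is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost_weighted_logloss_grad_hess(margins, labels, cost_fn=2.0, cost_fp=1.0):
    """Gradient and Hessian of a weighted log loss w.r.t. the raw margin.

    cost_fn weights true class b (label 1) examples, i.e. C(a,b).
    cost_fp weights true class a (label 0) examples, i.e. C(b,a).
    """
    grads, hessians = [], []
    for z, y in zip(margins, labels):
        s = sigmoid(z)
        w = cost_fn if y == 1 else cost_fp
        grads.append(w * (s - y))           # d/dz of -w*[y*log(s) + (1-y)*log(1-s)]
        hessians.append(w * s * (1.0 - s))  # second derivative w.r.t. z
    return grads, hessians


# Two examples with margin 0: one of class b (1), one of class a (0).
grads, hessians = cost_weighted_logloss_grad_hess([0.0, 0.0], [1, 0])
```

At the same margin, the class b example pulls twice as hard on the model as the class a example, which has the same effect as the 2:1 rebalancing but without duplicating data.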
Answered by Juan Esteban de la Calle on August 31, 2021