
How to use "tree boosting" with a data-driven loss function

Asked on Data Science, June 8, 2021

We are working on a problem whose loss function is data-driven (non-analytical). Our target contains whole numbers between 0 and 20 (the target is inherently discrete), although larger values are possible, just not present in our dataset. The fact that our loss function is so precisely specified leaves us with some serious issues when using algorithms like XGBoost:

The loss function is generally non-convex. It is not easily fitted by a convex surrogate, since its shape is data-driven and can vary drastically. For example, a convex fit inevitably imposes a large penalty on predictions far from the well-fitted part of the function, even where no large penalty is warranted. If we interpolate instead of fit, the Hessian can be negative (see attached picture), which is a problem for determining leaf weights (right?).

From top to bottom: example interpolation of one of the better-behaved loss functions, with its gradient and its Hessian.

We think we can adapt something like the XGBoost algorithm (I use it as an example because I am familiar with both the paper and the API) by swapping out its dependence on the gradient and Hessian for a brute-force search for the optimal leaf weights and best gain, roughly as sketched below. However, this would slow the algorithm down massively, perhaps cripplingly so.
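
To make the idea concrete, here is a minimal sketch of what we mean by brute-force leaf-weight search; the `data_driven_loss` function and the candidate grid are placeholders for illustration only:

```python
import numpy as np

def data_driven_loss(y_true, y_pred):
    # Placeholder for our empirical loss, evaluated per sample.
    # In reality this would be a lookup/interpolation of measured penalties.
    return np.abs(y_true - y_pred) ** 1.5

def best_leaf_weight(y_leaf, current_pred, candidates=np.linspace(-5.0, 5.0, 201)):
    # Brute-force search: try every candidate weight and keep the one that
    # minimises the summed data-driven loss for the samples in this leaf.
    losses = [data_driven_loss(y_leaf, current_pred + w).sum() for w in candidates]
    return candidates[int(np.argmin(losses))]

# Toy usage: four samples fall into one leaf, current ensemble predictions are 3.0.
y_leaf = np.array([0.0, 2.0, 5.0, 7.0])
current_pred = np.full(4, 3.0)
print(best_leaf_weight(y_leaf, current_pred))
```

Doing this for every leaf of every tree (and for every candidate split when computing gains) is what we expect to be prohibitively slow.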

My questions are: is there some default way of dealing with complex loss functions within existing algorithms? Is there an algorithm that is suited to these problems? Is there anything else you could suggest to solve the above issues?

Thanks in advance.

2 Answers

First, some preliminary and well-known clarifications (that you probably already know).

Metric is what we want to optimize.

Optimization Loss is what the model optimizes.

Obviously, we would like the metric and the optimization loss to be the same, but this is not always possible. How to deal with this?

  • Run the right model. Some models can optimize different loss functions. In the case of XGBoost there are two losses at play: the one used to build the decision trees and the one used by the boosting.

  • Preprocess the target and optimize another metric. For example, transform the target to its logarithm and then apply a known loss function in that space.

  • Optimize another loss function and metric and then post-process the predictions.

  • Write your own cost function. For XGBoost this means implementing a single function that takes the predictions and target values and computes the first- and second-order derivatives (see the sketch after this list).

  • Optimize another metric and use early stopping.

The last one almost always works.
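
Here is a minimal sketch combining the last two ideas with the native XGBoost Python API. It assumes the empirical loss has been measured on a grid of residuals (`grid` and `loss_values` are placeholders), smooths it with a cubic spline to get usable derivatives, and clamps the Hessian to a small positive value; all of these names and choices are illustrative, not a drop-in solution:

```python
import numpy as np
import xgboost as xgb
from scipy.interpolate import CubicSpline

# Assumed: the data-driven loss is tabulated as loss(residual) on a grid,
# e.g. measured penalties for prediction errors between -20 and 20.
grid = np.linspace(-20, 20, 201)
loss_values = np.abs(grid) ** 1.5             # placeholder for the measured loss
loss_spline = CubicSpline(grid, loss_values)  # smooth, differentiable surrogate

def custom_objective(predt, dtrain):
    # Gradient and Hessian of the smoothed loss w.r.t. the predictions.
    residual = predt - dtrain.get_label()
    grad = loss_spline(residual, 1)                    # first derivative of the spline
    hess = np.maximum(loss_spline(residual, 2), 1e-6)  # clamp negative curvature
    return grad, hess

def data_driven_metric(predt, dtrain):
    # Evaluation metric: mean of the original (unsmoothed) tabulated loss.
    residual = predt - dtrain.get_label()
    return "data_driven_loss", float(np.mean(np.interp(residual, grid, loss_values)))

# Hypothetical train/validation data, just to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = rng.integers(0, 21, size=500).astype(float)
dtrain = xgb.DMatrix(X[:400], label=y[:400])
dvalid = xgb.DMatrix(X[400:], label=y[400:])

booster = xgb.train(
    {"max_depth": 3, "eta": 0.1},
    dtrain,
    num_boost_round=500,
    obj=custom_objective,
    custom_metric=data_driven_metric,   # `feval` in older XGBoost versions
    evals=[(dvalid, "valid")],
    early_stopping_rounds=20,
)
```

Clamping the second derivative sidesteps the negative-Hessian issue from the question, at the cost of approximating the true loss; whether that approximation is acceptable depends on how irregular the measured loss actually is.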

In general, for complex loss functions neural networks tend to work better, since they allow far more flexibility in the loss function than classical ML models.

Answered by Carlos Mougan on June 8, 2021

With XGBoost you can come up with your own loss and metric. It is relatively simple to just add a custom loss. However, I have no experience with problems like the one you describe, so you would need to check whether what you have in mind fits into the standard XGBoost framework.

Find an implementation of custom loss (R) here: https://github.com/Bixi81/R-ml/blob/master/xgboost_custom_objective_fair_loss.R

Answered by Peter on June 8, 2021
