Data Science Asked by blacksite on December 23, 2020
I am modeling a continuous regression/forecasting problem for very right-skewed data. I’ve been using ElasticNet and Huber regression with quite a bit of success, and have recently moved into using XGBoost to see if it’ll provide any additional value. The dimensions of my training matrix are 60,000 rows by 500 columns.
What I’ve found is that the much simpler, more interpretable ElasticNet/Huber regression models very often outperform any XGBoost model I’ve built. The only way I can get XGBoost to compete is by using a ton of different forms of regularization. In particular, the most performant XGBoost models have had `reg_alpha`/`reg_lambda` in the [10, 150] range, `gamma` in the [25, 100] range, `subsample` of 0.5, `colsample_bytree` of 0.5, and a shallow `max_depth`, e.g. 3/4/5, with around 150 `n_estimators`.
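For concreteness, a configuration in that neighborhood looks roughly like the sketch below (using xgboost's scikit-learn wrapper; the exact values and the synthetic stand-in data are illustrative only, not my actual settings or data):

```python
import numpy as np
import xgboost as xgb

# Synthetic stand-in for the real 60,000 x 500 matrix with a right-skewed target.
rng = np.random.default_rng(0)
X = rng.normal(size=(5_000, 100))
y = np.exp(X[:, :5].sum(axis=1) + rng.normal(scale=0.5, size=5_000))  # right-skewed

# Illustrative values picked from the ranges described above, not tuned settings.
model = xgb.XGBRegressor(
    n_estimators=150,
    max_depth=4,            # shallow trees, e.g. 3/4/5
    reg_alpha=50,           # L1 penalty on leaf weights, somewhere in [10, 150]
    reg_lambda=50,          # L2 penalty on leaf weights, somewhere in [10, 150]
    gamma=50,               # min loss reduction required to split, in [25, 100]
    subsample=0.5,          # row subsampling per tree
    colsample_bytree=0.5,   # column subsampling per tree
    objective="reg:squarederror",
)
model.fit(X, y)
```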
From what I’ve gathered in various tutorials online, `gamma` values over 10 or 20 seem to be very high, although I completely acknowledge that statement could be very dependent on the characteristics of the dataset being used.
For this super-regularized model, the predictions and feature importances make sense from an intuitive perspective.
I guess I’m just looking for some input: is it unreasonable that I need such high regularization parameters, or are these high values more justified than they first seem, since the proof seems to be in the pudding with the model’s predictive power/generalizability and sensible feature importances?
I support your "proof is in the pudding" sentiment.
Some of those hyperparameters are not that extreme, in my experience. Boosted trees very often perform best with weak individual learners; your `max_depth` is right in line with what I'm used to seeing as best. The score regularization penalties (`alpha`, `lambda`) don't play as important a role in my experience, but I'm used to seeing optimal parameters chosen in the high double digits. Your subsampling and column subsetting rates also seem reasonable, if on the lower end of what I've generally seen as being optimal. Your `gamma` is quite high, but that doesn't mean something is wrong; perhaps if you shrink the max depth a bit you could relax the `gamma` regularization, but I don't think that's in any way necessary.
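If you want to probe that depth-vs-gamma trade-off directly, a small cross-validated grid search is enough. A rough sketch follows; the grid values and the synthetic stand-in data are placeholders, not recommendations:

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import GridSearchCV

# Placeholder data; substitute the real 60,000 x 500 training matrix and target.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 50))
y = np.exp(X[:, 0] + rng.normal(scale=0.5, size=2_000))

base = xgb.XGBRegressor(
    n_estimators=150, reg_alpha=50, reg_lambda=50,
    subsample=0.5, colsample_bytree=0.5, objective="reg:squarederror",
)
# Example grid only; bracket your current max_depth and gamma settings.
grid = {"max_depth": [2, 3, 4], "gamma": [5, 25, 50, 100]}
search = GridSearchCV(base, grid, scoring="neg_mean_absolute_error", cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```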
One possible explanation for this situation: your data is relatively linear and without interactions, so that xgboost doesn't get its main benefits. And your data is noisy enough that, lacking those nonlinear trends, xgboost ends up fitting to noise readily unless you strongly regularize it.
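A quick way to sanity-check that explanation is to compare cross-validated error for the linear model and the heavily regularized XGBoost model on the same folds; if the linear model keeps up, that's consistent with mostly linear, noisy structure. A minimal sketch, again with placeholder data and placeholder penalties:

```python
import numpy as np
import xgboost as xgb
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

# Placeholder data; substitute the real training matrix and skewed target.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 50))
y = np.exp(X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=2_000))

linear = ElasticNet(alpha=1.0, l1_ratio=0.5)  # placeholder penalties
trees = xgb.XGBRegressor(
    n_estimators=150, max_depth=4, gamma=50, reg_alpha=50, reg_lambda=50,
    subsample=0.5, colsample_bytree=0.5, objective="reg:squarederror",
)

# Same folds and metric for both models; a small or negative gap favoring the
# linear model suggests xgboost has little nonlinear signal to exploit.
for name, est in [("elasticnet", linear), ("xgboost", trees)]:
    scores = cross_val_score(est, X, y, scoring="neg_mean_absolute_error", cv=5)
    print(name, round(scores.mean(), 3))
```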
Answered by Ben Reiniger on December 23, 2020