
xgboost in R gives different results compared to the Boosted Decision Tree in Azure ML

Data Science, asked on February 3, 2021

I have a small data set (4,000 records with 10 features), and I used XGBoost in R as well as the Boosted Decision Tree model in Azure ML Studio. Unfortunately, the results are different. I would like to optimize recall, and I can pick that as a measure in Azure, but I cannot do so in R.

I used the same parameters on both platforms. I know the seeds might differ, but I have tried many of them. I always get much better recall on my validation dataset with the Azure model than with the R one.

I wonder whether there is a significant difference in the methodology behind these two platforms that is causing the discrepancy.

I also used cross-validation, which did not help. Any insight is appreciated.
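For context, what I am hoping for in R is roughly the following: tracking recall on the validation set during training via a custom evaluation function passed to xgb.train. This is only a sketch, with synthetic data, illustrative names, and the classic xgboost interface assumed:

    library(xgboost)

    set.seed(1)

    # Synthetic placeholder data; the real training/validation sets go here.
    x <- matrix(rnorm(1000 * 10), ncol = 10)
    y <- as.numeric(x[, 1] > 0)
    dtrain <- xgb.DMatrix(data = x[1:800, ], label = y[1:800])
    dvalid <- xgb.DMatrix(data = x[801:1000, ], label = y[801:1000])

    # Recall at a 0.5 probability threshold; with the built-in binary:logistic
    # objective the predictions passed in here should already be probabilities.
    recall_eval <- function(preds, dtrain) {
      labels     <- getinfo(dtrain, "label")
      pred_class <- as.numeric(preds > 0.5)
      recall     <- sum(pred_class == 1 & labels == 1) / sum(labels == 1)
      list(metric = "recall", value = recall)
    }

    bst <- xgb.train(
      params    = list(objective = "binary:logistic", eta = 0.2),
      data      = dtrain,
      nrounds   = 50,
      watchlist = list(valid = dvalid),
      feval     = recall_eval,
      maximize  = TRUE   # recall is to be maximized
    )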

Thanks

One Answer

It's hard to say without knowing exactly what Azure is doing.

  • From what they do share, they bin continuous features; you could try tree_method='hist' in xgboost to be more similar there.
  • I can't tell how Azure deals with categoricals or missing values.
  • Azure controls tree size via the maximum number of leaves per tree, so for a direct comparison set xgboost's max_depth=0 and grow_policy='lossguide' and control tree size with max_leaves instead; a rough sketch of these settings follows this list.
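A minimal sketch of those settings in R, assuming the classic xgboost interface; the data are synthetic stand-ins, and the remaining parameter values (leaves, learning rate, number of trees, samples per leaf) are illustrative and would be matched to whatever is set in Azure:

    library(xgboost)

    set.seed(42)

    # Synthetic stand-in for the data set described above (4000 rows, 10 features).
    x <- matrix(rnorm(4000 * 10), ncol = 10)
    y <- as.numeric(x[, 1] + rnorm(4000) > 0)

    dtrain <- xgb.DMatrix(data = x[1:3000, ], label = y[1:3000])
    dvalid <- xgb.DMatrix(data = x[3001:4000, ], label = y[3001:4000])

    params <- list(
      objective        = "binary:logistic",
      tree_method      = "hist",       # histogram binning of continuous features
      grow_policy      = "lossguide",  # grow leaves by best loss reduction
      max_depth        = 0,            # no depth limit; size controlled by max_leaves
      max_leaves       = 20,           # mirror Azure's maximum number of leaves per tree
      eta              = 0.2,          # mirror Azure's learning rate
      min_child_weight = 10            # loose analogue of minimum samples per leaf node
    )

    bst <- xgb.train(
      params    = params,
      data      = dtrain,
      nrounds   = 100,                 # mirror Azure's number of trees constructed
      watchlist = list(train = dtrain, valid = dvalid),
      verbose   = 0
    )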

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-boosted-decision-tree#usage-tips
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-boosted-decision-tree#module-parameters

Answered by Ben Reiniger on February 3, 2021
