
xgboost in R gives different results compared to the Boosted Decision Tree in Azure ML

Data Science, asked on February 3, 2021

I have a small data set (4,000 records with 10 features), and I used XGBoost in R as well as the Boosted Decision Tree model in Azure ML Studio. Unfortunately, the results are different. I would like to optimize recall, and I can pick that as a measure in Azure, but I cannot do so in R.

I used the same parameters on both platforms. I know the seeds might differ, but I have tried many of them. I always get much better recall on my validation dataset with the Azure model than with the R one.

I wonder whether there is a significant difference in the methodology behind these two platforms that is causing the discrepancy.

I also used cross-validation, which did not help. Any insight is appreciated.
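For context, what I am hoping for in R is roughly the following: tracking recall on the validation set during training via a custom evaluation function passed to xgb.train. This is only a sketch, with synthetic data, illustrative names, and the classic xgboost interface assumed:

    library(xgboost)

    set.seed(1)

    # Synthetic placeholder data; the real training/validation sets go here.
    x <- matrix(rnorm(1000 * 10), ncol = 10)
    y <- as.numeric(x[, 1] > 0)
    dtrain <- xgb.DMatrix(data = x[1:800, ], label = y[1:800])
    dvalid <- xgb.DMatrix(data = x[801:1000, ], label = y[801:1000])

    # Recall at a 0.5 probability threshold; with the built-in binary:logistic
    # objective the predictions passed in here should already be probabilities.
    recall_eval <- function(preds, dtrain) {
      labels     <- getinfo(dtrain, "label")
      pred_class <- as.numeric(preds > 0.5)
      recall     <- sum(pred_class == 1 & labels == 1) / sum(labels == 1)
      list(metric = "recall", value = recall)
    }

    bst <- xgb.train(
      params    = list(objective = "binary:logistic", eta = 0.2),
      data      = dtrain,
      nrounds   = 50,
      watchlist = list(valid = dvalid),
      feval     = recall_eval,
      maximize  = TRUE   # recall is to be maximized
    )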

Thanks

One Answer

It's hard to say without knowing exactly what Azure is doing.

  • From what they do share, they bin continuous features; you could try tree_method='hist' in xgboost to be more similar there.
  • I can't tell how Azure deals with categoricals or missing values.
  • Azure controls tree size via the maximum number of leaves per tree, so for a direct comparison set xgboost's max_depth=0 and grow_policy='lossguide' and control tree size with max_leaves instead; a rough sketch of these settings follows this list.
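A minimal sketch of those settings in R, assuming the classic xgboost interface; the data are synthetic stand-ins, and the remaining parameter values (leaves, learning rate, number of trees, samples per leaf) are illustrative and would be matched to whatever is set in Azure:

    library(xgboost)

    set.seed(42)

    # Synthetic stand-in for the data set described above (4000 rows, 10 features).
    x <- matrix(rnorm(4000 * 10), ncol = 10)
    y <- as.numeric(x[, 1] + rnorm(4000) > 0)

    dtrain <- xgb.DMatrix(data = x[1:3000, ], label = y[1:3000])
    dvalid <- xgb.DMatrix(data = x[3001:4000, ], label = y[3001:4000])

    params <- list(
      objective        = "binary:logistic",
      tree_method      = "hist",       # histogram binning of continuous features
      grow_policy      = "lossguide",  # grow leaves by best loss reduction
      max_depth        = 0,            # no depth limit; size controlled by max_leaves
      max_leaves       = 20,           # mirror Azure's maximum number of leaves per tree
      eta              = 0.2,          # mirror Azure's learning rate
      min_child_weight = 10            # loose analogue of minimum samples per leaf node
    )

    bst <- xgb.train(
      params    = params,
      data      = dtrain,
      nrounds   = 100,                 # mirror Azure's number of trees constructed
      watchlist = list(train = dtrain, valid = dvalid),
      verbose   = 0
    )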

https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-boosted-decision-tree#usage-tips
https://docs.microsoft.com/en-us/azure/machine-learning/studio-module-reference/two-class-boosted-decision-tree#module-parameters

Answered by Ben Reiniger on February 3, 2021
