Gridsearch XGBoost for ensemble. Do I include first-level prediction matrix of base learners in train set?

Question

I'm not quite sure how I should go about tuning xgboost before I use it as a meta-learner in ensemble learning.

Should I include the prediction matrix (i.e., a DataFrame containing one column of predictions per base learner), or should I just include the original features?

I have tried both methods, tuning only 'n_estimators', with F1 score as the cross-validation metric (learning rate = 0.1).
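A minimal sketch of that tuning setup, assuming scikit-learn's GridSearchCV and the scikit-learn wrapper of xgboost (XGBClassifier). The synthetic data from make_classification and the candidate values for 'n_estimators' are illustrative assumptions, not the asker's data. If xgboost is not installed, the sketch falls back to scikit-learn's GradientBoostingClassifier, a different library that happens to share the parameter names used here. Note that best_score_ is the mean F1 on held-out folds, which is not the same thing as an F1 score measured on the training set.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold

try:
    from xgboost import XGBClassifier as Booster
except ImportError:  # assumption: fall back to scikit-learn's gradient boosting if xgboost is missing
    from sklearn.ensemble import GradientBoostingClassifier as Booster

# Assumption: imbalanced synthetic data stands in for the real training set.
X, y = make_classification(n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0)

# Tune only the number of trees; the learning rate stays fixed at 0.1.
search = GridSearchCV(
    Booster(learning_rate=0.1, random_state=0),
    param_grid={"n_estimators": [1, 10, 50, 100, 200]},
    scoring="f1",  # F1 of the positive class, averaged over validation folds
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```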

Method 1: With prediction matrix + original features:

n_estimators = 1 (this means only one tree is included in the model; is this abnormal?)
F1 Score (Train): 0.907975 (suggests overfitting)

Method 2: With original features only:

n_estimators = 1
F1 Score (Train): 0.39

I am getting rather different results for the two methods, which makes sense, as the feature importance plot for Method 1 shows that one of the first-level predictions is the most important feature.

I think the first-level predictions from the base learners should be included in the grid search. Any thoughts?

Ben Reiniger · Answer

You should tune the meta-estimator using whatever data you want it to eventually predict with. This should definitely include the base model predictions (otherwise you aren't actually ensembling), and may or may not include (some of) the original features.
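One way to let the search itself decide whether the original features belong in the meta-learner's input is scikit-learn's StackingClassifier, whose passthrough flag appends the original features to the base-model predictions when True and uses the predictions alone when False. The sketch below treats that flag as a tunable hyperparameter alongside the meta-learner's tree count. The two base models and the synthetic data are illustrative assumptions, and the xgboost-to-GradientBoostingClassifier fallback is there only so the sketch runs without xgboost.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

try:
    from xgboost import XGBClassifier as Booster
except ImportError:  # assumption: fall back to scikit-learn's gradient boosting if xgboost is missing
    from sklearn.ensemble import GradientBoostingClassifier as Booster

X, y = make_classification(n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0)

# Assumption: two illustrative base learners; the boosted model is the meta-learner.
stack = StackingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    final_estimator=Booster(learning_rate=0.1, random_state=0),
    cv=5,  # the meta-learner is fitted on cross-validated base-model predictions
)

search = GridSearchCV(
    stack,
    param_grid={
        "passthrough": [False, True],  # predictions only vs. predictions + original features
        "final_estimator__n_estimators": [1, 10, 50],
    },
    scoring="f1",
    cv=3,
)
search.fit(X, y)
print(search.best_params_)
```

Every candidate refits the base models inside each outer fold, so this is slower than tuning the meta-learner on a fixed meta-feature matrix, but it keeps the whole pipeline honest about which data each stage has seen.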

One important note, though: you should not train the meta-estimator on "predictions" of the base models on their own training data; those are more accurately called estimations rather than predictions, because the base models already had access to the true labels. A common approach is to train the meta-estimator on out-of-fold predictions obtained from cross-validated training of the base models.
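The out-of-fold approach can be sketched with scikit-learn's cross_val_predict: each training row's meta-feature comes from a base model that never saw that row, while test-set meta-features come from base models refitted on the full training set. The synthetic data, the two base models, and the choice of positive-class probabilities as meta-features are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split

X, y = make_classification(n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# Assumption: two illustrative base learners.
base_models = [
    LogisticRegression(max_iter=1000),
    RandomForestClassifier(n_estimators=50, random_state=0),
]
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Training meta-features: out-of-fold positive-class probabilities, one column per base model.
# cross_val_predict fits clones, so each row is scored by a model trained without it.
oof_train = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=cv, method="predict_proba")[:, 1]
    for m in base_models
])

# Test meta-features: base models refitted on the whole training set, applied to unseen rows.
test_meta = np.column_stack([
    m.fit(X_train, y_train).predict_proba(X_test)[:, 1] for m in base_models
])

# Optionally append the original features (the "Method 1" layout from the question).
meta_train = np.hstack([oof_train, X_train])
meta_test = np.hstack([test_meta, X_test])
print(meta_train.shape, meta_test.shape)
```

The meta-learner is then tuned on meta_train with y_train and evaluated on meta_test with y_test; scikit-learn's StackingClassifier automates this same pattern through its cv argument.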

If the base models are quite good, then it's reasonable that the xgboost model might use only one tree; it just has to tweak the already-good predictions from the base models. But consider lowering the learning rate or otherwise increasing regularization, to see whether more trees can perform better.
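A sketch of that suggestion, assuming out-of-fold meta-features built from a single illustrative base model plus the original features: the grid pairs smaller learning rates with larger tree counts and uses shallower trees as a simple form of regularization. xgboost's scikit-learn wrapper also exposes other regularization parameters such as reg_lambda and min_child_weight; they are left out here only so the scikit-learn fallback (GradientBoostingClassifier, used when xgboost is absent) accepts the same grid. The grid values are assumptions, not recommendations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict

try:
    from xgboost import XGBClassifier as Booster
except ImportError:  # assumption: fall back to scikit-learn's gradient boosting if xgboost is missing
    from sklearn.ensemble import GradientBoostingClassifier as Booster

X, y = make_classification(n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold probabilities from one illustrative base model, plus the original features.
oof = cross_val_predict(LogisticRegression(max_iter=1000), X, y, cv=cv, method="predict_proba")[:, 1]
meta_X = np.column_stack([oof, X])

search = GridSearchCV(
    Booster(random_state=0),
    param_grid={
        "learning_rate": [0.1, 0.03],    # smaller steps usually need more trees
        "n_estimators": [1, 50, 200],
        "max_depth": [2, 3],             # shallower trees act as stronger regularization
    },
    scoring="f1",
    cv=cv,
)
search.fit(meta_X, y)
print(search.best_params_, round(search.best_score_, 3))
```

If the best configuration still has a single tree at the lower learning rate, that supports the idea that the base predictions already carry most of the signal.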
