Data Science Asked by Yuriy Nazarov on March 1, 2021
I’m trying to solve Quora Question Pairs with model stacking.
My first layers are:
And the second is gradient boosting.
My first attempt was to train a CNN on the whole train dataset (with 10% held out as a dev set) and then train gradient boosting (GB) on the same split.
As expected, this was terrible: GB simply learned that the CNN was almost always right.
So I split the train data into 10 parts and trained 10 different CNNs, each with one part held out as its dev set.
Then I trained GB on the CNN predictions for the dev parts (so those predictions come from almost unseen data), and used the average of the 10 CNNs as the same feature on the test dataset.
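For concreteness, the out-of-fold scheme described above can be sketched with scikit-learn; a `LogisticRegression` stands in for the CNN here purely for illustration, and the synthetic data is a placeholder:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict

# Placeholder data; in the real setup this would be the question-pair features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First level: 10-fold out-of-fold predictions, so every training example
# gets a prediction from a model that never saw it during fitting.
base = LogisticRegression(max_iter=1000)
oof = cross_val_predict(
    base, X, y,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    method="predict_proba",
)[:, 1]

# Second level: gradient boosting on the raw features plus the OOF feature.
meta_X = np.column_stack([X, oof])
meta = GradientBoostingClassifier().fit(meta_X, y)
```

At test time the OOF column would be replaced by the average of the 10 first-level models' predictions, mirroring the averaging step above.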
The result was worse than gradient boosting applied to the "magic features" alone.
Can you help me identify my mistakes?