Data Science Asked by Yuriy Nazarov on March 1, 2021
I’m trying to solve Quora Question Pairs with model stacking.
My first layers are:
And the second is gradient boosting.
My first attempt was to train a CNN on the whole train dataset (with 10% held out as a dev set) and then train gradient boosting (GB) on the same split.
As expected, this was terrible: GB simply learned that the CNN was almost always right.
So I split the train data into 10 parts and trained 10 different CNNs, each with one part held out as its dev set.
Then I trained GB on the CNN predictions for the dev parts (so those predictions come from almost unseen data), and used the average of the 10 CNNs as the same feature on the test dataset.
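For concreteness, the out-of-fold scheme described above can be sketched with scikit-learn; a `LogisticRegression` stands in for the CNN here purely for illustration, and the synthetic data is a placeholder:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_predict

# Placeholder data; in the real setup this would be the question-pair features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First level: 10-fold out-of-fold predictions, so every training example
# gets a prediction from a model that never saw it during fitting.
base = LogisticRegression(max_iter=1000)
oof = cross_val_predict(
    base, X, y,
    cv=KFold(n_splits=10, shuffle=True, random_state=0),
    method="predict_proba",
)[:, 1]

# Second level: gradient boosting on the raw features plus the OOF feature.
meta_X = np.column_stack([X, oof])
meta = GradientBoostingClassifier().fit(meta_X, y)
```

At test time the OOF column would be replaced by the average of the 10 first-level models' predictions, mirroring the averaging step above.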
The result was worse than gradient boosting applied to the "magic features" alone.
Can you help me identify my mistakes?