TransWikia.com

GBM: small change in the trainset causes radical change in predictions

Data Science Asked by Charles_de_Montigny on June 30, 2021

I have built a model using transaction data, trying to predict the value of future transactions. The main algorithm is a Gradient Boosting Machine. The overall accuracy on the test set is fine and there is no sign of overfitting. However, a small change in the training set causes a radical change in the model and in its predictions. Yet even when the test set changes a little, the overall accuracy stays stable.

The time period runs from 2005 to today, and when a single day is added to the dataset the predictions change drastically (e.g. +/- 10%). If multiple trainings are performed on the same training set, the predictions are identical.

I have tested LightGBM (2.1.0) and XGBoost (0.60) with Python 3.6 on Windows 10. A seed is set and I train the model on CPUs. I have tried increasing the number of iterations to a high value and adding a specific seed to the bagging parameters.
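For reference, a minimal sketch of what "setting a seed and adding a seed to the bagging parameters" can look like as a LightGBM parameter dict. The parameter names follow the LightGBM parameter documentation; the values are illustrative, and the dict itself is a hypothetical config, not the asker's actual setup:

```python
# Illustrative LightGBM config pinning every source of randomness
# that subsampling and feature sampling draw from.
lgb_params = {
    "objective": "regression",
    "num_iterations": 1000,
    "seed": 42,                   # master seed
    "bagging_seed": 42,           # row (bagging) subsampling
    "feature_fraction_seed": 42,  # column subsampling
    "data_random_seed": 42,       # data partitioning
}
```

Pinning all of these makes repeated runs on the *same* data reproducible, which matches the observation above; it does not, by itself, make the model stable against a changed training set.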

This blog post briefly discusses the issue without offering any solution.

One Answer

A good way to mitigate this problem is to add noise to your training dataset: this will make your model more robust and less sensitive to small changes in the data.

There are different ways of adding noise; Gaussian noise is a common choice, but the right choice depends on the kind of data you have.
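As a concrete illustration, here is a minimal sketch of adding zero-mean Gaussian noise scaled to each feature's standard deviation. The helper name, the `scale` factor, and the synthetic data are all illustrative assumptions, not part of the original question:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_gaussian_noise(X, scale=0.01):
    """Return X plus zero-mean Gaussian noise.

    The noise std dev for each feature is `scale` times that
    feature's own std dev, so all features are perturbed
    proportionally regardless of their units.
    """
    X = np.asarray(X, dtype=float)
    noise = rng.normal(loc=0.0, scale=scale * X.std(axis=0), size=X.shape)
    return X + noise

# Hypothetical training matrix: 100 rows, 5 features.
X_train = rng.normal(size=(100, 5))
X_noisy = add_gaussian_noise(X_train, scale=0.05)
```

You would then fit the GBM on `X_noisy` instead of `X_train`, possibly re-drawing the noise over several runs and averaging the predictions.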

Answered by vico on June 30, 2021
