Random seed in Machine learning model comparison

Data Science Asked by Rawia Hemdan on October 6, 2021

I would like to ask a question about random population generation when splitting the dataset for machine learning classification models.

For example, using AdaBoost I got an accuracy of 0.7 with seed = 1, 0.8 with seed = 5, and 0.89 with seed = 2000.

I found a research paper that uses the same dataset I used and reports an accuracy of 0.94 with an XGBoost model, without specifying the seed used to develop the model.

The same is true for other research papers.

Which results should I pick to compare my model with the models proposed in the literature? I also implemented the models proposed in the literature with different seeds and found that the results differ from those reported in the papers, and sometimes my AdaBoost result is even better.

I need help comparing my proposal with these other works.

2 Answers

I agree that authors are sometimes not clear about these details. If a dataset already comes split into train/test/validation, there is not much to do (assuming the dataset is well made). Given a dataset that is not already split, you have (at least) two ways to test your model:

  • fix a random seed, check that the split is not biased, and run several trainings. Even with the same split, the same model can converge to different local minima because of stochastic gradient descent;
  • don't fix a random seed and run several trainings (but store the seeds with your experiment data!); see the sketch after this list.
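
A minimal sketch of the second idea, assuming a scikit-learn setup with AdaBoostClassifier; X, y, and the seed list are placeholders for your own data and choices:

    # Repeat the split and training over several seeds and report the
    # spread of accuracies instead of a single number.
    import numpy as np
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    def accuracy_over_seeds(X, y, seeds=(1, 5, 42, 123, 2000)):
        scores = []
        for seed in seeds:
            X_train, X_test, y_train, y_test = train_test_split(
                X, y, test_size=0.2, random_state=seed, stratify=y)
            model = AdaBoostClassifier(random_state=seed)
            model.fit(X_train, y_train)
            scores.append(accuracy_score(y_test, model.predict(X_test)))
        return np.mean(scores), np.std(scores)

Reporting the mean and standard deviation over seeds makes the comparison with the literature less dependent on one lucky or unlucky split.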

If your findings are not consistent with the literature, and you are sure there are no bugs in your code, then you should ask specific questions here or write to the authors. Depending on the model, you might also try some standard validation techniques.

Answered by dcolazin on October 6, 2021

The best way to avoid your issue is to use K-fold cross-validation. A difference of 0.1 in accuracy between two seeds is a lot, so I'd suggest doing cross-validation with random splits (for example, shuffle your data before entering the cross-validation loop). Running the cross-validation loop twice, shuffling the data each time, could change the result a bit (because of the shuffle, the folds won't be exactly the same), but I highly doubt the change will be as large as 0.1. Then you can compare your model, since you're using a stable evaluation process. A sketch of this setup follows.
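
As a rough sketch, assuming scikit-learn and the AdaBoost model from the question; StratifiedKFold with shuffle=True handles the random split, and X, y, the fold count, and the seed are placeholders:

    # Shuffled, stratified K-fold cross-validation: one mean accuracy
    # (plus a standard deviation) instead of a score tied to one split.
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import StratifiedKFold, cross_val_score

    def cv_accuracy(X, y, n_splits=10, seed=0):
        cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
        scores = cross_val_score(AdaBoostClassifier(), X, y,
                                 cv=cv, scoring="accuracy")
        return scores.mean(), scores.std()

Repeating this with a few different seeds should show a much smaller spread than the 0.1 gap you saw between single splits.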

Answered by BeamsAdept on October 6, 2021
