Data Science Asked by Louis Ryan on January 1, 2021
I have some imbalanced data (1400 samples, of which 250 are positive) for a binary classification problem, and I am running an SVM grid search optimising for precision. I have tried 3, 4, 5, 6, 7, and 8 stratified, shuffled k-folds, and in every case the precision is higher in training than in validation (by "every case" I mean every configuration in the search that returns anything worth using – while I don't care too much about recall, I can't accept only 5 true positives in my results).
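The question doesn't include code, but the setup described above can be sketched roughly as follows (the dataset here is a synthetic stand-in, and the parameter grid is hypothetical – only the stratified, shuffled folds and the precision scoring come from the question):

```python
# Hedged sketch of the described setup: SVM grid search scored on
# precision over stratified, shuffled k-folds (assumed, not the asker's code).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the ~1400-sample dataset with ~250 positives.
X, y = make_classification(n_samples=1400, weights=[0.82], random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}  # hypothetical grid
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    SVC(),
    param_grid,
    scoring="precision",
    cv=cv,
    return_train_score=True,  # exposes the train/validation precision gap
)
search.fit(X, y)

best = search.best_index_
print("train precision:", search.cv_results_["mean_train_score"][best])
print("valid precision:", search.cv_results_["mean_test_score"][best])
```

Setting `return_train_score=True` is what lets you compare training and validation precision per fold, which is the gap being asked about.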
I'm typically getting around 90% training precision and 60% validation precision, and the standard deviation across folds never exceeds 5% for the validation data.
This seems to go against my intuition of what overfitting is. My next steps would be to ensemble 6 or 7 undersampled datasets, review my feature space (reduce dimensions / try other feature combinations), or try a different model altogether.
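The first next step mentioned above – an ensemble over undersampled datasets – could be sketched like this (again an assumed illustration, not the asker's code; the subset count of 7 comes from the question, everything else is hypothetical):

```python
# Hedged sketch of an undersampling ensemble: each SVM is trained on all
# positives plus a different random subset of negatives, then predictions
# are combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1400, weights=[0.82], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
pos = np.flatnonzero(y_tr == 1)
neg = np.flatnonzero(y_tr == 0)

models = []
for _ in range(7):  # 7 undersampled subsets, as proposed in the question
    # Balance each subset: all positives + an equal-sized sample of negatives.
    idx = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    models.append(SVC().fit(X_tr[idx], y_tr[idx]))

# Majority vote across the ensemble.
votes = np.mean([m.predict(X_te) for m in models], axis=0)
y_pred = (votes >= 0.5).astype(int)
```

Because each member sees a balanced subset, the ensemble tends to trade some precision for recall relative to a single SVM trained on the full imbalanced set; whether that trade-off helps here depends on the precision target.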
Can someone explain what might be going on here, and suggest other ways I could remedy it?