Data Science Asked by Louis Ryan on January 1, 2021
I have some imbalanced data (1400 samples, of which 250 are positive) for a binary classification problem, and I am running an SVM grid search optimising for precision. I have tried 3, 4, 5, 6, 7, and 8 stratified, shuffled k-folds, and in every case the precision is higher in training than in validation (by "every case" I mean every configuration in the search that returns anything worth using – while I don't care too much about recall, I can't accept only 5 true positives in my results).
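The question doesn't include code, but the setup described above can be sketched roughly as follows (the dataset here is a synthetic stand-in, and the parameter grid is hypothetical – only the stratified, shuffled folds and the precision scoring come from the question):

```python
# Hedged sketch of the described setup: SVM grid search scored on
# precision over stratified, shuffled k-folds (assumed, not the asker's code).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

# Synthetic stand-in for the ~1400-sample dataset with ~250 positives.
X, y = make_classification(n_samples=1400, weights=[0.82], random_state=0)

param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]}  # hypothetical grid
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

search = GridSearchCV(
    SVC(),
    param_grid,
    scoring="precision",
    cv=cv,
    return_train_score=True,  # exposes the train/validation precision gap
)
search.fit(X, y)

best = search.best_index_
print("train precision:", search.cv_results_["mean_train_score"][best])
print("valid precision:", search.cv_results_["mean_test_score"][best])
```

Setting `return_train_score=True` is what lets you compare training and validation precision per fold, which is the gap being asked about.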
I'm typically getting around 90% training precision and 60% validation precision, and the standard deviation across folds never exceeds 5% for the validation data.
This seems to go against my intuition of what overfitting is. My next steps would be to ensemble 6 or 7 undersampled datasets, review my feature space (reduce dimensions / try other feature combinations), or try a different model altogether.
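The first next step mentioned above – an ensemble over undersampled datasets – could be sketched like this (again an assumed illustration, not the asker's code; the subset count of 7 comes from the question, everything else is hypothetical):

```python
# Hedged sketch of an undersampling ensemble: each SVM is trained on all
# positives plus a different random subset of negatives, then predictions
# are combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1400, weights=[0.82], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
pos = np.flatnonzero(y_tr == 1)
neg = np.flatnonzero(y_tr == 0)

models = []
for _ in range(7):  # 7 undersampled subsets, as proposed in the question
    # Balance each subset: all positives + an equal-sized sample of negatives.
    idx = np.concatenate([pos, rng.choice(neg, size=len(pos), replace=False)])
    models.append(SVC().fit(X_tr[idx], y_tr[idx]))

# Majority vote across the ensemble.
votes = np.mean([m.predict(X_te) for m in models], axis=0)
y_pred = (votes >= 0.5).astype(int)
```

Because each member sees a balanced subset, the ensemble tends to trade some precision for recall relative to a single SVM trained on the full imbalanced set; whether that trade-off helps here depends on the precision target.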
Can someone explain what might be going on here, and suggest other ways I could remedy it?