Data Science Asked by yash khanna on October 3, 2021
This is a screenshot of my code. I used abc.best_estimator_ (my GridSearchCV model) to find the best parameters. As you can see, the grid contains C=1 and C=100 among other values, and abc.best_estimator_ reports C=1 as the best value. To cross-check, I fit models with different values of C manually, and here I get a better score with C=100. I saw similar results when tuning gamma, but I later commented gamma out to focus on C alone.
Any idea why this is happening? Am I doing something wrong?
You didn't pass a cv parameter to GridSearchCV, so the search relied on the default splitting strategy and may have picked an estimator that overfits the training data. Make the cross validation explicit when searching for the best estimator:
abc = GridSearchCV(clf,params,cv=5)
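The question doesn't show the full code, so here is a minimal sketch of the suggested setup. The SVC classifier and the iris dataset are assumptions (the C and gamma parameters in the question suggest an SVM, but the actual model and data may differ):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumed dataset and classifier -- substitute your own.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

params = {"C": [0.1, 1, 10, 100]}
clf = SVC()

# cv=5 makes the search score each candidate by 5-fold cross validation
abc = GridSearchCV(clf, params, cv=5)
abc.fit(X_train, y_train)

print(abc.best_params_)           # parameters with the best mean CV score
print(abc.best_score_)            # mean cross-validated score of best_estimator_
print(abc.score(X_test, y_test))  # score of best_estimator_ on held-out data
```

Note that abc.best_score_ is a mean over validation folds of the training data, while abc.score(X_test, y_test) is a single held-out score; the two need not rank parameter values the same way.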
Answered by SrJ on October 3, 2021
The reason for the difference: the scikit-learn docs say
cv : int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross validation
This means GridSearchCV trains and validates on internal 80/20 splits and averages the score over the 5 folds.
But when you cross-check manually, you are computing a score on a single test set.
Even if you score on the training set instead, you can see the same behavior; the second point below explains why.
What is needed:
- Try GridSearchCV on a bigger dataset.
- There should be a grid value that gives a clearly better score; a clear separation is more likely when the grid spans 4-5 parameters.
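To see why a single test-set score and the grid search can disagree, it helps to compare the candidate C values by their mean cross-validated score rather than one split. A small sketch, again assuming an SVC on the iris dataset (the question's actual model and data are not shown):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Assumed dataset -- substitute your own.
X, y = load_iris(return_X_y=True)

# Compare candidate C values by their mean cross-validated score.
# A single train/test split is a noisier estimate and can rank
# the candidates differently from the 5-fold average.
for C in [1, 100]:
    scores = cross_val_score(SVC(C=C), X, y, cv=5)
    print(f"C={C}: mean CV accuracy = {scores.mean():.3f} "
          f"(std {scores.std():.3f})")
```

If the standard deviation across folds is larger than the gap between the two means, the apparent advantage of one C over the other on a single test set is likely noise.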
Answered by 10xAI on October 3, 2021