Data Science Asked by yash khanna on October 3, 2021
This is a screenshot of my code. I used abc.best_estimator_ (abc is my GridSearchCV model) to find the best result. As you can see, the grid includes C=1 and C=100 along with other values, and abc.best_estimator_ says C=1 is the best value. To cross-check, I tried different values of C manually, and here I am getting a better score for C=100. I was getting similar results when tuning gamma as well, but I later commented gamma out so as to focus on C only.
Any idea why this is happening? Am I doing something wrong?
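You didn't pass a cv parameter to GridSearchCV, so it fell back to its default splitting strategy (5-fold cross-validation in current scikit-learn versions). Either way, GridSearchCV selects the best estimator using only the training data passed to fit, which can leave the chosen model somewhat overfitted to that training data, so the C it picks is not guaranteed to score best on your separate test set. Set cv explicitly so that the cross-validation used to find the best estimator is clear: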
abc = GridSearchCV(clf, params, cv=5)
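As a minimal sketch of the call above in context: the dataset (`load_breast_cancer`), the RBF-kernel `SVC`, and the `C` grid are stand-ins chosen for illustration, since the asker's data and code are only available as a screenshot. It prints the mean cross-validated score that GridSearchCV ranks by next to the held-out test score for each `C`, so the two can be compared directly.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: a small bundled dataset stands in for the asker's data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Assumption: an illustrative grid over C only, as in the question.
params = {"C": [1, 10, 100]}
clf = SVC(kernel="rbf")

# Explicit 5-fold cross-validation on the training data.
abc = GridSearchCV(clf, params, cv=5)
abc.fit(X_train, y_train)

# Mean validation score per candidate: this is what best_estimator_ is chosen by.
cv_means = {
    p["C"]: score
    for p, score in zip(abc.cv_results_["params"], abc.cv_results_["mean_test_score"])
}

# Score of each candidate on the held-out test set: this is the manual cross-check.
test_scores = {
    C: SVC(kernel="rbf", C=C).fit(X_train, y_train).score(X_test, y_test)
    for C in params["C"]
}

print("best C by CV:", abc.best_params_["C"])
for C in params["C"]:
    print(f"C={C}: mean CV score={cv_means[C]:.4f}, test score={test_scores[C]:.4f}")
```

The two columns need not agree on which `C` is best, because they are computed on different data.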
Answered by SrJ on October 3, 2021
The reason for the difference is as follows. From the GridSearchCV documentation:
cv : int, cross-validation generator or an iterable, default=None
Determines the cross-validation splitting strategy. Possible inputs for cv are:
None, to use the default 5-fold cross validation
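This means that inside GridSearchCV each candidate is trained on 80% of the training data and validated on the remaining 20%, repeated over 5 folds, and the score it reports is the average across those folds.
But when you test manually, you are calculating the score on the test data.
Even if you score on the training data instead, it will continue to show this behavior. That is explained in the second point below.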
What is needed:
- Try GridSearchCV on a bigger dataset.
- The grid should contain a value that gives a clearly better score than the others. This may happen when the grid has 4-5 parameters.
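One way to check both points is to look at the spread of the fold scores, not just their mean. The sketch below is illustrative only: the synthetic dataset from `make_classification`, the default `SVC`, and the `C` grid are assumptions, and the "within one standard deviation of the best" rule is a rough heuristic rather than a formal test. It leaves `cv` at its default, which is 5 folds in scikit-learn 0.22 and later.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: a larger synthetic dataset, following the "bigger dataset" advice.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Assumption: an illustrative grid over C.
grid = {"C": [0.1, 1, 10, 100]}
search = GridSearchCV(SVC(), grid)  # cv not set, so the default is used
search.fit(X_train, y_train)
print("number of CV folds used:", search.n_splits_)

rows = []
for p, mean, std in zip(
    search.cv_results_["params"],
    search.cv_results_["mean_test_score"],
    search.cv_results_["std_test_score"],
):
    # Refit each candidate on the full training set and score it on the test set.
    test_acc = SVC(**p).fit(X_train, y_train).score(X_test, y_test)
    rows.append((p["C"], mean, std, test_acc))
    print(f"C={p['C']}: CV mean={mean:.4f} (std {std:.4f}), test={test_acc:.4f}")

# Candidates whose mean CV score is within one std of the best are
# effectively tied; their ranking can flip on a different split.
best_mean = max(r[1] for r in rows)
best_std = next(r[2] for r in rows if r[1] == best_mean)
tied = [r[0] for r in rows if best_mean - r[1] <= best_std]
print("candidates effectively tied with the best:", tied)
```

If several values of `C` land in the tied list, a different value winning on the test set is expected noise and not a sign that GridSearchCV chose wrongly.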
Answered by 10xAI on October 3, 2021