Data Science Asked on April 16, 2021
While evaluating my Ridge Regression model and using GridSearchCV to find the best parameter, I noticed that the best estimator changes every time I change the random_state in my KFold object (the cv parameter). With this in mind, how do I choose the optimal hyperparameter for my model?
I am assuming you are talking about the random_state of the model, not of the grid search: GridSearchCV does not have a random_state, although RandomizedSearchCV does need one.
A random state defines a starting point for an underlying random process.
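For the KFold splitter mentioned in the question, the random process is the shuffling of the rows before they are cut into folds. A minimal sketch of that idea follows; the toy array and the helper name `fold_test_sets` are assumptions made for illustration, not part of the original post.

```python
import numpy as np
from sklearn.model_selection import KFold

# Toy data: 10 rows, purely illustrative.
X = np.arange(20).reshape(10, 2)

def fold_test_sets(seed):
    """Return the test indices of each fold for a shuffled 5-fold split."""
    # random_state only has an effect when shuffle=True.
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [tuple(test_idx.tolist()) for _, test_idx in cv.split(X)]

# The same seed reproduces the same folds on every call.
print(fold_test_sets(0) == fold_test_sets(0))

# A different seed generally assigns rows to folds differently,
# which is why the cross-validated scores can shift.
print(fold_test_sets(0))
print(fold_test_sets(1))
```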
With an ideal model, the difference between random states should be very small, though some difference will always be there. If it is large, it suggests that some outlier data point or pattern is causing a certain value to be favoured.
- If the difference is small, ignore it.
- If it is large for a specific value:
  - Check the data.
  - Increase k for the cross-validation.
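One way to judge whether the difference is small or large is to repeat the grid search over several KFold seeds and look at how much the selected value and the best score move. The sketch below assumes a synthetic dataset from `make_regression`, an arbitrary `alpha` grid, and ten seeds; none of these come from the original question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic stand-in for a small dataset (assumption, for illustration only).
X, y = make_regression(n_samples=60, n_features=10, noise=20.0, random_state=0)

# Hypothetical grid of regularisation strengths.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}

best_alphas = []
best_scores = []
for seed in range(10):
    # Only the fold assignment changes between iterations.
    cv = KFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(Ridge(), param_grid, cv=cv,
                          scoring="neg_mean_squared_error")
    search.fit(X, y)
    best_alphas.append(search.best_params_["alpha"])
    best_scores.append(search.best_score_)

# If the same alpha wins for most seeds and the spread is small,
# the dependence on random_state can probably be ignored.
print("best alpha per seed:", best_alphas)
print("std of best CV score across seeds:", np.std(best_scores))
```

Raising `n_splits` in the same loop shows whether a larger k reduces the spread on your own data.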
From another Stackexchange Question
random state values which performed well in the validation set do not correspond to those which would perform well in a new, unseen test set.
From MachineLearning Mastery
We can increase k and build even more models, as long as the data within each fold remains representative of the problem
Links
Stochastic Process
Randomness in ML
Is-random-state-a-parameter-to-tune
Answered by 10xAI on April 16, 2021
If the scoring is very dependent on random_state, it would be better to try to address that rather than choosing a hyperparameter from what you have. You mentioned you used KFold, and that your data is quite small; I suggest trying RepeatedKFold instead.
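A minimal sketch of that suggestion, assuming a synthetic dataset, an arbitrary `alpha` grid, and 5 folds repeated 10 times (all illustrative choices, not values from the question):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Synthetic stand-in for a small dataset (assumption, for illustration only).
X, y = make_regression(n_samples=60, n_features=10, noise=20.0, random_state=0)

# Hypothetical grid of regularisation strengths.
alphas = [0.01, 0.1, 1.0, 10.0, 100.0]

# 5-fold CV repeated 10 times with different shuffles: each alpha is
# scored on 50 splits, so a single lucky or unlucky split matters less.
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)

search = GridSearchCV(Ridge(), {"alpha": alphas}, cv=cv,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

print("number of splits:", cv.get_n_splits(X))
print("best alpha:", search.best_params_["alpha"])
```

The cost is proportionally more model fits, which is usually acceptable for Ridge on a small dataset.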
Answered by Ben Reiniger on April 16, 2021