Data Science Asked by Kush on May 29, 2021
I’ve built a few models during assignments and hackathons using algorithms such as Random Forest and XGBoost, and used GridSearchCV to find the best combination of parameters. What I can’t figure out is how to choose those parameter values for GridSearchCV in the first place. I picked them more or less at random:
params = {"max_depth" : [5, 7, 10, 15, 20, 25, 30, 40, 50,100],
"min_samples_leaf" : [5, 10, 15, 20, 40, 50, 100, 200, 500, 1000,10000],
"criterion": ["gini","entropy"],
"n_estimators" : [10, 15, 20, 40, 50, 75, 100,1000],
"max_features" : ["auto", "sqrt","log2"]}
But how do I decide on a grid that could give better results and be computationally cheaper as well? Surely I can’t use the same parameter grid for every Random Forest classifier?
That is indeed a drawback of the grid search strategy: you must specify in advance every combination to try, and that set may be suboptimal both for the evaluation metric you end up with and for computational cost.
There are other interesting, non-exhaustive hyperparameter search strategies, for instance random search or Bayesian tuning, which give a more efficient search; the second option is also a "more clever" strategy, since it uses the results of previous trials to decide which combination to try next.
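As a concrete illustration, here is a minimal sketch of random search with scikit-learn's RandomizedSearchCV over roughly the same parameter space as above; the dataset, the sampling distributions, and the iteration budget are placeholder assumptions you would adapt to your own problem.

from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=1000, random_state=42)  # placeholder data

# Sample from distributions instead of enumerating every value by hand
param_distributions = {
    "max_depth": randint(5, 100),
    "min_samples_leaf": randint(5, 1000),
    "criterion": ["gini", "entropy"],
    "n_estimators": randint(10, 1000),
    "max_features": ["sqrt", "log2"],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=50,   # fixed budget: 50 sampled combinations, not the full grid
    cv=5,
    n_jobs=-1,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_)

The key difference from GridSearchCV is that the cost is capped by n_iter rather than growing multiplicatively with every value you add to the grid.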
You can have a look at the HyperOpt library, which offers several optimization algorithms (see also this link for a practical use case), and more recently Keras released the nice Keras Tuner (which I love, by the way).
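A rough sketch of how Hyperopt's TPE algorithm could drive the same Random Forest tuning follows; the search space, discretization steps, and evaluation budget below are illustrative assumptions, not a recommendation.

from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, random_state=42)  # placeholder data

# Illustrative search space; hp.quniform samples discretized numeric values
space = {
    "max_depth": hp.quniform("max_depth", 5, 100, 5),
    "min_samples_leaf": hp.quniform("min_samples_leaf", 5, 1000, 5),
    "n_estimators": hp.quniform("n_estimators", 10, 1000, 10),
    "criterion": hp.choice("criterion", ["gini", "entropy"]),
}

def objective(params):
    clf = RandomForestClassifier(
        max_depth=int(params["max_depth"]),
        min_samples_leaf=int(params["min_samples_leaf"]),
        n_estimators=int(params["n_estimators"]),
        criterion=params["criterion"],
        n_jobs=-1,
    )
    # Hyperopt minimizes the objective, so return negative CV accuracy
    return -cross_val_score(clf, X, y, cv=5).mean()

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=50, trials=trials)
print(best)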
You can also have a look at this answer for a worked-out example on an XGBoost model using Hyperopt, and this one for using Keras Tuner. You can also check the Keras Tuner wrapper for sklearn models: https://keras-team.github.io/keras-tuner/documentation/tuners/#sklearn-class
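A minimal sketch of that sklearn wrapper might look like the following; note that the class names have moved between releases (older versions expose kerastuner.tuners.Sklearn, recent ones keras_tuner.SklearnTuner), so treat the exact imports as an assumption to check against your installed version. The directory and project names are hypothetical.

import keras_tuner
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, random_state=42)  # placeholder data

def build_model(hp):
    # hp.Int draws integer hyperparameters from the given ranges
    return RandomForestClassifier(
        n_estimators=hp.Int("n_estimators", 10, 100, step=10),
        max_depth=hp.Int("max_depth", 3, 30),
    )

tuner = keras_tuner.SklearnTuner(
    oracle=keras_tuner.oracles.BayesianOptimizationOracle(
        objective=keras_tuner.Objective("score", "max"),
        max_trials=10,
    ),
    hypermodel=build_model,
    directory="tuning_results",   # hypothetical output directory
    project_name="rf_example",
)
tuner.search(X, y)
print(tuner.get_best_models(num_models=1)[0])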
Correct answer by German C M on May 29, 2021