Is there a rule of thumb for a sufficient number of trials for hyperparameter search?

Data Science Asked on March 7, 2021

I am implementing a fairly complicated Bayesian hyperparameter search on a CNN using the hyperopt library.

Is there a rule of thumb for a "sufficient" number of trials? Perhaps based on the number of hyperparameters that constitute the search space?

One Answer

In a pure random search, 60 points is often given as a rule of thumb: with probability at least 95%, such a search finds a hyperparameter combination in the top 5%, because the chance that all 60 independent draws miss the best 5% of the space is 0.95^60 ≈ 0.046.
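The arithmetic behind that rule, as a quick check:

    import math

    # P(at least one of n i.i.d. uniform draws lands in the top-q volume
    # fraction of the search space) = 1 - (1 - q)**n
    q, n = 0.05, 60
    print(1 - (1 - q) ** n)  # ~0.954, i.e. >95% chance of hitting the top 5%

    # Smallest n reaching 95%: solve (1 - q)**n <= 0.05 for n
    print(math.ceil(math.log(0.05) / math.log(1 - q)))  # 59, hence the round 60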

However, that 5% is measured as a fraction of the volume of the search space, so if the search space is much too broad, the best 5% may still not contain a fantastic score for the model. So it depends not exactly on the number of hyperparameters, but on the volume of good combinations, which in turn depends indirectly on the number of hyperparameters.

Bayesian optimization should take fewer iterations (but not necessarily less time!) to reach an optimum than a pure random search, and in particular should rule out swaths of bad volume relatively quickly.
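For concreteness, a minimal hyperopt sketch of such a search; train_and_eval is a hypothetical stand-in for training and validating the CNN, and the search space is illustrative rather than a recommendation:

    from hyperopt import STATUS_OK, Trials, fmin, hp, tpe

    def objective(params):
        # train_and_eval is a hypothetical helper: train the CNN with these
        # hyperparameters and return a validation loss to minimize.
        loss = train_and_eval(params)
        return {"loss": loss, "status": STATUS_OK}

    # Illustrative search space; hp.loguniform draws exp(uniform(low, high)).
    space = {
        "lr": hp.loguniform("lr", -10, -2),          # roughly 4.5e-5 to 0.14
        "dropout": hp.uniform("dropout", 0.0, 0.5),
        "batch_size": hp.choice("batch_size", [32, 64, 128]),
    }

    trials = Trials()
    best = fmin(objective, space, algo=tpe.suggest, max_evals=60, trials=trials)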

So, to conclude...no, I don't actually have a rule of thumb. 60 is probably a decent bet, given the tradeoffs in the last two paragraphs; I'd go up to 100 if training isn't too expensive. Also consider whether your package allows you to continue a Bayesian search: you could analyze the results so far (mainly, whether the last few iterations have clustered around some point, and how widely the objective function varies) to decide whether to proceed. Finally, note that scikit-optimize sets a default of 50, but there doesn't seem to be much behind that number.
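In hyperopt specifically, continuing is possible: calling fmin again with the same Trials object and a larger max_evals resumes the search rather than restarting it, and the stored losses give a rough convergence check. A sketch, continuing the example above:

    import numpy as np

    # Reusing the same Trials object resumes the search; max_evals is the
    # new total, so this adds 40 evaluations on top of the first 60.
    best = fmin(objective, space, algo=tpe.suggest, max_evals=100, trials=trials)

    # Rough convergence check: has the objective stopped improving, and how
    # tightly are recent results clustered?
    losses = np.array([l for l in trials.losses() if l is not None])
    print("best loss:", losses.min())
    print("spread of last 10:", losses[-10:].std())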

Answered by Ben Reiniger on March 7, 2021
