
When is Non-Stochastic Global Optimization Preferable or Necessary?

Data Science question asked by Benji Albert on February 17, 2021

Background

I’m specifically referring to non-convex black-box optimization problems of the form:

$\min\, f(\vec{x})$

$\text{s.t.}\;\; a_i \le x_i \le b_i \;\; \forall\, i \in \{1,2,\dots,n\} \;\; \text{and} \;\; \vec{a},\vec{b} \in \Bbb{R}^n$

Assume any dimension $n$.

Of interest are global optimizers that yield good solutions, even though those solutions are not necessarily, nor provably, the global optimum.
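For concreteness, here is a minimal sketch of such a problem in Python; the Rastrigin function and the particular dimension and bounds below are arbitrary stand-ins for whatever black-box $f$ you actually have.

```python
import numpy as np

def f(x):
    """A non-convex black-box objective; Rastrigin is only a stand-in here."""
    x = np.asarray(x, dtype=float)
    return 10 * len(x) + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

# Box constraints a_i <= x_i <= b_i for an arbitrary dimension n
n = 5
a = np.full(n, -5.12)  # lower bounds
b = np.full(n,  5.12)  # upper bounds
```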

Evolutionary Algorithms

Evolutionary algorithms (EAs) are often the go-to optimizers for these types of problems; such methods include genetic algorithms, particle swarm optimizers, differential evolution, and all the algorithms based on organismal interactions. Pretty much every EA has stochastic components. Whether it is random initialization or randomness in inter-generational (or intra-generational) subroutines such as selection for crossover or random mutation, stochasticity is pretty ubiquitous in the realm of EAs. Almost everything you'd find in this journal or this one would fall under this category.
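To illustrate where that randomness enters, here is a minimal sketch of a single differential-evolution generation (a generic DE/rand/1/bin step, not any particular published variant). Initialization, parent selection, mutation, and crossover all draw from a random number generator; the sketch reuses the $f$, $a$, $b$ from the example above, and the constants F and CR are arbitrary.

```python
import numpy as np

rng = np.random.default_rng()  # stochastic component: the RNG itself

def de_generation(pop, f, a, b, F=0.8, CR=0.9):
    """One DE/rand/1/bin generation; every step below uses random draws."""
    new_pop = pop.copy()
    n_pop, n = pop.shape
    for i in range(n_pop):
        # random parent selection (stochastic)
        others = [j for j in range(n_pop) if j != i]
        r1, r2, r3 = rng.choice(others, size=3, replace=False)
        mutant = np.clip(pop[r1] + F * (pop[r2] - pop[r3]), a, b)  # mutation
        # binomial crossover (stochastic)
        mask = rng.random(n) < CR
        mask[rng.integers(n)] = True
        trial = np.where(mask, mutant, pop[i])
        if f(trial) <= f(pop[i]):  # greedy selection
            new_pop[i] = trial
    return new_pop

# random initialization inside the box (stochastic)
n_pop = 20
pop = rng.uniform(a, b, size=(n_pop, n))
for _ in range(100):
    pop = de_generation(pop, f, a, b)
```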

Non-Stochastic Optimization vs. Deterministic Global Optimization

I am not interested in deterministic global optimizers. Such methods provide some form of probability/confidence, or even a guarantee, that the found solution is indeed the global optimum. These are more commonly seen in discrete/combinatorial optimization, but occasionally some deterministic optimizers become tangentially related when the user has some form of a priori knowledge/assumptions. The preference for, or necessity of, deterministic optimizers is clear, even when they only give an associated confidence for the solutions they find. So again, I am not referring to these.

Non-Stochastic Global Optimizers

I only know of a few non-stochastic global optimizers. Probably the most famous are the many variants of direct search (also called pattern search) algorithms. Conceived by Fermi and Metropolis, then popularized by Hooke and Jeeves, and extended to generalized pattern search (GPS) making heavy use of positive bases as meshes, the direct search algorithms are similar to the classic Nelder-Mead method in that they use a neighborhood of points with an underlying geometric structure to (deterministically) explore the search space. Of course some stochastic variants exist as well, including Luus-Jaakola's sampling of a uniformly distributed neighborhood or the more popular mesh adaptive direct search (MADS) and all of its spin-offs.
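For contrast with the EA sketch above, here is a minimal coordinate (compass) search in the spirit of Hooke-Jeeves/GPS; it is a sketch rather than a faithful reproduction of any paper, and the step sizes and shrink factor are arbitrary. The poll directions are the fixed positive basis $\{\pm e_i\}$, so repeated runs from the same starting point follow exactly the same trajectory, and no random number generator appears anywhere. On its own this is only a local polling method; in practice it is paired with a deterministic search step or several fixed starting points to get more global coverage.

```python
import numpy as np

def compass_search(f, x0, a, b, step=1.0, tol=1e-6, max_iter=10_000):
    """Deterministic coordinate (compass) search with box constraints.

    Polls the fixed positive basis {+e_i, -e_i}; shrinks the step on failure.
    """
    x = np.clip(np.asarray(x0, dtype=float), a, b)
    fx = f(x)
    n = len(x)
    for _ in range(max_iter):
        improved = False
        for d in np.vstack([np.eye(n), -np.eye(n)]):  # fixed poll directions
            trial = np.clip(x + step * d, a, b)
            ft = f(trial)
            if ft < fx:              # accept the first improving poll point
                x, fx, improved = trial, ft, True
                break
        if not improved:
            step *= 0.5              # unsuccessful poll: shrink the mesh
            if step < tol:
                break
    return x, fx

# Same starting point => same answer on every run
x_best, f_best = compass_search(f, np.zeros(n), a, b)
```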

There are some other non-stochastic global optimizers hiding out there on the internet, like this one, but I have yet to find one that explains the practical significance of the non-stochasticism.


Question

What are some concrete use-cases for a non-stochastic global optimizer as described in the above-mentioned background?

Are there situations where non-stochastic optimization is necessary? Possibly mission-critical optimization, or where you need repeatability? Maybe something medically-oriented? Or for interpretability?

The only example I can think of (coming from an ML/DL background) is a situation where it would be slightly preferred, but certainly not necessary. In particular, we could train an ML model using a non-stochastic optimization algorithm, which would let us observe the effects of the ML model's hyperparameters. In other words, eliminating the stochasticity in the optimizer could help us interpret/tune the actual ML model hyperparameters, because we would be able to see the cause and effect of modifications, whereas currently there is uncertainty due to the randomness involved in training.
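As a toy sketch of that idea (reusing the compass_search above, with a made-up ridge-penalty hyperparameter lam): because the inner optimization is deterministic, re-running with the same hyperparameter reproduces the loss exactly, so any change in the validation loss can be attributed to the hyperparameter rather than to optimizer noise. The dataset, model, and penalty are all arbitrary choices for illustration.

```python
import numpy as np

# Toy data (the seed here only fixes the dataset, not the optimizer)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
X_tr, y_tr, X_val, y_val = X[:80], y[:80], X[80:], y[80:]

def val_loss(lam):
    """Fit weights with the deterministic compass_search for a given penalty."""
    def objective(w):
        return np.mean((X_tr @ w - y_tr) ** 2) + lam * np.sum(w ** 2)
    w, _ = compass_search(objective, np.zeros(3), np.full(3, -10.0), np.full(3, 10.0))
    return np.mean((X_val @ w - y_val) ** 2)

print(val_loss(0.1), val_loss(0.1))  # identical on repeated runs
print(val_loss(1.0))                 # any change is due to lam alone
```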

One Answer

Non-stochastic global optimization could be practical for a directed search of the solution space. If there is a priori knowledge, that knowledge can direct the optimization toward specific regions. Stochastic global optimization might inefficiently explore regions of the solution space.

A specific example of non-stochastic global optimization is Bayesian optimization. In Bayesian optimization, samples are not picked at random. Samples are picked based on the calculated posterior distribution, often to maximize the expected improvement.

Bayesian optimization has been used in many applications, including machine learning hyperparameter tuning.
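A rough sketch of that loop is below, using scikit-learn's GaussianProcessRegressor as the surrogate. The Matern kernel, the budget, and the random initial design and candidate set are arbitrary choices here (the latter two could be replaced with a fixed grid or Sobol sequence to keep everything deterministic); the point is that the next evaluation is chosen by maximizing expected improvement over the surrogate, not by a random draw.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def expected_improvement(mu, sigma, best):
    """EI for minimization; larger means a more promising candidate."""
    sigma = np.maximum(sigma, 1e-12)
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, a, b, n_init=5, n_iter=25, n_candidates=2048, seed=0):
    """Minimal Bayesian optimization sketch over the box [a, b]."""
    rng = np.random.default_rng(seed)          # initial design; could be a fixed grid
    X = rng.uniform(a, b, size=(n_init, len(a)))
    y = np.array([f(x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        cand = rng.uniform(a, b, size=(n_candidates, len(a)))
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()

# x_best, y_best = bayes_opt(f, a, b)  # with any black-box f and bounds arrays a, b
```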

Answered by Brian Spiering on February 17, 2021

