CMA-ES

Disclaimer: We make use of the implementation available at PyPi [14] published by the author Nikolaus Hansen under the BSD license.

CMA-ES which was proposed in [15]. Moreover, a comparing review can be found in [16]. CMA-ES stands for covariance matrix adaptation evolution strategy. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration) new individuals (candidate solutions) are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value \(f(x)\). Like this, over the generation sequence, individuals with better and better \(f\)-values are generated. (excerpt from Wikipedia).

Example

[1]:
import numpy as np

from pymoo.algorithms.soo.nonconvex.cmaes import CMAES
from pymoo.problems import get_problem
from pymoo.optimize import minimize

problem = get_problem("sphere")

algorithm = CMAES(x0=np.random.random(problem.n_var))

res = minimize(problem,
               algorithm,
               seed=1,
               verbose=False)

print(f"Best solution found: \nX = {res.X}\nF = {res.F}\nCV= {res.CV}")

Best solution found:
X = [0.50000006 0.50000001 0.49999999 0.50000004 0.50000003 0.49999995
 0.49999996 0.50000002 0.50000012 0.5       ]
F = [2.49878285e-14]
CV= [0.]

CMA-ES already has several stopping criteria implemented. However, as for other algorithms, the number of iterations or function evaluations can be directly passed to minimize.

[2]:
res = minimize(problem,
               algorithm,
               ('n_iter', 10),
               seed=1,
               verbose=True)

print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
==================================================================================================================
n_gen  |  n_eval  |     f_avg     |     f_min     |     f_gap     |     sigma     | min_std  | max_std  |   axis
==================================================================================================================
     1 |        1 |  0.9951018326 |  0.9951018326 |             - |  0.1000000000 |  0.10000 |  0.10000 |  1.00005
     2 |       11 |  1.0588392396 |  0.7871077921 |             - |  0.0937936655 |  0.09138 |  0.09561 |  1.15632
     3 |       21 |  1.0033803597 |  0.7871077921 |             - |  0.0933657298 |  0.08985 |  0.09636 |  1.25238
     4 |       31 |  0.9040063046 |  0.7358691257 |             - |  0.0964522184 |  0.09308 |  0.10083 |  1.34803
     5 |       41 |  0.8073733398 |  0.6234936058 |             - |  0.1044169833 |  0.10071 |  0.10898 |  1.44183
     6 |       51 |  0.7380904029 |  0.5632457080 |             - |  0.1058474805 |  0.09945 |  0.11113 |  1.48252
     7 |       61 |  0.6257350470 |  0.4247038546 |             - |  0.1137073223 |  0.10617 |  0.12207 |  1.52703
     8 |       71 |  0.5038887135 |  0.2966207619 |             - |  0.1213519552 |  0.11272 |  0.13291 |  1.59949
     9 |       81 |  0.4555728675 |  0.2196163026 |             - |  0.1401068467 |  0.13041 |  0.15575 |  1.63050
    10 |       91 |  0.3369518400 |  0.1714580921 |             - |  0.1551567474 |  0.14264 |  0.16795 |  1.65941
Best solution found:
X = [0.56539654 0.42014897 0.35346779 0.44738495 0.74814995 0.48972778
 0.60831944 0.3837163  0.69141232 0.61396304]
F = [0.17145809]
[3]:
res = minimize(problem,
               algorithm,
               ('n_eval', 50),
               seed=1,
               verbose=True)

print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
==================================================================================================================
n_gen  |  n_eval  |     f_avg     |     f_min     |     f_gap     |     sigma     | min_std  | max_std  |   axis
==================================================================================================================
     1 |        1 |  0.9951018326 |  0.9951018326 |             - |  0.1000000000 |  0.10000 |  0.10000 |  1.00005
     2 |       11 |  1.0588392396 |  0.7871077921 |             - |  0.0937936655 |  0.09138 |  0.09561 |  1.15632
     3 |       21 |  1.0033803597 |  0.7871077921 |             - |  0.0933657298 |  0.08985 |  0.09636 |  1.25238
     4 |       31 |  0.9040063046 |  0.7358691257 |             - |  0.0964522184 |  0.09308 |  0.10083 |  1.34803
     5 |       41 |  0.8073733398 |  0.6234936058 |             - |  0.1044169833 |  0.10071 |  0.10898 |  1.44183
     6 |       51 |  0.7380904029 |  0.5632457080 |             - |  0.1058474805 |  0.09945 |  0.11113 |  1.48252
Best solution found:
X = [0.72637368 0.24632324 0.06630157 0.45218637 0.81113349 0.58494892
 0.3381975  0.7167156  0.54304067 0.77973154]
F = [0.56324571]

Also, easily restarts can be used, which are known to work very well on multi-modal functions. For instance, Rastrigin can be solved rather quickly by:

[4]:
problem = get_problem("rastrigin")

algorithm = CMAES(restarts=10, restart_from_best=True)

res = minimize(problem,
               algorithm,
               ('n_evals', 2500),
               seed=1,
               verbose=False)

print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
Best solution found:
X = [-1.22563293e-10  2.16619522e-09]
F = [0.]

Our framework internally calls the cma.fmin2 function. All parameters that can be used there either as a keyword argument or an option can also be passed to the CMAES constructor. An example with a few selected cma.fmin2 parameters is shown below:

[5]:
import numpy as np
from pymoo.util.normalization import denormalize

# define an intitial point for the search
np.random.seed(1)
x0 = denormalize(np.random.random(problem.n_var), problem.xl, problem.xu)

algorithm = CMAES(x0=x0,
                 sigma=0.5,
                 restarts=2,
                 maxfevals=np.inf,
                 tolfun=1e-6,
                 tolx=1e-6,
                 restart_from_best=True,
                 bipop=True)

res = minimize(problem,
               algorithm,
               seed=1,
               verbose=False)

print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
Best solution found:
X = [2.73861916e-07 3.30861459e-07]
F = [3.65965036e-11]

For more details about hyperparameters, we refer to the software documentation of the fmin2 in CMA-ES, which can be found here. A quick explanation of possible parameters is also provided in the API documentation below.

API

class pymoo.algorithms.soo.nonconvex.cmaes.CMAES(self, x0=None, sigma=0.1, normalize=True, parallelize=True, maxfevals=np.inf, tolfun=1e-11, tolx=1e-11, restarts=0, restart_from_best='False', incpopsize=2, eval_initial_x=False, noise_handler=None, noise_change_sigma_exponent=1, noise_kappa_exponent=0, bipop=False, cmaes_verbose=- 9, verb_log=0, output=CMAESOutput(), pop_size=None, **kwargs)
Parameters
x0list or numpy.ndarray

initial guess of minimum solution before the application of the geno-phenotype transformation according to the transformation option. It can also be a string holding a Python expression that is evaluated to yield the initial guess - this is important in case restarts are performed so that they start from different places. Otherwise x0 can also be a cma.CMAEvolutionStrategy object instance, in that case sigma0 can be None.

sigmafloat

Initial standard deviation in each coordinate. sigma0 should be about 1/4th of the search domain width (where the optimum is to be expected). The variables in objective_function should be scaled such that they presumably have similar sensitivity. See also ScaleCoordinates.

parallelizebool

Whether the objective function should be called for each single evaluation or batch wise.

restartsint, default 0

Number of restarts with increasing population size, see also parameter incpopsize, implementing the IPOP-CMA-ES restart strategy, see also parameter bipop; to restart from different points (recommended), pass x0 as a string.

restart_from_bestbool, default false

Which point to restart from

incpopsizeint

Multiplier for increasing the population size popsize before each restart

eval_initial_xbool

Evaluate initial solution, for None only with elitist option

noise_handlerclass

A NoiseHandler class or instance or None. Example: cma.fmin(f, 6 * [1], 1, noise_handler=cma.NoiseHandler(6)) see help(cma.NoiseHandler).

noise_change_sigma_exponentint

Exponent for the sigma increment provided by the noise handler for additional noise treatment. 0 means no sigma change.

noise_kappa_exponentint

Instead of applying reevaluations, the “number of evaluations” is (ab)used as init_simplex_scale factor kappa (experimental).

bipopbool

If True, run as BIPOP-CMA-ES; BIPOP is a special restart strategy switching between two population sizings - small (like the default CMA, but with more focused search) and large (progressively increased as in IPOP). This makes the algorithm perform well both on functions with many regularly or irregularly arranged local optima (the latter by frequently restarting with small populations). For the bipop parameter to actually take effect, also select non-zero number of (IPOP) restarts; the recommended setting is restarts<=9 and x0 passed as a string using numpy.rand to generate initial solutions. Note that small-population restarts do not count into the total restart count.

AdaptSigmaTrue

Or False or any CMAAdaptSigmaBase class e.g. CMAAdaptSigmaTPA, CMAAdaptSigmaCSA

CMA_activeTrue

Negative update, conducted after the original update

CMA_activefac1

Learning rate multiplier for active update

CMA_cmean1

Learning rate for the mean value

CMA_const_traceFalse

Normalize trace, 1, True, “arithm”, “geom”, “aeig”, “geig” are valid

CMA_diagonal0*100*N/popsize**0.5

Number of iterations with diagonal covariance matrix, True for always

CMA_eigenmethodnp.linalg.eigh or cma.utilities.math.eig or pygsl.eigen.eigenvectors
CMA_elitistFalse or “initial” or True

Elitism likely impairs global search performance

CMA_injections_threshold_keep_len0

Keep length if Mahalanobis length is below the given relative threshold

CMA_mirrorspopsize < 6

Values <0.5 are interpreted as fraction, values >1 as numbers (rounded), otherwise about 0.16 is used

CMA_mirrormethodint, default 2, 0=unconditional, 1=selective, 2=selective with delay
CMA_muNone

Parents selection parameter, default is popsize // 2

CMA_on1

Multiplier for all covariance matrix updates

CMA_samplerNone
A class or instance that implements the interface of

cma.interfaces.StatisticalModelSamplerWithZeroMeanBaseClass

CMA_sampler_optionsdict

Options passed to CMA_sampler class init as keyword arguments

CMA_rankmu1.0

Multiplier for rank-mu update learning rate of covariance matrix

CMA_rankone1.0

Multiplier for rank-one update learning rate of covariance matrix

CMA_recombination_weightsNone

A list, see class RecombinationWeights, overwrites CMA_mu and popsize options

CMA_dampsvec_facnp.Inf

Tentative and subject to changes, 0.5 would be a “default” damping for sigma vector update

CMA_dampsvec_fade0.1

Tentative fading out parameter for sigma vector update

CMA_teststdsNone

Factors for non-isotropic initial distr. of C, mainly for test purpose, see CMA_stds for production

CMA_stdsNone

Multipliers for sigma0 in each coordinate, not represented in C, makes scaling_of_variables obsolete

CSA_dampfac1

Positive multiplier for step-size damping, 0.3 is close to optimal on the sphere

CSA_damp_mueff_exponent0.5

Zero would mean no dependency of damping on mueff, useful with CSA_disregard_length option

CSA_disregard_lengthFalse

True is untested, also changes respective parameters

CSA_clip_length_valueNone

Poorly tested, [0, 0] means const length N**0.5, [-1, 1] allows a variation of +- N/(N+2), etc.

CSA_squaredFalse

Use squared length for sigma-adaptation ‘,

BoundaryHandlerBoundTransform or BoundPenalty, unused when bounds in (None, [None, None])
conditioncov_alleviate[1e8, 1e12]

When to alleviate the condition in the coordinates and in main axes

eval_final_meanTrue

Evaluate the final mean, which is a favorite return candidate

fixed_variablesNone

Dictionary with index-value pairs like dict(0=1.1, 2=0.1) that are not optimized

ftarget-inf

Target function value, minimization

integer_variables[]

Index list, invokes basic integer handling: prevent std dev to become too small in the given variables

maxfevalsinf

Maximum number of function evaluations

maxiter100 + 150 * (N+3)**2 // popsize**0.5

Maximum number of iterations

mean_shift_line_samplesFalse

Sample two new solutions colinear to previous mean shift

mindx0

Minimal std in any arbitrary direction, cave interference with tol

minstd0

Minimal std (scalar or vector) in any coordinate direction, cave interference with tol

maxstdinf

Maximal std in any coordinate direction

pc_line_samplesFalse

One line sample along the evolution path pc

popsize4+int(3*np.log(N))

Population size, AKA lambda, number of new solution per iteration

randnnp.random.randn

Randn(lam, N) must return an np.array of shape (lam, N), see also cma.utilities.math.randhss

signals_filenameNone

cma_signals.in # read versatile options from this file which contains a single options dict, e.g. dict("timeout"=0) to stop, string-values are evaluated, e.g. “np.inf” is valid

termination_callbackNone

A function returning True for termination, called in stop with self as argument, could be abused for side effects

timeoutinf

Stop if timeout seconds are exceeded, the string “2.5 * 60**2” evaluates to 2 hours and 30 minutes

tolconditioncov1e14

Stop if the condition of the covariance matrix is above tolconditioncov

tolfacupx1e3

Termination when step-size increases by tolfacupx (diverges). That is, the initial step-size was chosen far too small and better solutions were found far away from the initial solution x0

tolupsigma1e20

Sigma/sigma0 > tolupsigma * max(eivenvals(C)**0.5) indicates “creeping behavior” with usually minor improvements

tolfun1e-11

Termination criterion: tolerance in function value, quite useful

tolfunhist1e-12

Termination criterion: tolerance in function value history

tolstagnationint(100 + 100 * N**1.5 / popsize)

Termination if no improvement over tolstagnation iterations

tolx1e-11

Termination criterion: tolerance in x-changes

typical_xNone

Used with scaling_of_variables’,

updatecovwaitNone

Number of iterations without distribution update, name is subject to future changes

cmaes_verbose3

Verbosity e.g. of initial/final message, -1 is very quiet, -9 maximally quiet, may not be fully implemented

verb_append0

Initial evaluation counter, if append, do not overwrite output files

verb_disp100

Verbosity: display console output every verb_disp iteration

verb_filenameprefixstr

CMADataLogger.default_prefix + Output path and filenames prefix

verb_log1

Verbosity: write data to files every verb_log iteration, writing can be time critical on fast to evaluate functions

verb_plot0

In fmin(): plot() is called every verb_plot iteration

verb_timeTrue

Output timings on console

vvdict

Versatile set or dictionary for hacking purposes, value found in self.opts[“vv”]

kwargsdict

A dictionary with additional options passed to the constructor of class CMAEvolutionStrategy, see cma.CMAOptions () for a list of available options.