CMAES¶
Disclaimer: We make use of the implementation available at PyPi [1] published by the author Nikolaus Hansen under the BSD license.
CMAES which was proposed in [2]. Moreover, a comparing review can be found in [3]. CMAES stands for covariance matrix adaptation evolution strategy. Evolution strategies (ES) are stochastic, derivativefree methods for numerical optimization of nonlinear or nonconvex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration) new individuals (candidate solutions) are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value \(f(x)\). Like this, over the generation sequence, individuals with better and better \(f\)values are generated. (excerpt from Wikipedia).
Example¶
[1]:
from pymoo.algorithms.so_cmaes import CMAES
from pymoo.factory import get_problem
from pymoo.optimize import minimize
problem = get_problem("sphere")
algorithm = CMAES()
res = minimize(problem,
algorithm,
seed=1,
verbose=False)
print(f"Best solution found: \nX = {res.X}\nF = {res.X}\nCV= {res.CV}")
Best solution found:
X = [0.49999997 0.49999994 0.5 0.49999996 0.5 0.5
0.49999996 0.5 0.50000003 0.50000004]
F = [0.49999997 0.49999994 0.5 0.49999996 0.5 0.5
0.49999996 0.5 0.50000003 0.50000004]
CV= [0.]
CMAES already has several stopping criteria implemented. However, as for other algorithms, the number of iterations or function evaluations can be directly passed to minimize
.
[2]:
res = minimize(problem,
algorithm,
('n_iter', 10),
seed=1,
verbose=True)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
==============================================================================
n_gen  n_eval  fopt  sigma  min std  max std  axis
==============================================================================
1  60  0.413583651  0.500000000  0.50000  0.50002  1.00005
2  70  0.413583651  0.443532600  0.42922  0.44527  1.10591
3  80  0.413583651  0.406582613  0.37335  0.41067  1.17817
4  90  0.413583651  0.384570210  0.34673  0.39056  1.20631
5  100  0.413583651  0.374835420  0.33287  0.41146  1.32408
6  110  0.413583651  0.365537593  0.31713  0.40771  1.39290
7  120  0.413583651  0.394854291  0.34417  0.42415  1.46804
8  130  0.413583651  0.416364366  0.35289  0.44742  1.55281
9  140  0.413583651  0.389834619  0.32470  0.42030  1.65435
10  150  0.413583651  0.373800540  0.30839  0.39783  1.65274
Best solution found:
X = [0.36998631 0.38142292 0.55648563 0.73116207 0.84201565 0.31814345
0.41297808 0.4460369 0.7083321 0.84937005]
F = [0.41358365]
[3]:
res = minimize(problem,
algorithm,
('n_evals', 50),
seed=1,
verbose=True)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
==============================================================================
n_gen  n_eval  fopt  sigma  min std  max std  axis
==============================================================================
1  60  0.230829022  0.500000000  0.50000  0.50002  1.00005
Best solution found:
X = [0.39410806 0.31538908 0.76782696 0.38441356 0.42885057 0.43778444
0.60280162 0.65186544 0.35727118 0.30633172]
F = [0.23082902]
Also, easily restarts can be used which are known to work very well on multimodal functions. For instance, Rastrigin
can be solved rather quickly by:
[4]:
problem = get_problem("rastrigin")
algorithm = CMAES(restarts=10, restart_from_best=True)
res = minimize(problem,
algorithm,
('n_evals', 2500),
seed=1,
verbose=False)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
Best solution found:
X = [ 4.50346187e05 7.31912594e05]
F = [1.46514089e06]
Our framework internally calls the cma.fmin2
function. All parameters which can be used there either as a keyword argument or an option can also be passed to the CMAES
constructor as well. An example with a few selected cma.fmin2
parameters is shown below:
[5]:
import numpy as np
from pymoo.util.normalization import denormalize
# define an intitial point for the search
np.random.seed(1)
x0 = denormalize(np.random.random(problem.n_var), problem.xl, problem.xu)
algorithm = CMAES(x0=x0,
sigma=0.5,
restarts=2,
maxfevals=np.inf,
tolfun=1e6,
tolx=1e6,
restart_from_best=True,
bipop=True)
res = minimize(problem,
algorithm,
seed=1,
verbose=False)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
Best solution found:
X = [1.66760792e07 7.76127578e08]
F = [6.71107614e12]
For more details about hyperparameters we refer to the software documentation of the fmin2
in CMAES which can be found here. A quick explanation of possible parameters is also provided in the API documentation below.
API¶

class
pymoo.algorithms.so_cmaes.
CMAES
(self, x0=None, sigma=0.5, parallelize=True, maxfevals=np.inf, tolfun=1e11, tolx=1e11, restarts=0, restart_from_best=’False’, incpopsize=2, eval_initial_x=False, noise_handler=None, noise_change_sigma_exponent=1, noise_kappa_exponent=0, bipop=False, cmaes_verbose=9, verb_log=0, display=CMAESDisplay(), **kwargs)  Parameters
 x0list or numpy.ndarray
initial guess of minimum solution before the application of the genophenotype transformation according to the
transformation
option. It can also be a string holding a Python expression that is evaluated to yield the initial guess  this is important in case restarts are performed so that they start from different places. Otherwisex0
can also be a cma.CMAEvolutionStrategy object instance, in that casesigma0
can beNone
. sigmafloat
Initial standard deviation in each coordinate.
sigma0
should be about 1/4th of the search domain width (where the optimum is to be expected). The variables inobjective_function
should be scaled such that they presumably have similar sensitivity. See also ScaleCoordinates. parallelizebool
Whether the objective function should be called for each single evaluation or batch wise.
 restartsint, default 0
Number of restarts with increasing population size, see also parameter
incpopsize
, implementing the IPOPCMAES restart strategy, see also parameterbipop
; to restart from different points (recommended), passx0
as a string. restart_from_bestbool, default false
Which point to restart from
 incpopsizeint
Multiplier for increasing the population size
popsize
before each restart eval_initial_xbool
Evaluate initial solution, for
None
only with elitist option noise_handlerclass
A
NoiseHandler
class or instance orNone
. Example:cma.fmin(f, 6 * [1], 1, noise_handler=cma.NoiseHandler(6))
seehelp(cma.NoiseHandler)
. noise_change_sigma_exponentint
Exponent for the sigma increment provided by the noise handler for additional noise treatment. 0 means no sigma change.
 noise_kappa_exponentint
Instead of applying reevaluations, the “number of evaluations” is (ab)used as scaling factor kappa (experimental).
 bipopbool
If True, run as BIPOPCMAES; BIPOP is a special restart strategy switching between two population sizings  small (like the default CMA, but with more focused search) and large (progressively increased as in IPOP). This makes the algorithm perform well both on functions with many regularly or irregularly arranged local optima (the latter by frequently restarting with small populations). For the bipop parameter to actually take effect, also select nonzero number of (IPOP) restarts; the recommended setting is
restarts<=9
and x0 passed as a string using numpy.rand to generate initial solutions. Note that smallpopulation restarts do not count into the total restart count. AdaptSigmaTrue
Or False or any CMAAdaptSigmaBase class e.g. CMAAdaptSigmaTPA, CMAAdaptSigmaCSA
 CMA_activeTrue
Negative update, conducted after the original update
 CMA_activefac1
Learning rate multiplier for active update
 CMA_cmean1
Learning rate for the mean value
 CMA_const_traceFalse
Normalize trace, 1, True, “arithm”, “geom”, “aeig”, “geig” are valid
 CMA_diagonal0*100*N/popsize**0.5
Number of iterations with diagonal covariance matrix, True for always
 CMA_eigenmethodnp.linalg.eigh or cma.utilities.math.eig or pygsl.eigen.eigenvectors
 CMA_elitistFalse or “initial” or True
Elitism likely impairs global search performance
 CMA_injections_threshold_keep_len0
Keep length if Mahalanobis length is below the given relative threshold
 CMA_mirrorspopsize < 6
Values <0.5 are interpreted as fraction, values >1 as numbers (rounded), otherwise about 0.16 is used
 CMA_mirrormethodint, default 2, 0=unconditional, 1=selective, 2=selective with delay
 CMA_muNone
Parents selection parameter, default is popsize // 2
 CMA_on1
Multiplier for all covariance matrix updates
 CMA_samplerNone
 A class or instance that implements the interface of
cma.interfaces.StatisticalModelSamplerWithZeroMeanBaseClass
 CMA_sampler_optionsdict
Options passed to CMA_sampler class init as keyword arguments
 CMA_rankmu1.0
Multiplier for rankmu update learning rate of covariance matrix
 CMA_rankone1.0
Multiplier for rankone update learning rate of covariance matrix
 CMA_recombination_weightsNone
A list, see class RecombinationWeights, overwrites CMA_mu and popsize options
 CMA_dampsvec_facnp.Inf
Tentative and subject to changes, 0.5 would be a “default” damping for sigma vector update
 CMA_dampsvec_fade0.1
Tentative fading out parameter for sigma vector update
 CMA_teststdsNone
Factors for nonisotropic initial distr. of C, mainly for test purpose, see CMA_stds for production
 CMA_stdsNone
Multipliers for sigma0 in each coordinate, not represented in C, makes scaling_of_variables obsolete
 CSA_dampfac1
Positive multiplier for stepsize damping, 0.3 is close to optimal on the sphere
 CSA_damp_mueff_exponent0.5
Zero would mean no dependency of damping on mueff, useful with CSA_disregard_length option
 CSA_disregard_lengthFalse
True is untested, also changes respective parameters
 CSA_clip_length_valueNone
Poorly tested, [0, 0] means const length N**0.5, [1, 1] allows a variation of + N/(N+2), etc.
 CSA_squaredFalse
Use squared length for sigmaadaptation ‘,
 BoundaryHandlerBoundTransform or BoundPenalty, unused when
bounds in (None, [None, None])
 conditioncov_alleviate[1e8, 1e12]
When to alleviate the condition in the coordinates and in main axes
 eval_final_meanTrue
Evaluate the final mean, which is a favorite return candidate
 fixed_variablesNone
Dictionary with indexvalue pairs like dict(0=1.1, 2=0.1) that are not optimized
 ftargetinf
Target function value, minimization
 integer_variables[]
Index list, invokes basic integer handling: prevent std dev to become too small in the given variables
 maxfevalsinf
Maximum number of function evaluations
 maxiter100 + 150 * (N+3)**2 // popsize**0.5
Maximum number of iterations
 mean_shift_line_samplesFalse
Sample two new solutions colinear to previous mean shift
 mindx0
Minimal std in any arbitrary direction, cave interference with tol
 minstd0
Minimal std (scalar or vector) in any coordinate direction, cave interference with tol
 maxstdinf
Maximal std in any coordinate direction
 pc_line_samplesFalse
One line sample along the evolution path pc
 popsize4+int(3*np.log(N))
Population size, AKA lambda, number of new solution per iteration
 randnnp.random.randn
Randn(lam, N) must return an np.array of shape (lam, N), see also cma.utilities.math.randhss
 signals_filenameNone
cma_signals.in # read versatile options from this file which contains a single options dict, e.g.
dict("timeout"=0)
to stop, stringvalues are evaluated, e.g. “np.inf” is valid termination_callbackNone
A function returning True for termination, called in stop with self as argument, could be abused for side effects
 timeoutinf
Stop if timeout seconds are exceeded, the string “2.5 * 60**2” evaluates to 2 hours and 30 minutes
 tolconditioncov1e14
Stop if the condition of the covariance matrix is above tolconditioncov
 tolfacupx1e3
Termination when stepsize increases by tolfacupx (diverges). That is, the initial stepsize was chosen far too small and better solutions were found far away from the initial solution x0
 tolupsigma1e20
Sigma/sigma0 > tolupsigma * max(eivenvals(C)**0.5) indicates “creeping behavior” with usually minor improvements
 tolfun1e11
Termination criterion: tolerance in function value, quite useful
 tolfunhist1e12
Termination criterion: tolerance in function value history
 tolstagnationint(100 + 100 * N**1.5 / popsize)
Termination if no improvement over tolstagnation iterations
 tolx1e11
Termination criterion: tolerance in xchanges
 typical_xNone
Used with scaling_of_variables’,
 updatecovwaitNone
Number of iterations without distribution update, name is subject to future changes
 cmaes_verbose3
Verbosity e.g. of initial/final message, 1 is very quiet, 9 maximally quiet, may not be fully implemented
 verb_append0
Initial evaluation counter, if append, do not overwrite output files
 verb_disp100
Verbosity: display console output every verb_disp iteration
 verb_filenameprefixstr
CMADataLogger.default_prefix + Output path and filenames prefix
 verb_log1
Verbosity: write data to files every verb_log iteration, writing can be time critical on fast to evaluate functions
 verb_plot0
In fmin(): plot() is called every verb_plot iteration
 verb_timeTrue
Output timings on console
 vvdict
Versatile set or dictionary for hacking purposes, value found in self.opts[“vv”]
 kwargsdict
A dictionary with additional options passed to the constructor of class
CMAEvolutionStrategy
, seecma.CMAOptions
() for a list of available options.
Methods
initialize