CMA-ES¶
Disclaimer: We make use of the implementation available at PyPi [14] published by the author Nikolaus Hansen under the BSD license.
CMA-ES which was proposed in [15]. Moreover, a comparing review can be found in [16]. CMA-ES stands for covariance matrix adaptation evolution strategy. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration) new individuals (candidate solutions) are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value \(f(x)\). Like this, over the generation sequence, individuals with better and better \(f\)-values are generated. (excerpt from Wikipedia).
Example¶
[1]:
import numpy as np
from pymoo.algorithms.soo.nonconvex.cmaes import CMAES
from pymoo.problems import get_problem
from pymoo.optimize import minimize
problem = get_problem("sphere")
algorithm = CMAES(x0=np.random.random(problem.n_var))
res = minimize(problem,
algorithm,
seed=1,
verbose=False)
print(f"Best solution found: \nX = {res.X}\nF = {res.F}\nCV= {res.CV}")
Best solution found:
X = [0.50000004 0.50000004 0.50000002 0.49999997 0.49999999 0.5
0.5 0.49999996 0.50000001 0.49999999]
F = [5.87383434e-15]
CV= [0.]
CMA-ES already has several stopping criteria implemented. However, as for other algorithms, the number of iterations or function evaluations can be directly passed to minimize
.
[2]:
res = minimize(problem,
algorithm,
('n_iter', 10),
seed=1,
verbose=True)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
==================================================================================================================
n_gen | n_eval | f_avg | f_min | f_gap | sigma | min_std | max_std | axis
==================================================================================================================
1 | 1 | 0.7843602932 | 0.7843602932 | 0.7843602932 | 0.1000000000 | 0.10000 | 0.10000 | 1.00005
2 | 11 | 0.8273016598 | 0.6419345370 | 0.6419345370 | 0.0895404106 | 0.08551 | 0.09061 | 1.14007
3 | 21 | 0.7148381443 | 0.5766198911 | 0.5766198911 | 0.0858843011 | 0.07883 | 0.08940 | 1.25399
4 | 31 | 0.5897903198 | 0.4440676935 | 0.4440676935 | 0.0830931628 | 0.07160 | 0.08454 | 1.31305
5 | 41 | 0.5587528308 | 0.3774923663 | 0.3774923663 | 0.0836171300 | 0.07126 | 0.08552 | 1.44245
6 | 51 | 0.5013770120 | 0.2666919669 | 0.2666919669 | 0.0935926085 | 0.07879 | 0.10706 | 1.64210
7 | 61 | 0.3848830308 | 0.2090337198 | 0.2090337198 | 0.1052746387 | 0.08984 | 0.12289 | 1.81420
8 | 71 | 0.2995043839 | 0.1717378759 | 0.1717378759 | 0.1217307550 | 0.10382 | 0.14580 | 1.95364
9 | 81 | 0.3048011131 | 0.0750448175 | 0.0750448175 | 0.1322873779 | 0.11398 | 0.15948 | 1.88633
10 | 91 | 0.2347273099 | 0.0563269780 | 0.0563269780 | 0.1336602889 | 0.11262 | 0.15600 | 1.89057
Best solution found:
X = [0.37031907 0.50399673 0.39425702 0.4039654 0.45149006 0.43487122
0.42808061 0.51360982 0.42751342 0.54339051]
F = [0.05632698]
[3]:
res = minimize(problem,
algorithm,
('n_eval', 50),
seed=1,
verbose=True)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
==================================================================================================================
n_gen | n_eval | f_avg | f_min | f_gap | sigma | min_std | max_std | axis
==================================================================================================================
1 | 1 | 0.7843602932 | 0.7843602932 | 0.7843602932 | 0.1000000000 | 0.10000 | 0.10000 | 1.00005
2 | 11 | 0.8273016598 | 0.6419345370 | 0.6419345370 | 0.0895404106 | 0.08551 | 0.09061 | 1.14007
3 | 21 | 0.7148381443 | 0.5766198911 | 0.5766198911 | 0.0858843011 | 0.07883 | 0.08940 | 1.25399
4 | 31 | 0.5897903198 | 0.4440676935 | 0.4440676935 | 0.0830931628 | 0.07160 | 0.08454 | 1.31305
5 | 41 | 0.5587528308 | 0.3774923663 | 0.3774923663 | 0.0836171300 | 0.07126 | 0.08552 | 1.44245
6 | 51 | 0.5013770120 | 0.2666919669 | 0.2666919669 | 0.0935926085 | 0.07879 | 0.10706 | 1.64210
Best solution found:
X = [0.34068406 0.31552931 0.41656912 0.31965948 0.55642895 0.29095006
0.27616948 0.26658408 0.50452182 0.3722976 ]
F = [0.26669197]
Also, easily restarts can be used, which are known to work very well on multi-modal functions. For instance, Rastrigin
can be solved rather quickly by:
[4]:
problem = get_problem("rastrigin")
algorithm = CMAES(restarts=10, restart_from_best=True)
res = minimize(problem,
algorithm,
('n_evals', 2500),
seed=1,
verbose=False)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
Best solution found:
X = [-1.22627242e-10 2.16614460e-09]
F = [0.]
Our framework internally calls the cma.fmin2
function. All parameters that can be used there either as a keyword argument or an option can also be passed to the CMAES
constructor. An example with a few selected cma.fmin2
parameters is shown below:
[5]:
import numpy as np
from pymoo.util.normalization import denormalize
# define an intitial point for the search
np.random.seed(1)
x0 = denormalize(np.random.random(problem.n_var), problem.xl, problem.xu)
algorithm = CMAES(x0=x0,
sigma=0.5,
restarts=2,
maxfevals=np.inf,
tolfun=1e-6,
tolx=1e-6,
restart_from_best=True,
bipop=True)
res = minimize(problem,
algorithm,
seed=1,
verbose=False)
print("Best solution found: \nX = %s\nF = %s" % (res.X, res.F))
Best solution found:
X = [2.73861817e-07 3.30861369e-07]
F = [3.65965036e-11]
For more details about hyperparameters, we refer to the software documentation of the fmin2
in CMA-ES, which can be found here. A quick explanation of possible parameters is also provided in the API documentation below.
API¶
-
class
pymoo.algorithms.soo.nonconvex.cmaes.
CMAES
(self, x0=None, sigma=0.1, normalize=True, parallelize=True, maxfevals=np.inf, tolfun=1e-11, tolx=1e-11, restarts=0, restart_from_best='False', incpopsize=2, eval_initial_x=False, noise_handler=None, noise_change_sigma_exponent=1, noise_kappa_exponent=0, bipop=False, cmaes_verbose=- 9, verb_log=0, output=CMAESOutput(), pop_size=None, **kwargs) - Parameters
- x0list or numpy.ndarray
initial guess of minimum solution before the application of the geno-phenotype transformation according to the
transformation
option. It can also be a string holding a Python expression that is evaluated to yield the initial guess - this is important in case restarts are performed so that they start from different places. Otherwisex0
can also be a cma.CMAEvolutionStrategy object instance, in that casesigma0
can beNone
.- sigmafloat
Initial standard deviation in each coordinate.
sigma0
should be about 1/4th of the search domain width (where the optimum is to be expected). The variables inobjective_function
should be scaled such that they presumably have similar sensitivity. See also ScaleCoordinates.- parallelizebool
Whether the objective function should be called for each single evaluation or batch wise.
- restartsint, default 0
Number of restarts with increasing population size, see also parameter
incpopsize
, implementing the IPOP-CMA-ES restart strategy, see also parameterbipop
; to restart from different points (recommended), passx0
as a string.- restart_from_bestbool, default false
Which point to restart from
- incpopsizeint
Multiplier for increasing the population size
popsize
before each restart- eval_initial_xbool
Evaluate initial solution, for
None
only with elitist option- noise_handlerclass
A
NoiseHandler
class or instance orNone
. Example:cma.fmin(f, 6 * [1], 1, noise_handler=cma.NoiseHandler(6))
seehelp(cma.NoiseHandler)
.- noise_change_sigma_exponentint
Exponent for the sigma increment provided by the noise handler for additional noise treatment. 0 means no sigma change.
- noise_kappa_exponentint
Instead of applying reevaluations, the “number of evaluations” is (ab)used as init_simplex_scale factor kappa (experimental).
- bipopbool
If True, run as BIPOP-CMA-ES; BIPOP is a special restart strategy switching between two population sizings - small (like the default CMA, but with more focused search) and large (progressively increased as in IPOP). This makes the algorithm perform well both on functions with many regularly or irregularly arranged local optima (the latter by frequently restarting with small populations). For the bipop parameter to actually take effect, also select non-zero number of (IPOP) restarts; the recommended setting is
restarts<=9
and x0 passed as a string using numpy.rand to generate initial solutions. Note that small-population restarts do not count into the total restart count.- AdaptSigmaTrue
Or False or any CMAAdaptSigmaBase class e.g. CMAAdaptSigmaTPA, CMAAdaptSigmaCSA
- CMA_activeTrue
Negative update, conducted after the original update
- CMA_activefac1
Learning rate multiplier for active update
- CMA_cmean1
Learning rate for the mean value
- CMA_const_traceFalse
Normalize trace, 1, True, “arithm”, “geom”, “aeig”, “geig” are valid
- CMA_diagonal0*100*N/popsize**0.5
Number of iterations with diagonal covariance matrix, True for always
- CMA_eigenmethodnp.linalg.eigh or cma.utilities.math.eig or pygsl.eigen.eigenvectors
- CMA_elitistFalse or “initial” or True
Elitism likely impairs global search performance
- CMA_injections_threshold_keep_len0
Keep length if Mahalanobis length is below the given relative threshold
- CMA_mirrorspopsize < 6
Values <0.5 are interpreted as fraction, values >1 as numbers (rounded), otherwise about 0.16 is used
- CMA_mirrormethodint, default 2, 0=unconditional, 1=selective, 2=selective with delay
- CMA_muNone
Parents selection parameter, default is popsize // 2
- CMA_on1
Multiplier for all covariance matrix updates
- CMA_samplerNone
- A class or instance that implements the interface of
cma.interfaces.StatisticalModelSamplerWithZeroMeanBaseClass
- CMA_sampler_optionsdict
Options passed to CMA_sampler class init as keyword arguments
- CMA_rankmu1.0
Multiplier for rank-mu update learning rate of covariance matrix
- CMA_rankone1.0
Multiplier for rank-one update learning rate of covariance matrix
- CMA_recombination_weightsNone
A list, see class RecombinationWeights, overwrites CMA_mu and popsize options
- CMA_dampsvec_facnp.Inf
Tentative and subject to changes, 0.5 would be a “default” damping for sigma vector update
- CMA_dampsvec_fade0.1
Tentative fading out parameter for sigma vector update
- CMA_teststdsNone
Factors for non-isotropic initial distr. of C, mainly for test purpose, see CMA_stds for production
- CMA_stdsNone
Multipliers for sigma0 in each coordinate, not represented in C, makes scaling_of_variables obsolete
- CSA_dampfac1
Positive multiplier for step-size damping, 0.3 is close to optimal on the sphere
- CSA_damp_mueff_exponent0.5
Zero would mean no dependency of damping on mueff, useful with CSA_disregard_length option
- CSA_disregard_lengthFalse
True is untested, also changes respective parameters
- CSA_clip_length_valueNone
Poorly tested, [0, 0] means const length N**0.5, [-1, 1] allows a variation of +- N/(N+2), etc.
- CSA_squaredFalse
Use squared length for sigma-adaptation ‘,
- BoundaryHandlerBoundTransform or BoundPenalty, unused when
bounds in (None, [None, None])
- conditioncov_alleviate[1e8, 1e12]
When to alleviate the condition in the coordinates and in main axes
- eval_final_meanTrue
Evaluate the final mean, which is a favorite return candidate
- fixed_variablesNone
Dictionary with index-value pairs like dict(0=1.1, 2=0.1) that are not optimized
- ftarget-inf
Target function value, minimization
- integer_variables[]
Index list, invokes basic integer handling: prevent std dev to become too small in the given variables
- maxfevalsinf
Maximum number of function evaluations
- maxiter100 + 150 * (N+3)**2 // popsize**0.5
Maximum number of iterations
- mean_shift_line_samplesFalse
Sample two new solutions colinear to previous mean shift
- mindx0
Minimal std in any arbitrary direction, cave interference with tol
- minstd0
Minimal std (scalar or vector) in any coordinate direction, cave interference with tol
- maxstdinf
Maximal std in any coordinate direction
- pc_line_samplesFalse
One line sample along the evolution path pc
- popsize4+int(3*np.log(N))
Population size, AKA lambda, number of new solution per iteration
- randnnp.random.randn
Randn(lam, N) must return an np.array of shape (lam, N), see also cma.utilities.math.randhss
- signals_filenameNone
cma_signals.in # read versatile options from this file which contains a single options dict, e.g.
dict("timeout"=0)
to stop, string-values are evaluated, e.g. “np.inf” is valid- termination_callbackNone
A function returning True for termination, called in stop with self as argument, could be abused for side effects
- timeoutinf
Stop if timeout seconds are exceeded, the string “2.5 * 60**2” evaluates to 2 hours and 30 minutes
- tolconditioncov1e14
Stop if the condition of the covariance matrix is above tolconditioncov
- tolfacupx1e3
Termination when step-size increases by tolfacupx (diverges). That is, the initial step-size was chosen far too small and better solutions were found far away from the initial solution x0
- tolupsigma1e20
Sigma/sigma0 > tolupsigma * max(eivenvals(C)**0.5) indicates “creeping behavior” with usually minor improvements
- tolfun1e-11
Termination criterion: tolerance in function value, quite useful
- tolfunhist1e-12
Termination criterion: tolerance in function value history
- tolstagnationint(100 + 100 * N**1.5 / popsize)
Termination if no improvement over tolstagnation iterations
- tolx1e-11
Termination criterion: tolerance in x-changes
- typical_xNone
Used with scaling_of_variables’,
- updatecovwaitNone
Number of iterations without distribution update, name is subject to future changes
- cmaes_verbose3
Verbosity e.g. of initial/final message, -1 is very quiet, -9 maximally quiet, may not be fully implemented
- verb_append0
Initial evaluation counter, if append, do not overwrite output files
- verb_disp100
Verbosity: display console output every verb_disp iteration
- verb_filenameprefixstr
CMADataLogger.default_prefix + Output path and filenames prefix
- verb_log1
Verbosity: write data to files every verb_log iteration, writing can be time critical on fast to evaluate functions
- verb_plot0
In fmin(): plot() is called every verb_plot iteration
- verb_timeTrue
Output timings on console
- vvdict
Versatile set or dictionary for hacking purposes, value found in self.opts[“vv”]
- kwargsdict
A dictionary with additional options passed to the constructor of class
CMAEvolutionStrategy
, seecma.CMAOptions
() for a list of available options.