Subset Selection ProblemΒΆ
A genetic algorithm can be used to approach subset selection problems by defining custom operators. In general, a metaheuristic algorithm might not be the ultimate goal to implement in a real-world scenario; however, it might be useful to investigate patterns or characteristics of possible well-performing subsets. Let us consider a simple toy problem where we have to select numbers from a list. For every solution, exactly ten numbers have to be selected that their sum is minimized. For the subset selection problem, a binary encoding can be used where one indicates a number is picked. In our problem formulation, the list of numbers is represented by \(L\) and the binary encoded variable by \(x\).
\begin{align} \begin{split} \min f(x) & = & \sum_{k=1}^{n} L_k \cdot x_k\\[2mm] \text{s.t.} \quad g(x) & = & (\sum_{k=1}^{n} x_k - 10)^2\\[2mm] \end{split} \end{align}
As shown above, the equality constraint is handled by ensuring \(g(x)\) can only be zero if exactly ten numbers are chosen. The problem can be implemented as follows:
[1]:
import numpy as np
from pymoo.core.problem import ElementwiseProblem
class SubsetProblem(ElementwiseProblem):
def __init__(self,
L,
n_max
):
super().__init__(n_var=len(L), n_obj=1, n_ieq_constr=1)
self.L = L
self.n_max = n_max
def _evaluate(self, x, out, *args, **kwargs):
out["F"] = np.sum(self.L[x])
out["G"] = (self.n_max - np.sum(x)) ** 2
# create the actual problem to be solved
np.random.seed(1)
L = np.array([np.random.randint(100) for _ in range(100)])
n_max = 10
problem = SubsetProblem(L, n_max)
The customization requires writing custom operators in order to solve this problem efficiently. We recommend considering the feasibility directly in the evolutionary operators because otherwise, most of the time, infeasible solutions will be processed. The sampling creates a random solution where the subset constraint will always be satisfied. The mutation randomly removes a number and chooses another one. The crossover takes the values of both parents and then randomly picks either the one from the first or from the second parent until enough numbers are picked.
[2]:
from pymoo.core.crossover import Crossover
from pymoo.core.mutation import Mutation
from pymoo.core.sampling import Sampling
class MySampling(Sampling):
def _do(self, problem, n_samples, **kwargs):
X = np.full((n_samples, problem.n_var), False, dtype=bool)
for k in range(n_samples):
I = np.random.permutation(problem.n_var)[:problem.n_max]
X[k, I] = True
return X
class BinaryCrossover(Crossover):
def __init__(self):
super().__init__(2, 1)
def _do(self, problem, X, **kwargs):
n_parents, n_matings, n_var = X.shape
_X = np.full((self.n_offsprings, n_matings, problem.n_var), False)
for k in range(n_matings):
p1, p2 = X[0, k], X[1, k]
both_are_true = np.logical_and(p1, p2)
_X[0, k, both_are_true] = True
n_remaining = problem.n_max - np.sum(both_are_true)
I = np.where(np.logical_xor(p1, p2))[0]
S = I[np.random.permutation(len(I))][:n_remaining]
_X[0, k, S] = True
return _X
class MyMutation(Mutation):
def _do(self, problem, X, **kwargs):
for i in range(X.shape[0]):
X[i, :] = X[i, :]
is_false = np.where(np.logical_not(X[i, :]))[0]
is_true = np.where(X[i, :])[0]
X[i, np.random.choice(is_false)] = True
X[i, np.random.choice(is_true)] = False
return X
After having defined the operators a genetic algorithm can be initialized.
[3]:
from pymoo.algorithms.soo.nonconvex.ga import GA
from pymoo.optimize import minimize
algorithm = GA(
pop_size=100,
sampling=MySampling(),
crossover=BinaryCrossover(),
mutation=MyMutation(),
eliminate_duplicates=True)
res = minimize(problem,
algorithm,
('n_gen', 60),
seed=1,
verbose=True)
print("Function value: %s" % res.F[0])
print("Subset:", np.where(res.X)[0])
=================================================================================
n_gen | n_eval | cv_min | cv_avg | f_avg | f_min
=================================================================================
1 | 100 | 0.000000E+00 | 0.000000E+00 | 4.439400E+02 | 2.580000E+02
2 | 200 | 0.000000E+00 | 0.000000E+00 | 3.495200E+02 | 2.040000E+02
3 | 300 | 0.000000E+00 | 0.000000E+00 | 3.070100E+02 | 1.880000E+02
4 | 400 | 0.000000E+00 | 0.000000E+00 | 2.683900E+02 | 1.650000E+02
5 | 500 | 0.000000E+00 | 0.000000E+00 | 2.312200E+02 | 1.340000E+02
6 | 600 | 0.000000E+00 | 0.000000E+00 | 2.046400E+02 | 1.280000E+02
7 | 700 | 0.000000E+00 | 0.000000E+00 | 1.805100E+02 | 1.170000E+02
8 | 800 | 0.000000E+00 | 0.000000E+00 | 1.600100E+02 | 9.900000E+01
9 | 900 | 0.000000E+00 | 0.000000E+00 | 1.431800E+02 | 8.600000E+01
10 | 1000 | 0.000000E+00 | 0.000000E+00 | 1.300900E+02 | 7.800000E+01
11 | 1100 | 0.000000E+00 | 0.000000E+00 | 1.180200E+02 | 7.800000E+01
12 | 1200 | 0.000000E+00 | 0.000000E+00 | 1.104100E+02 | 7.700000E+01
13 | 1300 | 0.000000E+00 | 0.000000E+00 | 1.047700E+02 | 7.000000E+01
14 | 1400 | 0.000000E+00 | 0.000000E+00 | 9.968000E+01 | 7.000000E+01
15 | 1500 | 0.000000E+00 | 0.000000E+00 | 9.599000E+01 | 6.500000E+01
16 | 1600 | 0.000000E+00 | 0.000000E+00 | 9.186000E+01 | 5.800000E+01
17 | 1700 | 0.000000E+00 | 0.000000E+00 | 8.930000E+01 | 5.800000E+01
18 | 1800 | 0.000000E+00 | 0.000000E+00 | 8.628000E+01 | 5.800000E+01
19 | 1900 | 0.000000E+00 | 0.000000E+00 | 8.270000E+01 | 5.700000E+01
20 | 2000 | 0.000000E+00 | 0.000000E+00 | 7.919000E+01 | 5.400000E+01
21 | 2100 | 0.000000E+00 | 0.000000E+00 | 7.606000E+01 | 5.100000E+01
22 | 2200 | 0.000000E+00 | 0.000000E+00 | 7.372000E+01 | 5.100000E+01
23 | 2300 | 0.000000E+00 | 0.000000E+00 | 7.201000E+01 | 5.100000E+01
24 | 2400 | 0.000000E+00 | 0.000000E+00 | 7.003000E+01 | 5.100000E+01
25 | 2500 | 0.000000E+00 | 0.000000E+00 | 6.804000E+01 | 4.900000E+01
26 | 2600 | 0.000000E+00 | 0.000000E+00 | 6.613000E+01 | 4.900000E+01
27 | 2700 | 0.000000E+00 | 0.000000E+00 | 6.472000E+01 | 4.900000E+01
28 | 2800 | 0.000000E+00 | 0.000000E+00 | 6.356000E+01 | 4.900000E+01
29 | 2900 | 0.000000E+00 | 0.000000E+00 | 6.168000E+01 | 4.100000E+01
30 | 3000 | 0.000000E+00 | 0.000000E+00 | 6.109000E+01 | 4.100000E+01
31 | 3100 | 0.000000E+00 | 0.000000E+00 | 6.007000E+01 | 4.100000E+01
32 | 3200 | 0.000000E+00 | 0.000000E+00 | 5.863000E+01 | 4.100000E+01
33 | 3300 | 0.000000E+00 | 0.000000E+00 | 5.753000E+01 | 4.000000E+01
34 | 3400 | 0.000000E+00 | 0.000000E+00 | 5.675000E+01 | 4.000000E+01
35 | 3500 | 0.000000E+00 | 0.000000E+00 | 5.548000E+01 | 3.900000E+01
36 | 3600 | 0.000000E+00 | 0.000000E+00 | 5.505000E+01 | 3.900000E+01
37 | 3700 | 0.000000E+00 | 0.000000E+00 | 5.423000E+01 | 3.900000E+01
38 | 3800 | 0.000000E+00 | 0.000000E+00 | 5.329000E+01 | 3.900000E+01
39 | 3900 | 0.000000E+00 | 0.000000E+00 | 5.210000E+01 | 3.900000E+01
40 | 4000 | 0.000000E+00 | 0.000000E+00 | 5.071000E+01 | 3.900000E+01
41 | 4100 | 0.000000E+00 | 0.000000E+00 | 5.021000E+01 | 3.900000E+01
42 | 4200 | 0.000000E+00 | 0.000000E+00 | 4.906000E+01 | 3.900000E+01
43 | 4300 | 0.000000E+00 | 0.000000E+00 | 4.808000E+01 | 3.900000E+01
44 | 4400 | 0.000000E+00 | 0.000000E+00 | 4.789000E+01 | 3.900000E+01
45 | 4500 | 0.000000E+00 | 0.000000E+00 | 4.756000E+01 | 3.900000E+01
46 | 4600 | 0.000000E+00 | 0.000000E+00 | 4.718000E+01 | 3.900000E+01
47 | 4700 | 0.000000E+00 | 0.000000E+00 | 4.672000E+01 | 3.900000E+01
48 | 4800 | 0.000000E+00 | 0.000000E+00 | 4.621000E+01 | 3.900000E+01
49 | 4900 | 0.000000E+00 | 0.000000E+00 | 4.583000E+01 | 3.800000E+01
50 | 5000 | 0.000000E+00 | 0.000000E+00 | 4.545000E+01 | 3.800000E+01
51 | 5100 | 0.000000E+00 | 0.000000E+00 | 4.497000E+01 | 3.800000E+01
52 | 5200 | 0.000000E+00 | 0.000000E+00 | 4.482000E+01 | 3.800000E+01
53 | 5300 | 0.000000E+00 | 0.000000E+00 | 4.473000E+01 | 3.800000E+01
54 | 5400 | 0.000000E+00 | 0.000000E+00 | 4.458000E+01 | 3.700000E+01
55 | 5500 | 0.000000E+00 | 0.000000E+00 | 4.437000E+01 | 3.700000E+01
56 | 5600 | 0.000000E+00 | 0.000000E+00 | 4.409000E+01 | 3.700000E+01
57 | 5700 | 0.000000E+00 | 0.000000E+00 | 4.394000E+01 | 3.700000E+01
58 | 5800 | 0.000000E+00 | 0.000000E+00 | 4.385000E+01 | 3.700000E+01
59 | 5900 | 0.000000E+00 | 0.000000E+00 | 4.374000E+01 | 3.700000E+01
60 | 6000 | 0.000000E+00 | 0.000000E+00 | 4.360000E+01 | 3.700000E+01
Function value: 37.0
Subset: [ 5 9 12 31 36 37 40 47 52 99]
Finally, we can compare the found subset with the optimum known simply through sorting:
[4]:
opt = np.sort(np.argsort(L)[:n_max])
print("Optimal Subset:", opt)
print("Optimal Function Value: %s" % L[opt].sum())
Optimal Subset: [ 5 9 12 31 36 37 47 52 68 99]
Optimal Function Value: 36