Bayesian Optimization
Introduction
Bayesian optimization chooses new positions by computing the expected improvement of candidate positions in the search space, based on a Gaussian process regressor that is trained on the positions already evaluated.
Example
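The following is a minimal usage sketch. The package name, the `BayesianOptimizer` class, and the `search`-method are assumptions based on the parameters documented below; the objective function and search space are illustrative.

```python
import numpy as np
from gradient_free_optimizers import BayesianOptimizer  # assumed import path

# illustrative objective: a parabola with its maximum at x = 0
def parabola(para):
    return -(para["x"] ** 2)

# one-dimensional search space of discrete positions
search_space = {"x": np.arange(-10, 10, 0.1)}

opt = BayesianOptimizer(search_space, xi=0.3)
opt.search(parabola, n_iter=30)
```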
About the implementation
The Bayesian optimizer collects the position and score of each iteration. The Gaussian process regressor fits to the positions (features) and scores (target), and predicts the scores of all unknown positions. This is why Bayesian optimization needs at least one initial position. The Gaussian process returns the standard deviation in addition to the prediction (or mean), both of which are required to compute the acquisition function. The position with the best acquisition value is evaluated next. The selected position and its true score are then collected, restarting the cycle.

The acquisition function used in this algorithm is the expected improvement, which is calculated by the following equation:

\[
EI(x) = \left( \mu(x) - y_{sample, max} - \xi \right) \Phi(Z) + \sigma(x) \, \varphi(Z)
\]

where:

\[
Z = \frac{\mu(x) - y_{sample, max} - \xi}{\sigma(x)}
\]

and:
- \(y_{sample, max}\) => best known score
- \(\xi\) => xi-parameter
- \(\varphi\) => Probability density function
- \(\Phi\) => Cumulative distribution function
The surrogate model used in Bayesian optimization is the Gaussian process regressor. A crucial property of this model is that it returns the uncertainty of the prediction \(\sigma\) together with the predicted value \(\mu\).
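As a concrete reference, here is a minimal sketch of the expected-improvement computation from the definitions above, using `scipy.stats.norm` for \(\varphi\) and \(\Phi\). The function name and signature are illustrative, not the library's internal API:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_max, xi=0.3):
    """Expected improvement for arrays of predicted means and std deviations."""
    with np.errstate(divide="ignore", invalid="ignore"):
        imp = mu - y_max - xi                 # improvement over the best known score
        Z = imp / sigma
        ei = imp * norm.cdf(Z) + sigma * norm.pdf(Z)
    ei[sigma == 0.0] = 0.0                    # no uncertainty -> no expected improvement
    return ei
```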
Parameters
xi
Parameter that controls the trade-off between exploration and exploitation in the expected-improvement acquisition function. Larger values of \(\xi\) reward uncertain predictions more strongly and therefore favor exploration.
- type: float
- default: 0.3
- typical range: 0.1 ... 0.9
gpr
Grants access to the surrogate model. A custom surrogate model passed to this parameter must expose an interface similar to the following:

```python
from sklearn.gaussian_process import GaussianProcessRegressor


class GPR:
    def __init__(self):
        self.gpr = GaussianProcessRegressor()

    def fit(self, X, y):
        # train the surrogate on the evaluated positions (X) and scores (y)
        self.gpr.fit(X, y)

    def predict(self, X, return_std=False):
        # return the predicted mean, and the standard deviation if requested
        return self.gpr.predict(X, return_std=return_std)
```
The `predict`-method returns only \(\mu\) if `return_std=False`, and returns \(\mu\) and \(\sigma\) if `return_std=True`. Note that you have to pass the instantiated class to the `gpr`-parameter:
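For example (a sketch; the optimizer class is the assumed API from the example at the top):

```python
opt = BayesianOptimizer(search_space, gpr=GPR())  # pass an instance, not the class
```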
- type: class
- default: -
- possible values: -
max_sample_size
The `max_sample_size`-parameter controls a first pass of random sampling that runs before all possible positions are generated for the sequence-model-based optimization. It samples directly from the search space and takes effect if the search space is very large:
```python
import numpy as np

search_data = {
    "x1": np.arange(0, 1000, 0.01),
    "x2": np.arange(0, 1000, 0.01),
    "x3": np.arange(0, 1000, 0.01),
    "x4": np.arange(0, 1000, 0.01),
}
```
The `max_sample_size`-parameter is necessary to avoid a memory overload from generating all possible positions in the search space. Each dimension above contains 100000 values, so the search space corresponds to \(100000^4 = 10^{20}\) positions. This memory overload is expected for a sequence-model-based optimization algorithm, because the surrogate model has the job of making a prediction for every position in the search space to calculate the acquisition function. The `max_sample_size`-parameter was introduced to provide a better out-of-the-box experience when using SMBO-based optimizers. A sketch of this first sampling pass is shown after the list below.
- type: int
- default: 10000000
- typical range: 1000000 ... 100000000
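A minimal sketch of the idea behind this first sampling pass (the variable names and the use of numpy here are illustrative, not the library's internal implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
max_sample_size = 10_000_000

search_data = {
    "x1": np.arange(0, 1000, 0.01),
    "x2": np.arange(0, 1000, 0.01),
}

# draw candidate positions directly from the search space instead of
# materializing the full cartesian product of all dimensions
positions = np.column_stack(
    [rng.choice(values, size=max_sample_size) for values in search_data.values()]
)
```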
sampling
The `sampling`-parameter controls a second pass of random sampling. It samples from the list of all generated positions (not directly from the search space). This might be necessary because the `predict`-method of the surrogate model could overload the memory if it is called on too many positions at once. A sketch follows the list below.
- type: dict
- default: {'random': 1000000}
- typical range: -
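A sketch of the second pass under the default `{'random': 1000000}`, again illustrative rather than the library's internal code:

```python
import numpy as np

rng = np.random.default_rng(0)

# stand-in for the list of all generated positions (rows = positions)
all_positions = rng.random((10_000_000, 4))

# keep a random subset of 1,000,000 positions for the surrogate's predict call
n_keep = min(1_000_000, len(all_positions))
idx = rng.choice(len(all_positions), size=n_keep, replace=False)
candidates = all_positions[idx]
```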
warm_start_smbo
The `warm_start_smbo`-parameter accepts a pandas dataframe that contains search data with the results from a previous optimization run. The dataframe containing the search data could look like this:
| x1 | x2 | score |
| --- | --- | --- |
| 5 | 15 | 0.3 |
| 10 | 12 | 0.7 |
| ... | ... | ... |
Where the corresponding search-space would look like this:
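A sketch; the exact value ranges are illustrative, only the dimension names must match the columns of the search data:

```python
import numpy as np

search_space = {
    "x1": np.arange(0, 20, 1),
    "x2": np.arange(0, 20, 1),
}
```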
Before passing the search data to the optimizer, make sure that its columns match the search space of the new optimization run. For example, you cannot add another dimension ("x3") to the search space and expect the warm start to work. The dimensionality of the optimization must be preserved and fit the problem. A sketch of resuming a run follows the list below.
- type: pandas dataframe, None
- default: None
- possible values: -
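Putting it together, a sketch of resuming from previous results, using the same assumed API as the example at the top:

```python
import numpy as np
import pandas as pd
from gradient_free_optimizers import BayesianOptimizer  # assumed import path

search_space = {
    "x1": np.arange(0, 20, 1),
    "x2": np.arange(0, 20, 1),
}

# search data from a previous run: one column per dimension plus "score"
previous_search_data = pd.DataFrame(
    {"x1": [5, 10], "x2": [15, 12], "score": [0.3, 0.7]}
)

opt = BayesianOptimizer(search_space, warm_start_smbo=previous_search_data)
```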
rand_rest_p
Probability for the optimization algorithm to jump to a random position in an iteration step. It is set to 0 by default. The idea of this parameter is to provide a way to inject randomness into algorithms that do not normally support it. A sketch of this control flow follows the list below.
- type: float
- default: 0
- typical range: 0.01 ... 0.1
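A sketch of how this parameter typically acts inside a single iteration; the helper function and arguments are hypothetical and only illustrate the control flow:

```python
import random

def sample_random_position(positions):
    # hypothetical helper: pick any position uniformly at random
    return random.choice(positions)

def next_position(positions, best_by_acquisition, rand_rest_p=0.0):
    # with probability rand_rest_p, jump to a random position instead of
    # the one proposed by the acquisition function
    if random.random() < rand_rest_p:
        return sample_random_position(positions)
    return best_by_acquisition
```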