Tree Structured Parzen Estimators

Introduction

The Tree Structured Parzen Estimator chooses new positions by computing an acquisition function. It assesses candidate positions via the ratio between the probability that they belong to the best positions and the probability that they belong to the worst positions. These probabilities are determined with kernel density estimators, which are trained on the already evaluated positions.

Example

from hyperactive import Hyperactive
from hyperactive.optimizers import TreeStructuredParzenEstimators

...

optimizer = TreeStructuredParzenEstimators(gamma_tpe=0.1)

hyper = Hyperactive()
hyper.add_search(model, search_space, n_iter=50, optimizer=optimizer)
hyper.run()

About the implementation

Similar to other sequential model-based optimization algorithms, the positions and scores of previous evaluations are saved as features and targets to train a machine learning model. For the Tree Structured Parzen Estimators we use two separate kernel density estimators: one receives the best positions, while the other gets the worst positions, and the acquisition function is calculated from the ratio of their densities.
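Below is a minimal sketch of this idea in one dimension, using scipy's gaussian_kde for the two density estimators. It is an illustration under these assumptions, not Hyperactive's internal implementation:

import numpy as np
from scipy.stats import gaussian_kde

# Illustrative sketch -- the positions below are made up.
# 1D positions of previous evaluations, sorted from best to worst score:
positions_sorted = np.array([4.9, 5.1, 4.7, 5.3, 1.0, 9.2, 0.3, 8.5, 2.2, 7.7])

gamma_tpe = 0.2
i_split = max(int(len(positions_sorted) * gamma_tpe), 1)

best_positions = positions_sorted[:i_split]
worst_positions = positions_sorted[i_split:]

# One kernel density estimator per group.
kde_best = gaussian_kde(best_positions)
kde_worst = gaussian_kde(worst_positions)

# Acquisition function: ratio of the two estimated densities.
candidates = np.linspace(0, 10, 1001)
acquisition = kde_best(candidates) / kde_worst(candidates)

# The candidate with the highest ratio is evaluated next.
next_position = candidates[np.argmax(acquisition)]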

The separation of the positions into best and worst is very simple: first we sort the positions by their scores, then we split the sorted list at the following index:

\[ i_{split} = \max(n_{samples} \cdot \gamma, \, 1) \]

where:

  • \(\gamma\) => gamma_tpe
  • \(n_{samples}\) => the number of positions in the list
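For example, with \(n_{samples} = 50\) evaluated positions and gamma_tpe \(= 0.2\), the list is split at \(i_{split} = \max(50 \cdot 0.2, 1) = 10\): the ten best positions train one density estimator, the remaining 40 train the other.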

Parameters

gamma_tpe

This parameter determines the separation of the explored positions into good and bad. It must be in the range between 0 and 1. A value of 0.2 means that the best \(20\%\) of the known positions are put into the list of best known positions, while the rest goes into the list of worst known positions.

  • type: float
  • default: 0.2
  • typical range: 0.05 ... 0.75

max_sample_size

The max_sample_size-parameter controls a first pass of random sampling, performed before all possible positions are generated for the sequential model-based optimization. It samples the search-space directly and takes effect if the search-space is very large:

import numpy as np

search_space = {
  "x1": np.arange(0, 1000, 0.01),
  "x2": np.arange(0, 1000, 0.01),
  "x3": np.arange(0, 1000, 0.01),
  "x4": np.arange(0, 1000, 0.01),
}

The max_sample_size-parameter is necessary to avoid a memory overload from generating all possible positions of the search-space. The search-space above corresponds to \(100000^4 = 10^{20}\) possible positions. This memory overload is inherent to sequential model-based optimization algorithms, because the surrogate model has the job of making a prediction for every position in the search-space to calculate the acquisition function. The max_sample_size-parameter was introduced to provide a better out-of-the-box experience when using SMBO-based optimizers.
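As a quick sanity check of that number, and to show how the parameter is set (the value below is the default):

from hyperactive.optimizers import TreeStructuredParzenEstimators
import numpy as np

n_per_dim = len(np.arange(0, 1000, 0.01))  # 100000 values per dimension
total_positions = n_per_dim ** 4           # 100000**4 = 10**20 positions

optimizer = TreeStructuredParzenEstimators(max_sample_size=10000000)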

  • type: int
  • default: 10000000
  • typical range: 1000000 ... 100000000

sampling

The sampling-parameter controls a second pass of random sampling. It samples from the list of all possible positions (not directly from the search-space). This might be necessary, because the predict-method of the surrogate model could otherwise overload the memory.
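For example, assuming the same "random" key as in the default value, the second-pass sample size could be lowered like this:

from hyperactive.optimizers import TreeStructuredParzenEstimators

optimizer = TreeStructuredParzenEstimators(sampling={"random": 100000})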

  • type: dict
  • default: {'random': 1000000}
  • typical range: -

warm_start_smbo

The warm_start_smbo-parameter accepts a pandas dataframe containing the search-data, i.e. the results of a previous optimization run. Such a dataframe could look like this:

x1    x2    score
5     15    0.3
10    12    0.7
...   ...   ...

Where the corresponding search-space would look like this:

search_space = {
  "x1": np.arange(0, 20),
  "x2": np.arange(0, 20),
}

Before passing the search-data to the optimizer, make sure that its columns match the search-space of the new optimization run. You cannot add another dimension ("x3") to the search-space and expect the warm start to work; the dimensionality of the optimization must be preserved and fit the problem.
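
As a sketch, the search-data from the table above could also be assembled by hand with pandas and passed to the optimizer:

import pandas as pd

# Column names must match the search-space keys, plus a "score" column.
search_data = pd.DataFrame({
  "x1": [5, 10],
  "x2": [15, 12],
  "score": [0.3, 0.7],
})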

opt = TreeStructuredParzenEstimators(warm_start_smbo=search_data)

  • type: pandas dataframe, None
  • default: None
  • possible values: -

rand_rest_p

The probability for the optimization algorithm to jump to a random position in an iteration step. It is set to 0 by default. The idea of this parameter is to make it possible to inject randomness into algorithms that do not otherwise support it (a minimal sketch follows the parameter list below).

  • type: float
  • default: 0
  • typical range: 0.01 ... 0.1
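
A minimal sketch of the mechanism, with hypothetical tpe_step and random_step callables standing in for the real iteration logic (an illustration, not Hyperactive's internal code):

import random

def iterate(tpe_step, random_step, rand_rest_p=0.05):
    # With probability rand_rest_p the optimizer jumps to a random
    # position; otherwise it performs the regular TPE iteration step.
    if random.random() < rand_rest_p:
        return random_step()
    return tpe_step()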