TPE Optimizer
Introduction
The Tree-structured Parzen Estimator (TPE) Optimizer is a Bayesian optimization algorithm that uses Parzen estimators to model the objective function. It's one of the most popular and effective algorithms for hyperparameter optimization, particularly well-suited for expensive function evaluations.
About the Implementation
TPE works by building probabilistic models of the objective function using historical evaluation data. It separates the observations into "good" and "bad" groups based on a quantile, then models each group separately using Parzen estimators (kernel density estimation). The algorithm selects new points by maximizing the Expected Improvement criterion.
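To make this mechanism concrete, the toy sketch below (a simplified illustration, not the library's internals) splits past 1-D observations at a quantile, fits a kernel density to each group with SciPy, and picks the candidate that maximizes the good-to-bad density ratio, which is how TPE maximizes Expected Improvement. The objective function, quantile, and candidate count are illustrative choices.

```python
# Toy illustration of the TPE mechanism (not the library's internals).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
x = rng.uniform(-5, 5, size=100)                   # previously evaluated points
y = (x - 1.0) ** 2 + rng.normal(0, 0.5, size=100)  # their objective values (lower is better)

gamma = 0.25                                 # quantile separating "good" from "bad"
threshold = np.quantile(y, gamma)
l_density = gaussian_kde(x[y <= threshold])  # density of good points, l(x)
g_density = gaussian_kde(x[y > threshold])   # density of bad points,  g(x)

candidates = rng.uniform(-5, 5, size=24)     # e.g. n_ei_candidates random proposals
scores = l_density(candidates) / g_density(candidates)  # maximizing l/g maximizes EI
next_point = candidates[np.argmax(scores)]
print(f"next suggested x: {next_point:.3f}")
```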
Key features:
- Adaptive modeling: Builds better models as more data is collected
- Categorical support: Handles mixed parameter spaces naturally
- Robust performance: Works well across many different optimization problems
- Sample efficient: Requires fewer evaluations than random search
Parameters
Common (via Base Optuna Adapter)
- param_space (dict): parameter space; tuples/lists treated as ranges/choices
- n_trials (int): number of trials to run
- initialize (dict | None): optional warm start/grid/vertices/random init
- early_stopping (int | None): stop if no improvement after N trials
- max_score (float | None): stop when reaching threshold
- experiment (BaseExperiment): the experiment to optimize
n_startup_trials
- Type: int
- Default: 10
- Description: Number of random trials before TPE starts. These initial random samples help build the initial model.
n_ei_candidates
- Type: int
- Default: 24
- Description: Number of candidate points to evaluate when computing Expected Improvement.
weights
- Type: callable | None
- Default: None
- Description: Optional weighting function passed to Optuna's TPESampler.
random_state
- Type: int | None
- Default: None
- Description: Seed for reproducibility (sets seed in the underlying TPESampler).
Usage Example
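The adapter's exact import path is not given in this document, so the sketch below demonstrates the same knobs on the underlying Optuna TPESampler directly (the adapter forwards n_startup_trials, n_ei_candidates, and random_state to it, per the parameter descriptions above). The quadratic objective and search ranges are placeholders.

```python
import optuna

# Placeholder objective: minimize a simple quadratic over a mixed space.
def objective(trial):
    x = trial.suggest_float("x", -10.0, 10.0)
    kind = trial.suggest_categorical("kind", ["a", "b"])
    penalty = 0.0 if kind == "a" else 1.0
    return (x - 2.0) ** 2 + penalty

# The documented parameters map onto TPESampler arguments:
# n_startup_trials -> n_startup_trials, n_ei_candidates -> n_ei_candidates,
# random_state -> seed.
sampler = optuna.samplers.TPESampler(n_startup_trials=10, n_ei_candidates=24, seed=42)
study = optuna.create_study(direction="minimize", sampler=sampler)
study.optimize(objective, n_trials=50)

print(study.best_params, study.best_value)
```

With the adapter class itself, the same settings would be passed alongside the common parameters (param_space, n_trials, experiment) listed above.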
When to Use TPE
Best for:
- Mixed parameter spaces: Handles continuous, discrete, and categorical parameters
- Moderate evaluation budgets: Works well with 50-500 evaluations
- Expensive function evaluations: Sample-efficient compared to grid/random search
- General-purpose optimization: Robust across many problem types
Consider alternatives if:
- Very high dimensions: May struggle with >50 parameters
- Very cheap evaluations: Random search might be sufficient
- Specific problem structure: Specialized algorithms might be better
Comparison with Other Algorithms
| Algorithm | Sample Efficiency | Parameter Types | Computational Cost |
|---|---|---|---|
| TPE | High | All types | Medium |
| Random Search | Low | All types | Low |
| Bayesian Opt | High | Mostly continuous | High |
| Grid Search | Low | Discrete only | Low |
Advanced Usage
Custom Gamma Values
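The adapter's parameter list above does not expose gamma directly; in Optuna's TPESampler, gamma is a callable mapping the number of finished trials to the size of the "good" group. A fraction such as the 0.1-0.4 values suggested under Performance Tips can be expressed as such a callable; a minimal sketch, assuming the sampler is configured directly:

```python
import optuna

def fractional_gamma(fraction):
    # Convert a quantile-style gamma (e.g. 0.2) into the callable form
    # expected by optuna.samplers.TPESampler: finished trials -> size of
    # the "good" group.
    def gamma(n):
        return max(1, int(fraction * n))
    return gamma

# Smaller fraction -> stronger exploitation; larger -> more exploration.
sampler = optuna.samplers.TPESampler(gamma=fractional_gamma(0.2), seed=0)
study = optuna.create_study(direction="minimize", sampler=sampler)
```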
Warm Starting
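At the adapter level, the initialize parameter (see Common parameters above) covers warm starting. When working with Optuna directly, the equivalent is to enqueue known-good parameter sets before optimization so they are evaluated first and seed the TPE model. A sketch with placeholder values:

```python
import optuna

def objective(trial):
    x = trial.suggest_float("x", -10.0, 10.0)
    return (x - 2.0) ** 2

sampler = optuna.samplers.TPESampler(n_startup_trials=5, seed=42)
study = optuna.create_study(direction="minimize", sampler=sampler)

# Seed the study with points believed to be promising (placeholder values);
# they are evaluated first and inform the TPE model from the start.
study.enqueue_trial({"x": 2.1})
study.enqueue_trial({"x": 1.9})

study.optimize(objective, n_trials=30)
```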
Performance Tips
- Start with defaults: TPE's default parameters work well for most problems
- Adjust gamma: Use smaller gamma (0.1-0.2) for exploitation, larger (0.3-0.4) for exploration
- Scale startup trials: Use 10-20 startup trials for most problems
- Parameter space design: Keep parameter spaces reasonably sized (each dimension <100 values)
References
- Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization.
- Optuna Documentation: https://optuna.readthedocs.io/