GP Optimizer
Introduction
The Gaussian Process (GP) Optimizer implements Bayesian optimization with a Gaussian process as the surrogate model. It fits a probabilistic model of the objective function to the trials evaluated so far and uses that model to choose where to evaluate next.
About the Implementation
Gaussian Process optimization maintains a probabilistic model of the objective function and uses acquisition functions to balance exploration and exploitation. The GP provides both predictions and uncertainty estimates, making it ideal for expensive function evaluations.
Key features:
- Uncertainty quantification: Provides confidence intervals for predictions
- Sample efficiency: Excellent for expensive evaluations
- Principled exploration: Uses uncertainty to guide search
- Non-parametric: Adapts to complex function shapes
Parameters
Common (via Base Optuna Adapter)
- param_space (dict): parameter space; tuples/lists treated as ranges/choices
- n_trials (int): number of trials to run
- initialize (dict | None): optional warm start/grid/vertices/random init
- early_stopping (int | None): stop if no improvement after N trials
- max_score (float | None): stop when reaching threshold
- experiment (BaseExperiment): the experiment to optimize
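As an illustration of the param_space convention, a search space might look like the sketch below. It assumes the "respectively" reading of the description above: a tuple is a (low, high) range and a list is a set of discrete choices. The parameter names are invented for the example and are not part of this API.

```python
# Hypothetical search space; all parameter names are illustrative only.
param_space = {
    "learning_rate": (1e-5, 1e-1),    # tuple -> numeric range (assumed reading)
    "n_layers": (1, 5),               # tuple of ints -> integer range (assumed)
    "activation": ["relu", "tanh"],   # list -> categorical choices (assumed)
}

# Split the space by kind, the way an adapter following this convention might.
ranges = {k: v for k, v in param_space.items() if isinstance(v, tuple)}
choices = {k: v for k, v in param_space.items() if isinstance(v, list)}
```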
n_startup_trials
- Type: int
- Default: 10
- Description: Number of random trials before GP optimization starts
deterministic_objective
- Type: bool
- Default: False
- Description: Whether the objective function is deterministic (passes through to Optuna's GPSampler).
random_state
- Type: int | None
- Default: None
- Description: Seed for reproducibility (sets seed in the underlying GPSampler).
Usage Example
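A minimal sketch of how the three GP-specific parameters map onto the underlying sampler. It uses Optuna's GPSampler directly rather than this adapter's own class, whose import path is not shown here. The helper names to_gp_sampler_kwargs and run_gp_study and the toy quadratic objective are assumptions made for illustration. Running the study requires optuna plus the extra dependencies GPSampler needs (such as torch and scipy).

```python
from typing import Any, Dict, Optional


def to_gp_sampler_kwargs(
    n_startup_trials: int = 10,
    deterministic_objective: bool = False,
    random_state: Optional[int] = None,
) -> Dict[str, Any]:
    """Map the adapter-level parameters onto GPSampler keyword arguments."""
    return {
        "n_startup_trials": n_startup_trials,
        "deterministic_objective": deterministic_objective,
        "seed": random_state,  # random_state is forwarded as `seed`
    }


def run_gp_study(n_trials: int = 30, **adapter_params: Any):
    """Minimize a toy quadratic with Optuna's GPSampler (hypothetical helper)."""
    import optuna  # imported lazily so the mapping helper works without optuna

    sampler = optuna.samplers.GPSampler(**to_gp_sampler_kwargs(**adapter_params))
    study = optuna.create_study(direction="minimize", sampler=sampler)

    def objective(trial):
        # Two continuous parameters; the minimum is at x=1, y=-2.
        x = trial.suggest_float("x", -5.0, 5.0)
        y = trial.suggest_float("y", -5.0, 5.0)
        return (x - 1.0) ** 2 + (y + 2.0) ** 2

    study.optimize(objective, n_trials=n_trials)
    return study.best_params, study.best_value


kwargs = to_gp_sampler_kwargs(n_startup_trials=5, random_state=42)
```

Calling run_gp_study(n_trials=30, n_startup_trials=5, deterministic_objective=True, random_state=42) would run five random trials before the GP takes over. The toy objective has no noise, so deterministic_objective=True is appropriate there.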
When to Use GP Optimizer
Best for:
- Expensive evaluations: When each evaluation takes significant time/resources
- Continuous parameters: Works best with real-valued parameters
- Smooth objectives: Most effective on smooth or moderately noisy functions
- Low to moderate dimensions: Typically <20 parameters
- Sample-efficient optimization: When you have limited evaluation budget
Consider alternatives if:
- Many categorical parameters: TPE might be better
- High dimensions: CMA-ES or TPE might scale better
- Very noisy objectives: More robust methods might be needed
- Cheap evaluations: Random search might be sufficient
Acquisition Functions
Note: The specific acquisition function is handled internally by the Optuna sampler used by this adapter and is not user-configurable via this API. The following concepts are provided for background only.
Expected Improvement (EI)
Balances exploration and exploitation by weighting the magnitude of each possible improvement over the current best by its probability.
Lower Confidence Bound (LCB)
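Scores a point by its predicted mean minus a multiple of its predictive standard deviation (for minimization); the multiplier sets how strongly uncertainty is rewarded.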
Probability of Improvement (PI)
Focuses on points with high probability of improving over current best.
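For background, the three criteria above can be written in closed form from a GP's predictive mean and standard deviation at a candidate point. The sketch below assumes minimization and uses only the standard library. The function names and the default kappa are illustrative choices; this is not the code the Optuna sampler runs internally.

```python
import math


def _norm_pdf(z: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean: float, std: float, best: float) -> float:
    """EI for minimization: E[max(best - f(x), 0)] under N(mean, std^2)."""
    if std <= 0.0:
        return max(best - mean, 0.0)  # no uncertainty: improvement is known
    z = (best - mean) / std
    return (best - mean) * _norm_cdf(z) + std * _norm_pdf(z)


def probability_of_improvement(mean: float, std: float, best: float) -> float:
    """PI for minimization: P(f(x) < best) under N(mean, std^2)."""
    if std <= 0.0:
        return 1.0 if mean < best else 0.0
    return _norm_cdf((best - mean) / std)


def lower_confidence_bound(mean: float, std: float, kappa: float = 2.0) -> float:
    """LCB for minimization: smaller is more promising."""
    return mean - kappa * std
```

EI and PI are maximized over candidates, while LCB is minimized. A point whose mean equals the current best still has positive EI whenever its uncertainty is non-zero, which is what drives exploration.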
Advanced Usage
Custom Acquisition Function
Startup Trials Tuning
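One way to pick n_startup_trials is as a fixed fraction of the trial budget, in line with the 10-20% rule of thumb given under Performance Tips. The helper below is a hypothetical sketch; its name, the 15% default, and the minimum of 2 are assumptions, not part of this API.

```python
def suggest_startup_trials(n_trials: int, fraction: float = 0.15, minimum: int = 2) -> int:
    """Return a random-initialization budget as a fraction of n_trials.

    The result is at least `minimum` and never exceeds `n_trials`.
    """
    if n_trials < 1:
        raise ValueError("n_trials must be positive")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return min(n_trials, max(minimum, round(n_trials * fraction)))


startup_for_100 = suggest_startup_trials(100)        # 15% of the budget
startup_for_50 = suggest_startup_trials(50, 0.2)     # 20% of the budget
```

The returned value would then be passed as n_startup_trials. Very small budgets fall back to the minimum so that the GP has at least a couple of points to fit.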
Comparison with Other Algorithms
Algorithm | Sample Efficiency | Continuous | Categorical | Scalability | Uncertainty |
---|---|---|---|---|---|
GP | Very High | Excellent | Limited | Poor (>20D) | Excellent |
TPE | High | Good | Excellent | Good | Good |
CMA-ES | High | Excellent | Poor | Good | None |
Random | Low | Good | Good | Excellent | None |
Mathematical Background
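Gaussian Process regression assumes the objective function \(f\) follows a GP prior:
\[ f(x) \sim \mathcal{GP}\big(\mu(x),\, k(x, x')\big) \]
where:
- \(\mu(x)\) is the mean function (often assumed to be 0)
- \(k(x, x')\) is the covariance (kernel) function
Given observations \(\mathcal{D}_n = \{(x_i, y_i)\}_{i=1}^n\), the posterior predictive distribution is:
\[ f(x) \mid \mathcal{D}_n \sim \mathcal{N}\big(\mu_n(x),\, \sigma_n^2(x)\big) \]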
The acquisition function uses both \(\mu_n(x)\) (predicted value) and \(\sigma_n(x)\) (uncertainty) to select the next evaluation point.
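To make \(\mu_n(x)\) and \(\sigma_n(x)\) concrete, the sketch below computes them for a one-dimensional input using only the standard library. It assumes a zero prior mean, an RBF kernel with fixed hyperparameters, and a small noise term on the diagonal. These are illustrative choices and not necessarily what the underlying sampler uses. With kernel matrix \(K\) and vector \(k_*\) of kernel values between the training inputs and \(x\), it evaluates \(\mu_n(x) = k_*^\top (K + \sigma^2 I)^{-1} y\) and \(\sigma_n^2(x) = k(x, x) - k_*^\top (K + \sigma^2 I)^{-1} k_*\) via a Cholesky factorization.

```python
import math


def rbf(x1: float, x2: float, length_scale: float = 1.0, variance: float = 1.0) -> float:
    """RBF (squared exponential) kernel for scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L @ L.T == a, for a symmetric positive definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_lower(low, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    return z


def solve_upper_t(low, z):
    """Back substitution: solve L.T a = z."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(low[k][i] * a[k] for k in range(i + 1, n))) / low[i][i]
    return a


def gp_posterior(xs, ys, x_new, noise: float = 1e-6):
    """Return (mu_n(x_new), sigma_n(x_new)) for a zero-mean GP with an RBF kernel."""
    n = len(xs)
    # K + noise * I; this factorization is the O(n^3) step.
    k_mat = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    low = cholesky(k_mat)
    alpha = solve_upper_t(low, solve_lower(low, ys))  # (K + noise I)^{-1} y
    k_star = [rbf(x, x_new) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf(x_new, x_new) - sum(vi * vi for vi in v)
    return mean, math.sqrt(max(var, 0.0))


mean_at_data, std_at_data = gp_posterior([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 1.0)
mean_far, std_far = gp_posterior([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 10.0)
```

At an observed input the posterior mean reproduces the observation and the uncertainty collapses towards the noise level. Far from the data the prediction reverts to the prior mean of 0 with the prior standard deviation of 1.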
Performance Tips
- Parameter scaling: Normalize parameters to similar scales (0-1)
- Startup trials: Use 10-20% of total budget for random initialization
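- Kernel choice: No kernel parameter is exposed through this API, so the kernel is whatever the underlying sampler uses; stationary kernels such as RBF or Matérn suit most smooth functions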
- Batch evaluation: GP optimization is inherently sequential
- Noise handling: Keep deterministic_objective=False (the default) if the objective is noisy; set it to True only for deterministic objectives
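The parameter scaling tip can be sketched as a pair of helpers that map a value to the unit interval and back, optionally on a log scale for parameters such as learning rates that span several orders of magnitude. The helper names are hypothetical. Whether the underlying sampler already normalizes parameters internally is not stated here, so treat this as an illustration of the idea rather than a required step.

```python
import math


def to_unit(value: float, low: float, high: float, log: bool = False) -> float:
    """Map value from [low, high] to [0, 1], linearly or on a log scale."""
    if log:
        value, low, high = math.log(value), math.log(low), math.log(high)
    return (value - low) / (high - low)


def from_unit(u: float, low: float, high: float, log: bool = False) -> float:
    """Inverse of to_unit: map u in [0, 1] back to [low, high]."""
    if log:
        return math.exp(math.log(low) + u * (math.log(high) - math.log(low)))
    return low + u * (high - low)


mid_linear = to_unit(5.0, 0.0, 10.0)              # halfway on a linear scale
mid_log = to_unit(1e-3, 1e-5, 1e-1, log=True)     # halfway on a log scale
```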
Common Use Cases
Neural Network Hyperparameters
Scientific Simulation Parameters
Model Regularization
Limitations
- Computational cost: GP inference scales as O(n³) with number of observations
- Categorical parameters: Not naturally handled (requires encoding)
- High dimensions: Performance degrades beyond ~20 parameters
- Non-stationary functions: Standard GP assumes stationarity
- Discrete parameters: Requires careful handling
Integration with Experimental Design
GP optimization naturally integrates with experimental design principles:
- Sequential design: Each evaluation informs the next
- Uncertainty quantification: Provides confidence in predictions
- Active learning: Focuses evaluations where learning is maximal
- Robust optimization: Can incorporate noise models