# CMA-ES Optimizer

## Introduction
Covariance Matrix Adaptation Evolution Strategy (CMA-ES) is a state-of-the-art evolutionary algorithm for continuous optimization problems. It's particularly effective for non-convex, multimodal optimization landscapes and is considered one of the best general-purpose continuous optimizers available.
## About the Implementation
CMA-ES maintains a multivariate normal distribution of candidate solutions and adapts both the mean and covariance matrix of this distribution based on the success of previous generations. The algorithm excels at handling:
- Ill-conditioned problems: Automatically adapts to the problem's scale and correlation structure
- Multimodal landscapes: Can escape local optima through its population-based approach
- Noisy objectives: Robust to evaluation noise through population sampling
Key features:

- Scale-invariant: Adapts to different parameter scales automatically
- Rotation-invariant: Learns parameter correlations and dependencies
- Self-adaptive: No manual parameter tuning required for most problems
- Proven convergence: Strong theoretical foundation with convergence guarantees
## Parameters

### experiment

- Type: `BaseExperiment`
- Description: The experiment object defining the optimization problem.

### sigma0

- Type: `float`
- Default: `1.0`
- Description: Initial step size (standard deviation). Should be roughly 1/4 to 1/2 of the search domain width.

### population_size

- Type: `int` or `None`
- Default: `None` (auto-calculated)
- Description: Population size. If `None`, uses `4 + floor(3 * log(n_dimensions))`; see the snippet at the end of this section.

### seed

- Type: `int` or `None`
- Default: `None`
- Description: Random seed for reproducible results.
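For reference, the auto-calculated default can be reproduced with the standard library alone (the function name here is illustrative, not part of the API):

```python
import math

def default_population_size(n_dimensions: int) -> int:
    """Default CMA-ES population size: 4 + floor(3 * ln(n))."""
    return 4 + math.floor(3 * math.log(n_dimensions))

print(default_population_size(2))    # -> 6
print(default_population_size(10))   # -> 10
print(default_population_size(100))  # -> 17
```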
## Usage Example
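Below is a minimal sketch. Only the constructor parameters documented above (`experiment`, `sigma0`, `population_size`, `seed`) come from this page; the import path, the `CMAESOptimizer` class name, the way parameter bounds are declared, and the `evaluate`/`run` methods are assumptions to adapt to the actual API.

```python
from mypackage import BaseExperiment, CMAESOptimizer  # hypothetical import path

class SphereExperiment(BaseExperiment):
    """Toy experiment: minimize the sphere function sum(x_i^2)."""
    def evaluate(self, params):  # hypothetical interface method
        return sum(v * v for v in params.values())

experiment = SphereExperiment(
    parameters={"x": (-5.0, 5.0), "y": (-5.0, 5.0)}  # hypothetical bounds spec
)

optimizer = CMAESOptimizer(
    experiment=experiment,
    sigma0=2.5,            # ~1/4 of the 10-unit search domain width
    population_size=None,  # auto: 4 + floor(3 * log(n_dimensions))
    seed=42,
)
best = optimizer.run(n_iterations=100)  # hypothetical run method
print(best)
```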
## When to Use CMA-ES
Best for:

- Continuous optimization: Excels with real-valued parameters
- Expensive evaluations: Very sample-efficient for complex landscapes
- Unknown problem structure: Adapts automatically to problem characteristics
- Multimodal problems: Can escape local optima effectively
- Medium to high dimensions: Scales well up to 100+ dimensions

Consider alternatives if:

- Discrete/categorical parameters: Use TPE or other discrete optimizers
- Very cheap evaluations: Random search might be sufficient
- Low dimensions (<5): Simpler methods might be faster
- Large populations not feasible: CMA-ES needs multiple evaluations per iteration
## Comparison with Other Algorithms

| Algorithm | Continuous Opt | Discrete Opt | Sample Efficiency | Scalability |
| --- | --- | --- | --- | --- |
| CMA-ES | Excellent | Poor | Very high | Good (to ~100D) |
| TPE | Good | Excellent | High | Good |
| Bayesian Opt | Excellent | Poor | High | Moderate |
| Random Search | Good | Good | Low | Excellent |
## Advanced Usage
### Custom Population Size
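A larger population gives a more reliable ranking on rugged or noisy landscapes, at the cost of more evaluations per generation. This sketch reuses the assumed API and the `experiment` object from the usage example above:

```python
optimizer = CMAESOptimizer(  # hypothetical class name, as above
    experiment=experiment,
    sigma0=1.0,
    population_size=20,  # override the 4 + floor(3 * log(n)) default
    seed=42,
)
```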
### Step Size Tuning
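`sigma0` sets the radius of the initial search distribution. Following the heuristic from the parameter description (roughly 1/4 to 1/2 of the search domain width), a sketch under the same assumptions as the usage example:

```python
domain_width = 10.0  # e.g., every parameter bounded in [-5, 5]

optimizer = CMAESOptimizer(  # hypothetical class name, as above
    experiment=experiment,
    sigma0=0.25 * domain_width,  # wide enough to explore, small enough to converge
    seed=42,
)
```

Too small a `sigma0` risks stalling near the start point; too large wastes early evaluations on wild samples.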
### Reproducible Results
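Fixing `seed` makes the sampling deterministic, so repeated runs on the same experiment follow identical trajectories (same assumed API as above):

```python
opt_a = CMAESOptimizer(experiment=experiment, sigma0=1.0, seed=123)
opt_b = CMAESOptimizer(experiment=experiment, sigma0=1.0, seed=123)
# Given the same experiment, both optimizers sample the same candidate
# sequence and return the same result.
```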
## Mathematical Background
CMA-ES maintains a multivariate normal distribution \(\mathcal{N}(m, \sigma^2 C)\) where:
- \(m\) is the mean (center of the search distribution)
- \(\sigma\) is the step size (global scaling)
- \(C\) is the covariance matrix (shape and orientation)
The algorithm updates these parameters each generation based on the success of sampled points (a simplified sketch follows the list):
- Mean update: Move toward successful solutions
- Step size adaptation: Increase/decrease based on success rate
- Covariance adaptation: Learn problem structure and dependencies
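The self-contained NumPy sketch below shows, in simplified form, how these three updates fit together in one generation loop. It is for intuition only: it uses plain rank-weighted recombination and a rank-mu-style covariance blend, and replaces real CMA-ES step-size control (cumulative step-size adaptation) with a crude decay.

```python
import numpy as np

rng = np.random.default_rng(0)

def sphere(x):
    """Toy objective: minimize the sum of squares."""
    return float(x @ x)

n = 5                          # problem dimension
m = rng.normal(size=n)         # mean of the search distribution
sigma = 1.0                    # global step size
C = np.eye(n)                  # covariance matrix (shape and orientation)
lam = 4 + int(3 * np.log(n))   # default population size
mu = lam // 2                  # number of selected parents
weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
weights /= weights.sum()       # rank-based recombination weights

for generation in range(50):
    # Sample lambda candidates from N(m, sigma^2 C)
    A = np.linalg.cholesky(C)
    x = m + sigma * rng.normal(size=(lam, n)) @ A.T
    # Rank by fitness and keep the best mu samples
    order = np.argsort([sphere(xi) for xi in x])
    x_sel = x[order[:mu]]
    # Mean update: move toward a weighted average of successful solutions
    m_new = weights @ x_sel
    # Covariance update: blend in the outer products of successful steps
    y = (x_sel - m) / sigma
    C = 0.8 * C + 0.2 * (y.T * weights) @ y
    # Crude step-size decay; real CMA-ES adapts sigma via evolution paths
    sigma *= 0.95
    m = m_new

print("mean after 50 generations:", m)  # approaches the optimum at 0
```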
## Performance Tips
- Parameter scaling: Ensure all parameters have similar scales, e.g. mapped to [0, 1] (see the sketch after this list)
- Step size: Start with `sigma0` ≈ 0.25 × the parameter range
- Population size: Use default unless you have specific requirements
- Evaluation budget: CMA-ES typically needs 100+ evaluations to be effective
- Continuous parameters: Works best when parameters can be treated as continuous
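As a minimal sketch of the first tip: map each parameter into [0, 1] before optimization and back afterwards, log-scaling parameters that span orders of magnitude (such as learning rates) before normalizing. The helper names here are illustrative:

```python
bounds = {"momentum": (0.0, 0.99), "log10_lr": (-5.0, -1.0)}  # log-scale wide ranges

def to_unit(params, bounds):
    """Map raw values into [0, 1] given (low, high) bounds per parameter."""
    return {k: (params[k] - lo) / (hi - lo) for k, (lo, hi) in bounds.items()}

def from_unit(unit, bounds):
    """Map unit-scaled values back to their original ranges."""
    return {k: lo + unit[k] * (hi - lo) for k, (lo, hi) in bounds.items()}

raw = {"momentum": 0.9, "log10_lr": -3.0}
unit = to_unit(raw, bounds)
print(unit)                     # values in [0, 1]
print(from_unit(unit, bounds))  # recovers the originals (up to rounding)
```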
## Common Use Cases
### Neural Network Hyperparameters
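A hedged sketch of tuning two continuous training hyperparameters, reusing the assumed API from the usage example; `train_and_validate` stands in for a real training-plus-validation routine you would supply:

```python
class NNTuningExperiment(BaseExperiment):  # hypothetical interface, as above
    def evaluate(self, params):
        lr = 10 ** params["log10_lr"]  # search the learning rate on a log scale
        momentum = params["momentum"]
        return train_and_validate(lr, momentum)  # user-supplied; returns validation loss

optimizer = CMAESOptimizer(  # hypothetical class name, as above
    experiment=NNTuningExperiment(
        parameters={"log10_lr": (-5.0, -1.0), "momentum": (0.0, 0.99)}
    ),
    sigma0=0.5,
    seed=7,
)
best = optimizer.run(n_iterations=50)  # hypothetical run method
```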
### Regression Model Tuning
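A hedged sketch of tuning an elastic net's continuous hyperparameters: the optimizer and experiment names are hypothetical as above, while the scikit-learn calls are real:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=20, noise=0.5, random_state=0)

class ElasticNetExperiment(BaseExperiment):  # hypothetical interface, as above
    def evaluate(self, params):
        model = ElasticNet(alpha=10 ** params["log10_alpha"],
                           l1_ratio=params["l1_ratio"], max_iter=5000)
        # Negate: CMA-ES minimizes, while cross_val_score returns R^2 (higher is better)
        return -cross_val_score(model, X, y, cv=5).mean()

optimizer = CMAESOptimizer(  # hypothetical class name, as above
    experiment=ElasticNetExperiment(
        parameters={"log10_alpha": (-4.0, 1.0), "l1_ratio": (0.0, 1.0)}
    ),
    sigma0=0.5,
    seed=0,
)
best = optimizer.run(n_iterations=40)  # hypothetical run method
```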
## References
- Hansen, N., & Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies.
- Hansen, N. (2016). The CMA evolution strategy: A tutorial.
- Optuna CMA-ES documentation: https://optuna.readthedocs.io/