OptCV Interface
Introduction
OptCV provides a familiar sklearn-like interface for hyperparameter search while enabling the use of any Hyperactive optimizer. This bridge lets sklearn users leverage advanced optimization algorithms without changing their workflow significantly.
Design Philosophy
OptCV aims for close compatibility with sklearn's search interfaces while providing:
- Familiar API: Similar methods and key attributes to GridSearchCV
- Advanced optimizers: Access to 25+ optimization algorithms
- Seamless integration: Works with existing sklearn pipelines and workflows
- Enhanced performance: Often better results than grid/random search
Class Signature
Parameters
estimator
- Type: sklearn estimator
- Description: The machine learning model to optimize
- Examples: Any sklearn classifier, regressor, or pipeline
optimizer
- Type: Hyperactive optimizer instance
- Description: Configured Hyperactive optimizer with experiment
- Note: Must be initialized with a compatible experiment
cv
- Type:
int
or sklearn CV object - Default:
5
- Description: Cross-validation strategy
scoring
- Type:
str
, callable, orNone
- Default:
None
(uses estimator's default scorer) - Description: Scoring function for evaluation
refit
- Type:
bool
- Default:
True
- Description: Whether to refit the best estimator on full dataset
Basic Usage
Simple Classification Example
Regression Example
Advanced Usage
Different Optimizers Comparison
Pipeline Integration
Custom Cross-Validation
Integration with Existing Workflows
Drop-in Replacement for GridSearchCV
With sklearn Model Selection
Attributes
After fitting, OptCV provides the following attributes:
best_params_
best_score_
best_estimator_
Note: cv_results_
is not provided by OptCV in v5.
Methods (Same as GridSearchCV)
fit(X, y)
predict(X)
predict_proba(X)
score(X, y)
decision_function(X)
Performance Considerations
Memory Usage
Computational Efficiency
Common Patterns
Ensemble of Optimized Models
Hyperparameter Analysis
Best Practices
- Optimizer Selection: Choose optimizers based on your parameter space characteristics
- Refit Setting: Set
refit=True
for production models - Cross-Validation: Use appropriate CV strategy for your data type
- Scoring Metrics: Select metrics that align with business objectives
- Memory Management: Monitor memory usage with large datasets
- Reproducibility: Set random seeds in both estimators and optimizers
Migration from sklearn
From GridSearchCV
From RandomizedSearchCV
References
- Scikit-learn GridSearchCV: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
- Scikit-learn Model Selection: https://scikit-learn.org/stable/model_selection.html