Skip to content

Scikit-learn CV Experiment

Introduction

The SklearnCvExperiment class provides seamless integration between Hyperactive optimizers and scikit-learn estimators with cross-validation. This experiment type handles the complexity of hyperparameter optimization with proper cross-validation, making it the go-to choice for most machine learning optimization tasks.

Core Functionality

SklearnCvExperiment automatically: - Manages cross-validation splits - Handles parameter validation and conversion - Supports all sklearn estimators and pipelines - Provides comprehensive scoring options

Class Signature

Parameters

estimator

  • Type: sklearn estimator or pipeline
  • Description: The model to optimize

X, y

  • Type: array-like
  • Description: Training features and target values

cv

  • Type: int or sklearn CV object, default KFold(n_splits=3, shuffle=True)
  • Description: Cross-validation strategy

scoring

  • Type: str or callable, default depends on estimator
  • Description: Scoring function for evaluation

Basic Usage Examples

Classification

Regression

Advanced Usage

Custom Cross-Validation Strategies

Custom Scoring Functions

Pipeline Optimization

Early Stopping and Budget Management

Common Patterns

Multi-Algorithm Comparison

Nested Cross-Validation

Best Practices

  1. Data Splitting: Always use proper train/validation/test splits
  2. CV Strategy: Choose appropriate cross-validation for your data type
  3. Scoring Metrics: Select metrics that align with your business objective
  4. Parameter Ranges: Define reasonable parameter ranges based on domain knowledge
  5. Computational Resources: Use optimizer backend/backend_params for parallel processing
  6. Reproducibility: Set random seeds in estimators and CV splitters
  7. Evaluation: Use nested CV for unbiased performance estimation

Common Issues and Solutions

Memory Issues

Imbalanced Datasets

Time Series Data

Integration with Other Hyperactive Components

With OptCV Interface

With Different Optimizers

References