sklvq
stable

Getting Started

  • Install and Contribute

Documentation

  • API Reference

Tutorial - Examples

  • Basic Usage
  • Pre-processing
  • Model Selection
    • Cross validation
    • Grid Search
  • Performance Metrics
  • Customization
sklvq
  • »
  • Basic Usage »
  • Grid Search
  • Edit on GitHub

Note

Click here to download the full example code

Grid Search¶

Cross validation is not the whole story as it only can tell you the expected performance of a single set of (hyper) parameters. Luckily sklearn also provides a way of trying out multiple settings and return the CV scores for each of them. We can use gridsearch for this.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from sklvq import GMLVQ

data, labels = load_iris(return_X_y=True)

We first need to create a pipeline and initialize a parameter grid we want to search.

# Create the standard scaler instance
standard_scaler = StandardScaler()

# Create the GMLVQ model instance
model = GMLVQ()

# Link them together by using sklearn's pipeline
pipeline = make_pipeline(standard_scaler, model)

# We want to see the difference in performance of the two following solvers
solvers_types = [
    "steepest-gradient-descent",
    "waypoint-gradient-descent",
]

# Currently, the  sklvq package contains only the following distance function compatible with
# GMLVQ. However, see the customization examples for writing your own.
distance_types = ["adaptive-squared-euclidean"]

# Because we are using a pipeline we need to prepend the parameters with the name of the
# class of instance we want to provide the parameters for.
param_grid = [
    {
        "gmlvq__solver_type": solvers_types,
        "gmlvq__distance_type": distance_types,
        "gmlvq__activation_type": ["identity"],
    },
    {
        "gmlvq__solver_type": solvers_types,
        "gmlvq__distance_type": distance_types,
        "gmlvq__activation_type": ["sigmoid"],
        "gmlvq__activation_params": [{"beta": beta} for beta in range(1, 4, 1)],
    },
]
# This grid can be read as: for each solver, try each distance type with the identity function,
# and the sigmoid activation function for each beta in the range(1, 4, 1)

# Initialize a repeated stratiefiedKFold object
repeated_kfolds = RepeatedStratifiedKFold(n_splits=5, n_repeats=5)

# Initilialize the gridsearch CV instance that will fit the pipeline (standard scaler, GMLVQ) to
# the data for each of the parameter sets in the grid. Where each fit is a 5 times
# repeated stratified 5 fold cross validation. For each set return the testing accuracy.
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="accuracy",
    cv=repeated_kfolds,
    n_jobs=4,
    return_train_score=False,
    verbose=10,
)

# The gridsearch object can be fitted to the data.
search.fit(data, labels)

# Print the best CV score and parameters.
print("\nBest parameter (CV score=%0.3f):" % search.best_score_)
print(search.best_params_)

Out:

Fitting 25 folds for each of 8 candidates, totalling 200 fits

Best parameter (CV score=0.972):
{'gmlvq__activation_params': {'beta': 1}, 'gmlvq__activation_type': 'sigmoid', 'gmlvq__distance_type': 'adaptive-squared-euclidean', 'gmlvq__solver_type': 'waypoint-gradient-descent'}

When inspecting the resulting classifier and its prototypes, e.g., in a plot overlaid on a scatter plot of the data, don’t forget to apply the scaling to the data:

transformed_data = search.transform(data)

Total running time of the script: ( 0 minutes 30.358 seconds)

Download Python source code: plot_02_gridsearch_lvq.py

Download Jupyter notebook: plot_02_gridsearch_lvq.ipynb

Gallery generated by Sphinx-Gallery

Next Previous

© Copyright 2020, Rick van Veen. Revision 9b64b15e.

Built with Sphinx using a theme provided by Read the Docs.