{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Cross validation\n\nIn all previous examples we showed the training performance of the models. However,\nin practice it is much more interesting how well a model performs on\nunseen data, i.e., the generalizability of the model. We can use `crossvalidation`_ for this.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import numpy as np\nfrom sklearn.datasets import load_iris\nfrom sklearn.model_selection import (\n    cross_val_score,\n    RepeatedKFold,\n)\nfrom sklearn.pipeline import make_pipeline\nfrom sklearn.preprocessing import StandardScaler\n\nfrom sklvq import GMLVQ\n\ndata, labels = load_iris(return_X_y=True)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Sklearn provides a very handy way of performing cross validation. For this\npurpose we firstly create a pipeline and initiate a sklearn object that will\nrepeatedly create k folds for us.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Create a scaler instance\nscaler = StandardScaler()\n\n# Create a GMLVQ  model instance\nmodel = GMLVQ(\n    distance_type=\"adaptive-squared-euclidean\",\n    activation_type=\"swish\",\n    activation_params={\"beta\": 2},\n    solver_type=\"waypoint-gradient-descent\",\n    solver_params={\"max_runs\": 10, \"k\": 3, \"step_size\": np.array([0.1, 0.05])},\n    random_state=1428,\n)\n\n# Link them together (Note this will work as it should in a CV setting, i.e., it's fitted to the\n# training data and predict is used for the testing data which makes sure the test data is\n# scaled using the tranformation parameters found during training.\npipeline = make_pipeline(scaler, model)\n\n# Create an object that n_repeat times creates k=10  folds.\nrepeated_10_fold = RepeatedKFold(n_splits=10, n_repeats=10)\n\n# Call the cross_val_score using all created instances and loaded data. Note it can accept\n# different and also multiple scoring parameters\naccuracy = cross_val_score(\n    pipeline, data, labels, cv=repeated_10_fold, scoring=\"accuracy\"\n)\n\n# Print the mean and standard deviation of the cross validation testing scores.\nprint(\n    \"Accuracy, mean (std): {:.2f} ({:.2f})\".format(np.mean(accuracy), np.std(accuracy))\n)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}