{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%matplotlib inline"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Not a Number LVQ (NaNLVQ)\n\nNaNLVQ [1] refers to an extension that can be implemented for various distance functions. It uses\nthe partial distance strategy to ignore any NaN values in the data: features that are missing in a\nsample are left out of the distance computation. Another interpretation is that the missing values\nare imputed with the corresponding prototype values. The distance contribution of such a feature is\nthen zero, which results in a zero update for the feature containing the NaN value.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import matplotlib\nimport numpy as np\nfrom sklearn.datasets import load_iris\nfrom sklearn.metrics import classification_report\nfrom sklearn.preprocessing import StandardScaler\n\nfrom sklvq import GMLVQ\n\nmatplotlib.rc(\"xtick\", labelsize=\"small\")\nmatplotlib.rc(\"ytick\", labelsize=\"small\")\n\niris = load_iris()\n\ndata = iris.data\nlabels = iris.target\n\n# Insert 50 missing values (np.nan) at random positions. Rows are drawn without\n# replacement, so each selected sample loses exactly one feature value.\n# np.random is not seeded here, so the positions differ between runs.\nnum_missing_values = 50\nnum_samples, num_dimensions = data.shape\n\ni = np.random.choice(num_samples, num_missing_values, replace=False)\nj = np.random.choice(num_dimensions, num_missing_values, replace=True)\n\ndata[i, j] = np.nan"
      ]
    },
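    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Fitting the Model\nScale the data and create a GMLVQ object with, for example, a custom distance function, activation\nfunction, and solver. `StandardScaler` disregards NaN values when computing the mean and standard\ndeviation and leaves them in place when transforming. See the API reference under documentation\nfor defaults and other possible parameters.\n\n"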
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Object to perform the z-transform\nscaler = StandardScaler()\n\n# Compute (fit) and apply (transform) the z-transform\ndata = scaler.fit_transform(data)\n\n# Create the model object that will be fitted to the data.\nmodel = GMLVQ(\n    distance_type=\"adaptive-squared-euclidean\",\n    activation_type=\"swish\",\n    activation_params={\"beta\": 2},\n    solver_type=\"waypoint-gradient-descent\",\n    solver_params={\"max_runs\": 10, \"k\": 3, \"step_size\": np.array([0.1, 0.05])},\n    random_state=1428,\n    # Make the data validation and the distance function accept and handle np.nan values.\n    force_all_finite=\"allow-nan\",\n)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The next step is to fit the GMLVQ object to the data and use the predict method to make the\npredictions. Note that this example evaluates the model only on the training data and therefore\ndoes not say anything about the generalizability of the fitted model.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Train the model using the data and labels\nmodel.fit(data, labels)\n\n# Predict the labels using the trained model\npredicted_labels = model.predict(data)\n\n# Print the classification report to get a sense of the training performance.\nprint(classification_report(labels, predicted_labels))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "This example uses GMLVQ, but all models and their compatible distance functions support the\n`force_all_finite` option.\n\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## References\n[1] Rick van Veen (2016). Analysis of Missing Data Imputation Applied to Heart Failure Data\n(Master's thesis, University of Groningen, Groningen, The Netherlands). Retrieved from\nhttp://fse.studenttheses.ub.rug.nl/id/eprint/14679\n\n"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.9"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}