sklvq.solvers.WaypointGradientDescent

class sklvq.solvers.WaypointGradientDescent(objective: sklvq.objectives._base.ObjectiveBaseClass, max_runs: int = 10, step_size: Union[float, numpy.ndarray] = 0.1, loss: float = 0.6666666666666666, gain: float = 1.1, k: int = 3, callback: Optional[callable] = None)[source]

Waypoint gradient descent (WGD)

Implements the waypoint average optimization algorithm [1]. Implementation and description is inspired by [2].

The algorithm keeps a rolling average of the last k model parameters. After k steps the algorithms will compare the cost of the average model parameters (\hat{
\mathbf{\theta}}) versus a “regular” update of the model parameters (\tilde{\mathbf{
\theta}}).

\tilde{\mathbf{\theta}} &= \theta_t - \eta \cdot \frac{\nabla E(\theta_t)}{||\nabla E(\theta_t)||}  \\
\hat{\mathbf{\theta}} &= \frac{1}{k} \sum_{i=0}^{k - 1} \mathbf{\theta}_{t_i}

If the regular step results in a lower cost (E(\tilde{\mathbf{\theta}}) < E(\hat{
\mathbf{\theta}})), the step_size is increased by multiplying with the gain factor:

\mathbf{\theta}_{t+1} &= \tilde{\mathbf{\theta}}  \\
\eta &= gain \cdot \eta.

If the average step results in a lower cost ((E(\hat{\mathbf{\theta}}) < E(\tilde{
\mathbf{\theta}})) the step_size is decreased by multiplying with the loss factor:

\mathbf{\theta}_{t+1} &= \hat{\mathbf{\theta}} \\
\eta &= loss \cdot \eta.

Note that the solver uses the normalized objective gradient to update the model.

Parameters
objective: ObjectiveBaseClass, required
This is set by the algorithm. See sklvq.models.GLVQ, sklvq.models.GMLVQ,

and sklvq.models.LGMLVQ.

max_runs: int

Maximum number of runs/epochs that will be computed. Should be >= k. Early stopping can be implemented by providing a callback function that returns True when the solver should stop.

step_size: float or ndarray

The step size to control the learning rate of the model parameters. If the same step size should be used for all parameters (e.g., prototypes and omega) then a single float is sufficient. If separate initial step_sizes should be used per model parameter then this should be specified by using a ndarray.

Whenever the averge update is accepted (has a lower cost) the step sizes are multiplied with the loss factor. When the “regular” update is accepted then the step size(s) are multiplied by the gain factor.

loss: float

Should be a value less than 1. Controls the step size change factor when an average waypoint step is accepted.

gain: float

Should be a value greater than 1. Controls the step size change factor when a regular update step is accepted.

k: int

The number of runs used to compute the average waypoint over.

callback: callable

Callable with signature callable(state). If the callable returns True the solver will stop even if max_runs is not reached yet. The state object contains the following:

  • “variables”

    Concatenated 1D ndarray of the model’s parameters

  • “nit”

    The current iteration counter.

  • “fun”

    The accepted cost.

  • “nfun”

    The cost of the regular update step.

  • “tfun”

    The cost of the “tentative” update, i.e., the average of the past k updates.

  • “step_size”

    The current step_size(s)

References

[1] Papari, G., and Bunte, K., and Biehl, M. (2011) “Waypoint averaging and step size control in learning by gradient descent” Mittweida Workshop on Computational Intelligence (MIWOCI) 2011.

[2] LeKander, M., Biehl, M., & De Vries, H. (2017). “Empirical evaluation of gradient methods for matrix learning vector quantization.” 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization, WSOM 2017.

solve(data: numpy.ndarray, labels: numpy.ndarray, model: LVQBaseClass)[source]

Solve function that gets called by the fit method of the models.

Performs the steps of the waypoint gradient descent optimization method.

Parameters
datandarray of shape (n_samples, n_features)

The data.

labelsndarray of size (n_samples)

The labels of the samples in the data.

modelLVQBaseClass

The initial model that will also hold the final result