sklvq.solvers.AdaptiveMomentEstimation

class sklvq.solvers.AdaptiveMomentEstimation(objective: sklvq.objectives._base.ObjectiveBaseClass, max_runs: int = 10, beta1: float = 0.9, beta2: float = 0.999, step_size: float = 0.001, epsilon: float = 0.0001, callback: Optional[callable] = None)

Adaptive moment estimation (ADAM)

Implementation and description inspired by [1].

Adam maintains two moving averages: m of the gradient and v of the elementwise squared gradient. Both are updated for every sample in each epoch (run) until the maximum number of runs (max_runs) has been reached:

\mathbf{m} &= \beta_1 \cdot \mathbf{m} + (1 - \beta_1) \cdot \nabla e_i(\theta) \\
\mathbf{v} &= \beta_2 \cdot \mathbf{v} + (1 - \beta_2) \cdot [\nabla e_i(\theta)]^{\circ 2}.
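The two moving-average updates can be sketched in NumPy for a single sample's gradient. This is a minimal illustration, not sklvq's internal code: the helper `update_moments` is hypothetical, and the gradient is assumed to be a flat 1D array like the concatenated `variables` vector.

```python
import numpy as np


def update_moments(m, v, grad, beta1=0.9, beta2=0.999):
    """Return the updated moving averages (m, v) for one sample's gradient.

    Hypothetical helper for illustration; m, v and grad share one shape.
    """
    m = beta1 * m + (1.0 - beta1) * grad         # moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2    # moving average of the elementwise squared gradient
    return m, v


# Starting from zero vectors, one update scales the gradient down heavily
m = np.zeros(3)
v = np.zeros(3)
grad = np.array([1.0, -2.0, 0.5])
m, v = update_moments(m, v, grad)
# m is approximately 0.1 * grad and v approximately 0.001 * grad**2
```

The tiny values after the first update show why the estimates are biased towards zero when m and v start at zero.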

Since m and v are initialized to zero vectors, they are biased towards zero, especially during the first updates. To counteract this, bias-corrected estimates \hat{m} and \hat{v} are computed:

\hat{\mathbf{m}} &= \mathbf{m} / (1 - \beta^p_1) \\
\hat{\mathbf{v}} &= \mathbf{v} / (1 - \beta^p_2),

where p counts the updates performed so far: it starts at 0 and is incremented by 1 before each new random sample is processed, so the first correction uses p = 1 and the denominators never vanish. The bias-corrected estimates are then used in the update step:

\theta = \theta - \eta \cdot \hat{\mathbf{m}} \oslash (\hat{\mathbf{v}}^{\circ \frac{1}{2}} + \epsilon),

with \eta the step_size and \epsilon the small constant epsilon that prevents division by zero. The decay rates beta1 (\beta_1) and beta2 (\beta_2) can also be chosen by the user.

Note that \oslash denotes elementwise (Hadamard) division and \mathbf{x}^{\circ y} the elementwise power operation.
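Combining the moment updates, the bias correction and the update step gives one complete Adam step. The sketch below follows the standard Adam formulation, in which the bias-corrected first moment is divided elementwise by the square root of the second moment plus epsilon (the division that the epsilon parameter guards against). The helper `adam_step` and the quadratic test objective are hypothetical and not part of sklvq.

```python
import numpy as np


def adam_step(theta, grad, m, v, p, step_size=0.001, beta1=0.9, beta2=0.999, epsilon=1e-4):
    """Apply one Adam update for a single sample's gradient.

    Hypothetical helper (not sklvq code). Returns the new (theta, m, v, p).
    """
    p += 1                                          # increment before use, so p >= 1
    m = beta1 * m + (1.0 - beta1) * grad            # moving average of the gradient
    v = beta2 * v + (1.0 - beta2) * grad ** 2       # moving average of the squared gradient
    m_hat = m / (1.0 - beta1 ** p)                  # bias-corrected first moment
    v_hat = v / (1.0 - beta2 ** p)                  # bias-corrected second moment
    # Elementwise division by sqrt(v_hat); epsilon keeps the denominator positive
    theta = theta - step_size * m_hat / (np.sqrt(v_hat) + epsilon)
    return theta, m, v, p


# Usage: minimize 0.5 * ||theta - target||^2, whose gradient is theta - target
target = np.array([3.0, -1.0])
theta = np.zeros(2)
m = np.zeros_like(theta)
v = np.zeros_like(theta)
p = 0
for _ in range(5000):
    theta, m, v, p = adam_step(theta, theta - target, m, v, p, step_size=0.01)
```

On the very first step the bias correction yields \hat{m} = grad and \hat{v} = grad^2, so each coordinate moves by roughly step_size against the sign of its gradient. Without the correction, the first step would instead be scaled by (1 - 0.9) / sqrt(1 - 0.999), about 3.16 times larger.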

Parameters
objective: ObjectiveBaseClass, required

This is (or should be) set by the algorithm.

max_runs: int

Number of runs (epochs) over all samples in X. Should be >= 1.

beta1: float

Controls the decay rate of the moving average of the gradient. Should satisfy 0 < beta1 < 1.

beta2: float

Controls the decay rate of the moving average of the squared gradient. Should satisfy 0 < beta2 < 1.

step_size: float

The step size \eta, which controls the learning rate.

epsilon: float

Small constant that prevents division by zero in the update step.

callback: callable

Callable with signature callable(state). If the callable returns True, the solver stops early. The state object contains the following information:

  • “variables”

    Concatenated 1D ndarray of the model’s parameters

  • “nit”

    The current iteration counter

  • “fun”

    The objective cost

  • “m_hat”

    Unbiased moving average of the gradient

  • “v_hat”

    Unbiased moving average of the elementwise (Hadamard) squared gradient
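A callback that requests an early stop once the objective cost drops below a threshold might look as follows. This sketch assumes the state supports dictionary-style access with the keys listed above (e.g. `state["fun"]`); the factory `make_early_stopping_callback` and its threshold are hypothetical.

```python
def make_early_stopping_callback(threshold=1e-3, verbose=False):
    """Return a callback that asks the solver to stop once the cost is below threshold.

    Hypothetical helper; assumes `state` supports dict-style access to "nit" and "fun".
    """
    def callback(state):
        if verbose:
            print(f"iteration {state['nit']}: cost {state['fun']:.6f}")
        # Returning True requests an early stop; bool() also covers NumPy scalars
        return bool(state["fun"] < threshold)

    return callback


stop_when_small = make_early_stopping_callback(threshold=0.5)
print(stop_when_small({"nit": 1, "fun": 0.8}))  # False
print(stop_when_small({"nit": 2, "fun": 0.2}))  # True
```

The resulting function would be passed as the callback argument; any logic based on "nit", "m_hat" or "v_hat" can be added in the same way.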

References

[1] LeKander, M., Biehl, M., & De Vries, H. (2017). “Empirical evaluation of gradient methods for matrix learning vector quantization.” 12th International Workshop on Self-Organizing Maps and Learning Vector Quantization, Clustering and Data Visualization, WSOM 2017.

solve(data: numpy.ndarray, labels: numpy.ndarray, model: LVQBaseClass)
Parameters
data: ndarray of shape (number of observations, number of dimensions)
labels: ndarray of size (number of observations)
model: LVQBaseClass

The initial model; it is modified during solving and holds the results at the end.
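The overall loop described above (per-sample updates in a random order, repeated for max_runs runs, with an optional early stop requested by callback) can be sketched without sklvq. This is a stand-in, not the library's implementation: it fits a linear least-squares model instead of an LVQ model, the names `adam_solve` and `targets` are hypothetical, and calling the callback once at the end of each run is an assumption made for illustration.

```python
import numpy as np


def adam_solve(data, targets, theta, max_runs=10, beta1=0.9, beta2=0.999,
               step_size=0.001, epsilon=1e-4, callback=None, seed=0):
    """Per-sample Adam loop on a least-squares stand-in objective.

    Hypothetical sketch; the cost of sample i is 0.5 * (data[i] @ theta - targets[i]) ** 2.
    """
    rng = np.random.default_rng(seed)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    p = 0
    for run in range(max_runs):
        for i in rng.permutation(len(targets)):    # new random sample order every run
            p += 1                                   # increment before use, so p >= 1
            grad = (data[i] @ theta - targets[i]) * data[i]
            m = beta1 * m + (1.0 - beta1) * grad
            v = beta2 * v + (1.0 - beta2) * grad ** 2
            m_hat = m / (1.0 - beta1 ** p)
            v_hat = v / (1.0 - beta2 ** p)
            theta = theta - step_size * m_hat / (np.sqrt(v_hat) + epsilon)
        if callback is not None:
            state = {
                "variables": theta,
                "nit": run + 1,
                "fun": 0.5 * np.mean((data @ theta - targets) ** 2),
                "m_hat": m_hat,
                "v_hat": v_hat,
            }
            if callback(state):                      # True requests an early stop
                break
    return theta


# Usage on noise-free synthetic data
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w
w = adam_solve(X, y, np.zeros(3), max_runs=50, step_size=0.01)
```

Shuffling the sample order in every run keeps the per-sample updates from following a fixed cyclic pattern through the data.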