adagrad

Adaptive gradient descent

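Unlike plain gradient descent, which applies a single fixed learning rate to every parameter, AdaGrad adapts the learning rate of each parameter to the data. Parameters that are updated rarely keep a larger effective learning rate, which makes it well suited for sparse data.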
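For reference, the per-parameter update is commonly written as follows. This is the textbook form, not necessarily the exact formula this module implements. The symbols are assumptions made for illustration: \theta_{t,i} is parameter i at step t, g_{t,i} its gradient, G_{t,i} the running sum of its squared gradients, \eta the base learning rate (lr) and \epsilon a small constant (epsilon). Some implementations place \epsilon inside the square root instead.

```latex
G_{t,i} = G_{t-1,i} + g_{t,i}^{2}, \qquad G_{0,i} = 0

\theta_{t+1,i} = \theta_{t,i} - \frac{\eta}{\sqrt{G_{t,i}} + \epsilon} \, g_{t,i}
```

Because G_{t,i} only grows when g_{t,i} is nonzero, a parameter tied to a rare feature sees its step size shrink more slowly than one that is updated at every step.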

Classes

AdaGrad(lr, epsilon)

Adaptive gradient descent.
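
The following is a minimal sketch of what an optimizer with this signature might do internally. It is not the AdaGrad class of this module: the name AdaGradSketch, the step method, the default values and the placement of epsilon outside the square root are all assumptions made for illustration.

```python
import math


class AdaGradSketch:
    """Illustrative AdaGrad update on a flat list of floats (hypothetical, not this module's class)."""

    def __init__(self, lr=0.01, epsilon=1e-8):
        self.lr = lr            # base learning rate
        self.epsilon = epsilon  # small constant that avoids division by zero
        self.accum = None       # running sum of squared gradients, one entry per parameter

    def step(self, params, grads):
        """Return the parameters after one AdaGrad step."""
        if self.accum is None:
            self.accum = [0.0] * len(params)
        updated = []
        for i, (p, g) in enumerate(zip(params, grads)):
            self.accum[i] += g * g
            # Each parameter gets its own effective step size.
            scale = self.lr / (math.sqrt(self.accum[i]) + self.epsilon)
            updated.append(p - scale * g)
        return updated


opt = AdaGradSketch(lr=0.1, epsilon=1e-8)
params = [1.0, 1.0]
# The second parameter receives a nonzero gradient only on the last step (sparse).
for grads in ([1.0, 0.0], [1.0, 0.0], [1.0, 1.0]):
    params = opt.step(params, grads)
print(params)
```

In this run the first parameter has already accumulated two squared gradients by the third step, so its third update is smaller than the one applied to the second parameter, which is seeing its first nonzero gradient.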