Unlike optimizers that use a single fixed learning rate, AdaGrad adapts the learning rate of each parameter to the data, which makes it well suited to sparse data.
Classes
AdaGrad(lr, epsilon)
AdaGrad
Adaptive gradient descent.
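To illustrate how the learning rate adapts, here is a minimal NumPy sketch of the standard AdaGrad update rule. It is not this library's implementation: the class name `AdaGradSketch`, the `step` method, the default values, and the placement of `epsilon` outside the square root are all assumptions made for the example. Only the `lr` and `epsilon` parameter names come from the signature above.

```python
import numpy as np


class AdaGradSketch:
    """Illustrative AdaGrad update; a hypothetical stand-in, not the library class."""

    def __init__(self, lr=0.01, epsilon=1e-8):
        self.lr = lr            # base learning rate (default value is assumed)
        self.epsilon = epsilon  # small constant that avoids division by zero
        self.accum = None       # running sum of squared gradients, per parameter

    def step(self, params, grads):
        """Return updated parameters (hypothetical method name)."""
        if self.accum is None:
            self.accum = np.zeros_like(params, dtype=float)
        # Accumulate squared gradients separately for every parameter.
        self.accum += grads ** 2
        # Parameters with a large gradient history take smaller steps.
        return params - self.lr * grads / (np.sqrt(self.accum) + self.epsilon)


opt = AdaGradSketch(lr=0.1)
w = np.array([1.0, 1.0])
# The first coordinate gets a gradient on every step; the second only once (sparse).
for g in ([1.0, 1.0], [1.0, 0.0], [1.0, 0.0]):
    w = opt.step(w, np.array(g))
```

In this sketch the frequently updated coordinate sees its effective step shrink as its squared-gradient sum grows, while the rarely updated coordinate keeps a comparatively large step for its next nonzero gradient. That behavior is why the method tends to suit sparse data.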