lmflow.optim.lars
=================

.. py:module:: lmflow.optim.lars


Classes
-------

.. autoapisummary::

   lmflow.optim.lars.LARS


Module Contents
---------------

.. py:class:: LARS(params, lr: float = 0.01, momentum: float = 0.0, dampening: float = 0.0, weight_decay: float = 0.0, nesterov: bool = False, trust_coefficient: float = 0.01, eps: float = 1e-08)

   Bases: :py:obj:`torch.optim.optimizer.Optimizer`

   Extends SGD in PyTorch with LARS scaling from the paper
   `Large batch training of Convolutional Networks`__.

   .. note::
      The application of momentum in the SGD part is modified according to
      the PyTorch standards. LARS scaling fits into the equation in the
      following fashion.

      .. math::
         \begin{aligned}
            g_{t+1} & = \text{lars\_lr} * (\beta * p_{t} + g_{t+1}), \\
            v_{t+1} & = \mu * v_{t} + g_{t+1}, \\
            p_{t+1} & = p_{t} - \text{lr} * v_{t+1},
         \end{aligned}

      where :math:`p`, :math:`g`, :math:`v`, :math:`\mu`, and :math:`\beta` denote
      the parameters, gradient, velocity, momentum, and weight decay, respectively.
      The scaling factor :math:`\text{lars\_lr}` is defined by Eq. 6 in the paper.
      The Nesterov version is modified analogously.

   .. warning::
      Parameters with weight decay set to 0 are automatically excluded from
      layer-wise LR scaling. This ensures consistency with papers such as
      SimCLR and BYOL.

   __ https://arxiv.org/pdf/1708.03888.pdf

   .. note::
      Reference code: https://github.com/PyTorchLightning/lightning-bolts/


   .. py:method:: __setstate__(state) -> None


   .. py:method:: step(closure=None)

      Performs a single optimization step.

      Arguments:
          closure: A closure that reevaluates the model and returns the loss.
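
The layer-wise scaling factor written as :math:`\text{lars\_lr}` above (Eq. 6 of the
LARS paper) can be illustrated with a short sketch. This follows the formula from the
paper and is not necessarily the exact code inside ``lmflow.optim.lars``; the helper
name ``lars_trust_ratio`` and its arguments are hypothetical and used only for
clarity.

.. code-block:: python

   import torch

   def lars_trust_ratio(param: torch.Tensor,
                        grad: torch.Tensor,
                        weight_decay: float = 0.0,
                        trust_coefficient: float = 0.01,
                        eps: float = 1e-8) -> float:
       """Illustrative layer-wise scaling factor (Eq. 6 of the LARS paper):

       lars_lr = trust_coefficient * ||p|| / (||g|| + weight_decay * ||p|| + eps)
       """
       p_norm = torch.norm(param)
       g_norm = torch.norm(grad)
       # If either norm is zero, skip scaling (factor of 1.0), mirroring the
       # common convention in LARS implementations.
       if p_norm == 0 or g_norm == 0:
           return 1.0
       return (trust_coefficient * p_norm
               / (g_norm + weight_decay * p_norm + eps)).item()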
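
A minimal usage sketch, assuming ``lmflow`` is installed and that ``LARS`` follows the
constructor signature and ``step(closure)`` interface documented above; the model,
data, and hyperparameter values below are placeholders.

.. code-block:: python

   import torch
   import torch.nn as nn
   from lmflow.optim.lars import LARS

   # Placeholder model and batch; any PyTorch module works the same way.
   model = nn.Linear(16, 4)
   inputs = torch.randn(32, 16)
   targets = torch.randn(32, 4)
   loss_fn = nn.MSELoss()

   optimizer = LARS(
       model.parameters(),
       lr=0.01,
       momentum=0.9,
       weight_decay=1e-4,       # params with weight_decay=0 skip layer-wise LR scaling
       trust_coefficient=0.01,
   )

   def closure():
       # Re-evaluates the model and returns the loss, as expected by step().
       optimizer.zero_grad()
       loss = loss_fn(model(inputs), targets)
       loss.backward()
       return loss

   loss = optimizer.step(closure)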