model_params
============

.. py:module:: model_params


Functions
---------

.. autoapisummary::

   model_params.get_decay_parameter_names
   model_params.get_parameter_names_in_param_groups
   model_params.get_parameter_names_require_grads
   model_params.guess_grad_norms_from_pg
   model_params.guess_grad_norms_from_hf_trainer
   model_params.guess_grad_all_zero_from_pg


Module Contents
---------------

.. py:function:: get_decay_parameter_names(model: Union[transformers.PreTrainedModel, torch.nn.Module]) -> List[str]

   Adapted from ``transformers.trainer``.

   Get all parameter names that weight decay will be applied to.

   Note that some models implement their own layer normalization instead of using
   ``nn.LayerNorm``; weight decay may still apply to those modules, since this
   function only filters out instances of ``nn.LayerNorm``.


.. py:function:: get_parameter_names_in_param_groups(model: Union[transformers.PreTrainedModel, torch.nn.Module], ignore_requires_grad: bool = True) -> List[Dict[str, str]]


.. py:function:: get_parameter_names_require_grads(model: Union[transformers.PreTrainedModel, torch.nn.Module]) -> List[str]


.. py:function:: guess_grad_norms_from_pg(parameter_names: List[Dict[str, str]], all_norms: List[torch.Tensor], show_zero_grads: bool = False, separate_by_layer: bool = False)


.. py:function:: guess_grad_norms_from_hf_trainer(parameter_names: List[str], all_norms: List[torch.Tensor], separate_by_layer: bool = False, note: Optional[str] = None)


.. py:function:: guess_grad_all_zero_from_pg(parameter_names: List[Dict[str, str]], all_grads: List[torch.Tensor], show_zero_grads: bool = False, separate_by_layer: bool = False)
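
Example
-------

A minimal usage sketch, not part of the module itself: it assumes
:py:func:`get_decay_parameter_names` returns plain parameter-name strings, as the
signature above indicates, and the model checkpoint and hyperparameter values below
are placeholders chosen for illustration.

.. code-block:: python

   import torch
   from transformers import AutoModelForCausalLM

   from model_params import get_decay_parameter_names

   model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint

   # Names of parameters that should receive weight decay, per the helper above.
   decay_names = set(get_decay_parameter_names(model))

   # Split trainable parameters into decay / no-decay optimizer groups,
   # mirroring the transformers.trainer convention this helper comes from.
   grouped_parameters = [
       {
           "params": [p for n, p in model.named_parameters()
                      if n in decay_names and p.requires_grad],
           "weight_decay": 0.01,  # placeholder value
       },
       {
           "params": [p for n, p in model.named_parameters()
                      if n not in decay_names and p.requires_grad],
           "weight_decay": 0.0,
       },
   ]
   optimizer = torch.optim.AdamW(grouped_parameters, lr=5e-5)  # placeholder lr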