model_params#
Functions#
- get_decay_parameter_names: from transformers.trainer
- get_parameter_names_in_param_groups
- get_parameter_names_require_grads
- guess_grad_norms_from_pg
Module Contents#
- model_params.get_decay_parameter_names(model: transformers.PreTrainedModel | torch.nn.Module) → List[str] [source]#
From transformers.trainer
Get all parameter names that weight decay will be applied to.
Note that some models implement their own layernorm instead of calling nn.LayerNorm; weight decay could still apply to those modules, since this function only filters out instances of nn.LayerNorm.
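The returned names are typically used to split parameters into decay and no-decay optimizer groups, mirroring the AdamW setup in transformers.Trainer. A minimal sketch, assuming model_params is importable; the checkpoint name, weight decay, and learning rate are illustrative:

```python
import torch
from transformers import AutoModel

from model_params import get_decay_parameter_names

model = AutoModel.from_pretrained("bert-base-uncased")
decay_names = set(get_decay_parameter_names(model))

# Apply weight decay only to the returned names; biases and
# nn.LayerNorm weights fall into the no-decay group.
param_groups = [
    {
        "params": [p for n, p in model.named_parameters()
                   if n in decay_names and p.requires_grad],
        "weight_decay": 0.01,
    },
    {
        "params": [p for n, p in model.named_parameters()
                   if n not in decay_names and p.requires_grad],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(param_groups, lr=5e-5)
```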
- model_params.get_parameter_names_in_param_groups(model: transformers.PreTrainedModel | torch.nn.Module, ignore_requires_grad: bool = True) → List[Dict[str, str]] [source]#
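A hedged usage sketch: the exact keys of each returned dict are not documented here, so the example only calls the function and inspects the result. The reading of ignore_requires_grad is an assumption:

```python
from transformers import AutoModel

from model_params import get_parameter_names_in_param_groups

model = AutoModel.from_pretrained("bert-base-uncased")

# Assumption: ignore_requires_grad=True (the default) includes frozen
# parameters as well; pass False to restrict to trainable ones.
groups = get_parameter_names_in_param_groups(model)
for group in groups:
    print(group)  # one dict of parameter-name metadata per group
```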
- model_params.get_parameter_names_require_grads(model: transformers.PreTrainedModel | torch.nn.Module) → List[str] [source]#
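A short sketch, assuming this returns the names of parameters with requires_grad=True; the embedding layer is frozen first so the returned list shrinks accordingly:

```python
from transformers import AutoModel

from model_params import get_parameter_names_require_grads

model = AutoModel.from_pretrained("bert-base-uncased")

# Freeze the embedding layer, then list what remains trainable.
for p in model.embeddings.parameters():
    p.requires_grad = False

trainable = get_parameter_names_require_grads(model)
print(len(trainable), trainable[:5])
```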
- model_params.guess_grad_norms_from_pg(parameter_names: List[Dict[str, str]], all_norms: List[torch.Tensor], show_zero_grads: bool = False, separate_by_layer: bool = False)[source]#
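The signature suggests this maps gradient norms back to the parameter names produced by get_parameter_names_in_param_groups. A speculative sketch, assuming all_norms holds one norm tensor per parameter, collected after backward(); the flags and output format are not documented here:

```python
import torch
from transformers import AutoModel

from model_params import (
    get_parameter_names_in_param_groups,
    guess_grad_norms_from_pg,
)

model = AutoModel.from_pretrained("bert-base-uncased")
loss = model(input_ids=torch.tensor([[101, 102]])).last_hidden_state.sum()
loss.backward()

# Assumption: one gradient-norm tensor per parameter, in the same order
# as the names returned by get_parameter_names_in_param_groups.
pg_names = get_parameter_names_in_param_groups(model)
norms = [p.grad.norm() if p.grad is not None else torch.tensor(0.0)
         for p in model.parameters()]
guess_grad_norms_from_pg(pg_names, norms, show_zero_grads=True)
```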