model_params

Functions

get_decay_parameter_names(→ List[str])
    From transformers.trainer
get_parameter_names_in_param_groups(→ List[Dict[str, str]])
get_parameter_names_require_grads(→ List[str])
guess_grad_norms_from_pg(parameter_names, all_norms[, ...])
guess_grad_norms_from_hf_trainer(parameter_names, ...)
guess_grad_all_zero_from_pg(parameter_names, all_grads)

Module Contents

model_params.get_decay_parameter_names(model: transformers.PreTrainedModel | torch.nn.Module) → List[str][source]

From transformers.trainer

Get all parameter names that weight decay will be applied to.

Note that some models implement their own layernorm instead of calling nn.LayerNorm; weight decay could still apply to those modules, since this function only filters out instances of nn.LayerNorm.
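
The returned names are typically used to split parameters into weight-decay and no-decay optimizer groups, which is how transformers.Trainer uses the equivalent helper in create_optimizer. Below is a minimal sketch of that pattern; the model_params import path, the model choice, and the hyperparameters are placeholders for illustration.

```python
# Minimal sketch (assumed import path and hyperparameters): build AdamW
# parameter groups so that weight decay skips bias/LayerNorm-style parameters.
import torch
from transformers import AutoModelForCausalLM

import model_params  # assumed import path for the module documented here

model = AutoModelForCausalLM.from_pretrained("gpt2")
decay_names = model_params.get_decay_parameter_names(model)

optimizer_grouped_parameters = [
    {   # parameters that receive weight decay
        "params": [p for n, p in model.named_parameters()
                   if n in decay_names and p.requires_grad],
        "weight_decay": 0.01,
    },
    {   # biases and LayerNorm-style weights: no decay
        "params": [p for n, p in model.named_parameters()
                   if n not in decay_names and p.requires_grad],
        "weight_decay": 0.0,
    },
]
optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=5e-5)
```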

model_params.get_parameter_names_in_param_groups(model: transformers.PreTrainedModel | torch.nn.Module, ignore_requires_grad: bool = True) → List[Dict[str, str]][source]
model_params.get_parameter_names_require_grads(model: transformers.PreTrainedModel | torch.nn.Module) → List[str][source]
model_params.guess_grad_norms_from_pg(parameter_names: List[Dict[str, str]], all_norms: List[torch.Tensor], show_zero_grads: bool = False, separate_by_layer: bool = False)[source]
model_params.guess_grad_norms_from_hf_trainer(parameter_names: List[str], all_norms: List[torch.Tensor], separate_by_layer: bool = False, note: str | None = None)[source]
model_params.guess_grad_all_zero_from_pg(parameter_names: List[Dict[str, str]], all_grads: List[torch.Tensor], show_zero_grads: bool = False, separate_by_layer: bool = False)[source]
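
These diagnostic helpers are not documented here, so the sketch below is a hedged guess at how they fit together: it assumes guess_grad_norms_from_hf_trainer expects parallel lists of parameter names and per-parameter gradient norms, and that the model_params module is importable as shown. The model and input text are placeholders.

```python
# Hedged sketch: match per-parameter gradient norms back to their names after a
# backward pass. Assumptions: `model_params` is importable as shown, and
# guess_grad_norms_from_hf_trainer expects parallel lists of names and norms.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

import model_params  # assumed import path for the module documented here

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# One forward/backward pass so trainable parameters have .grad populated.
batch = tokenizer("gradient debugging example", return_tensors="pt")
model(**batch, labels=batch["input_ids"]).loss.backward()

# Flat list of trainable parameter names, plus their gradient norms in the same order.
name_to_param = dict(model.named_parameters())
names = [n for n in model_params.get_parameter_names_require_grads(model)
         if name_to_param[n].grad is not None]
norms = [name_to_param[n].grad.norm() for n in names]

# Inspect which parameters have suspiciously small or large gradient norms.
model_params.guess_grad_norms_from_hf_trainer(names, norms, note="after first backward")
```

The _from_pg variants appear to take the per-parameter-group layout returned by get_parameter_names_in_param_groups (a List[Dict[str, str]]) together with per-group norms or gradients; since that layout is not documented here, it is left out of the sketch.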