lmflow.models.hf_model_mixin#
Attributes#
Classes#
HFModelMixin — Helper class that provides a standard way to create an ABC using inheritance.
Module Contents#
- class lmflow.models.hf_model_mixin.HFModelMixin(model_args: lmflow.args.ModelArguments, do_train: bool, ds_config=None, device: str | None = 'gpu', use_accelerator: bool = False, hf_auto_model_additional_args: Dict | None = None, *args, **kwargs)[source]#
Bases:
lmflow.models.base_model.BaseModel
Helper class that provides a standard way to create an ABC using inheritance.
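A minimal construction sketch, hedged: the `ModelArguments` field shown (`model_name_or_path`) and the choice to instantiate the mixin directly rather than through a subclass are assumptions for illustration only.

```python
from lmflow.args import ModelArguments
from lmflow.models.hf_model_mixin import HFModelMixin

# Hypothetical example values; ModelArguments has many more fields.
model_args = ModelArguments(model_name_or_path="gpt2")

# do_train selects between the training and inference preparation paths.
model = HFModelMixin(
    model_args=model_args,
    do_train=False,
    device="gpu",
)
```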
- __prepare_tokenizer(model_args: lmflow.args.ModelArguments) transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast [source]#
- __prepare_dtype(model_args: lmflow.args.ModelArguments) torch.dtype [source]#
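As a rough illustration of what a dtype-resolution helper like `__prepare_dtype` typically does, the sketch below maps a string option to a `torch.dtype`; the field name `torch_dtype` and the accepted values are assumptions, not LMFlow's actual logic.

```python
import torch
from lmflow.args import ModelArguments

# Assumed mapping from a string option to a torch dtype ("auto" defers to transformers).
_DTYPE_MAP = {
    "float16": torch.float16,
    "bfloat16": torch.bfloat16,
    "float32": torch.float32,
    "auto": None,
}

def resolve_dtype(model_args: ModelArguments):
    # Hypothetical field name "torch_dtype"; missing values fall back to "auto".
    return _DTYPE_MAP.get(getattr(model_args, "torch_dtype", None) or "auto")
```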
- __prepare_model_config(model_args: lmflow.args.ModelArguments, hf_auto_model_additional_args: Dict | None = None)[source]#
Prepare the model configuration for HF auto registration.
Parameters#
- model_args : ModelArguments
LMFlow model arguments.
- hf_auto_model_additional_args : Optional[Dict], optional
Special configurations such as num_labels in AutoModelForSequenceClassification (commonly used in reward modeling) are not preset in __prepare_model_config, so they should be passed in via hf_auto_model_additional_args.
Returns#
- config : ModelConfig
HF model config.
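For instance, a reward model built on AutoModelForSequenceClassification needs num_labels, which would be supplied through hf_auto_model_additional_args. The sketch below is illustrative; the ModelArguments values are placeholders.

```python
from lmflow.args import ModelArguments
from lmflow.models.hf_model_mixin import HFModelMixin

reward_model = HFModelMixin(
    model_args=ModelArguments(model_name_or_path="gpt2"),  # placeholder model
    do_train=True,
    hf_auto_model_additional_args={"num_labels": 1},  # single scalar reward head
)
```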
- __prepare_quant_config(model_args: lmflow.args.ModelArguments)[source]#
- __prepare_peft_config(model_args: lmflow.args.ModelArguments)[source]#
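When LoRA tuning is enabled, __prepare_peft_config presumably builds a PEFT configuration along these lines; the hyperparameter values and target modules below are illustrative, not LMFlow defaults.

```python
from peft import LoraConfig, TaskType

# Example values only; in LMFlow these would come from ModelArguments.
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "v_proj"],
)
```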
- __model_module_inject(model_args: lmflow.args.ModelArguments) None [source]#
Override some model modules with custom implementations.
Current implementations: - Position interpolation (model_args.do_rope_scaling):
replaces the LLaMA embeddings with condense embeddings (see the conceptual sketch below).
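Conceptually, position interpolation condenses position indices by a fixed ratio so that longer contexts fit into the positional range the model was trained on. The sketch below illustrates the idea only and is not the module LMFlow actually injects.

```python
import torch

def condensed_positions(seq_len: int, ratio: float = 4.0) -> torch.Tensor:
    # With ratio=4, positions 0..8191 are squeezed into the original 0..2047
    # range before the rotary embedding is applied.
    return torch.arange(seq_len, dtype=torch.float32) / ratio
```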
- __prepare_model_for_training(model_args: lmflow.args.ModelArguments, hf_auto_model: HF_AUTOMODEL_TYPE)[source]#
- __prepare_model_for_inference(model_args: lmflow.args.ModelArguments, hf_auto_model: HF_AUTOMODEL_TYPE, use_accelerator: bool, ds_config)[source]#
- __prepare_model_for_vllm_inference(model_args: lmflow.args.ModelArguments, vllm_gpu_memory_utilization: float, vllm_tensor_parallel_size: int)[source]#
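A minimal sketch of what preparing a vLLM backend looks like: vLLM's LLM class accepts gpu_memory_utilization and tensor_parallel_size, which the vllm_gpu_memory_utilization and vllm_tensor_parallel_size arguments presumably map onto. The model name is a placeholder.

```python
from vllm import LLM

backend = LLM(
    model="gpt2",                 # placeholder; taken from model_args in practice
    gpu_memory_utilization=0.9,   # fraction of GPU memory vLLM may reserve
    tensor_parallel_size=1,       # number of GPUs to shard the model across
)
```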
- deactivate_model_for_inference(use_vllm: bool = False)[source]#
Deactivate the model and release its resources.
NOTE: Currently, vLLM doesn't have an official way to do this, and by our observation the implementation below cannot release all GPU resources. Thus this method is just a placeholder for a future implementation. See: [GitHub issue](https://github.com/vllm-project/vllm/issues/1908)
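For the plain HF (non-vLLM) path, a common best-effort release pattern is sketched below: drop the Python reference, collect garbage, and empty the CUDA cache. This is illustrative and, as noted above, does not reliably free everything for vLLM.

```python
import gc
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2").cuda()
# ... run inference ...

del model                  # drop the last reference to the model
gc.collect()               # collect any cycles still holding tensors
torch.cuda.empty_cache()   # return cached CUDA blocks to the allocator
```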