lmflow.models.hf_model_mixin#

Attributes#

logger
HF_AUTOMODEL_MAPPING
HF_AUTOMODEL_TYPE
LORA_TARGET_MODULES_MAPPING

Classes#

HFModelMixin

Helper class that provides a standard way to create an ABC using inheritance.

Module Contents#

lmflow.models.hf_model_mixin.logger[source]#
lmflow.models.hf_model_mixin.HF_AUTOMODEL_MAPPING[source]#
lmflow.models.hf_model_mixin.HF_AUTOMODEL_TYPE[source]#
lmflow.models.hf_model_mixin.LORA_TARGET_MODULES_MAPPING[source]#
class lmflow.models.hf_model_mixin.HFModelMixin(model_args: lmflow.args.ModelArguments, do_train: bool, ds_config=None, device: str | None = 'gpu', use_accelerator: bool = False, hf_auto_model_additional_args: Dict | None = None, *args, **kwargs)[source]#

Bases: lmflow.models.base_model.BaseModel

Helper class that provides a standard way to create an ABC using inheritance.
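
A minimal construction sketch, following the signature above. The model_name_or_path field of ModelArguments is an assumption not confirmed by this page, and in practice this mixin is normally inherited by concrete LMFlow model classes rather than instantiated directly:

    from lmflow.args import ModelArguments
    from lmflow.models.hf_model_mixin import HFModelMixin

    # "gpt2" is just an illustrative HF Hub id; any causal LM works here.
    model_args = ModelArguments(model_name_or_path="gpt2")
    model = HFModelMixin(
        model_args=model_args,
        do_train=False,        # inference-time setup
        device="gpu",          # default per the signature above
        use_accelerator=False,
    )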

device[source]#
model_args[source]#
hf_auto_model[source]#
use_accelerator[source]#
ds_config[source]#
do_train[source]#
tokenizer[source]#
torch_dtype[source]#
hf_model_config[source]#
quant_config[source]#
peft_config[source]#
_activated = False[source]#
__prepare_tokenizer(model_args: lmflow.args.ModelArguments) transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast[source]#
__prepare_dtype(model_args: lmflow.args.ModelArguments) torch.dtype[source]#
__prepare_model_config(model_args: lmflow.args.ModelArguments, hf_auto_model_additional_args: Dict | None = None)[source]#

Prepare the model configuration for HF auto registration.

Parameters#

model_args : ModelArguments

LMFlow model arguments.

hf_auto_model_additional_args : Optional[Dict], optional

Special configurations such as num_labels in AutoModelForSequenceClassification (commonly used in reward modeling) are not preset in __prepare_model_config, so they should be passed in via hf_auto_model_additional_args.

Returns#

config : ModelConfig

The HF model config.
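
A hedged sketch of the num_labels case described above; any arch-type selection needed to actually route to AutoModelForSequenceClassification is assumed to be configured elsewhere in model_args:

    from lmflow.args import ModelArguments
    from lmflow.models.hf_model_mixin import HFModelMixin

    model_args = ModelArguments(model_name_or_path="gpt2")
    # num_labels reaches the underlying sequence-classification config;
    # a single label gives a scalar reward head.
    model = HFModelMixin(
        model_args=model_args,
        do_train=True,
        hf_auto_model_additional_args={"num_labels": 1},
    )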

__prepare_quant_config(model_args: lmflow.args.ModelArguments)[source]#
__prepare_peft_config(model_args: lmflow.args.ModelArguments)[source]#
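
For the PEFT path, a sketch under assumed field names: use_lora, lora_r, lora_alpha, and lora_dropout are illustrative guesses at ModelArguments fields, not confirmed by this page; the LORA_TARGET_MODULES_MAPPING attribute above suggests target modules are resolved per architecture when not given explicitly:

    from lmflow.args import ModelArguments
    from lmflow.models.hf_model_mixin import HFModelMixin

    # All LoRA-related field names below are assumptions.
    model_args = ModelArguments(
        model_name_or_path="gpt2",
        use_lora=True,
        lora_r=8,
        lora_alpha=32,
        lora_dropout=0.1,
    )
    model = HFModelMixin(model_args=model_args, do_train=True)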
__model_module_inject(model_args: lmflow.args.ModelArguments) None[source]#

Override some model modules with custom implementations.

Current implementations:

- Position interpolation (model_args.do_rope_scaling): replace LLaMA embeddings with condense embeddings.
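
A sketch of switching the injection on, assuming do_rope_scaling is a boolean field of ModelArguments as the note above implies:

    from lmflow.args import ModelArguments
    from lmflow.models.hf_model_mixin import HFModelMixin

    # do_rope_scaling is the only flag named on this page; scaling-ratio
    # fields, if any, are not documented here.
    model_args = ModelArguments(
        model_name_or_path="meta-llama/Llama-2-7b-hf",
        do_rope_scaling=True,
    )
    model = HFModelMixin(model_args=model_args, do_train=False)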

__prepare_model_for_training(model_args: lmflow.args.ModelArguments, hf_auto_model: HF_AUTOMODEL_TYPE)[source]#
__prepare_model_for_inference(model_args: lmflow.args.ModelArguments, hf_auto_model: HF_AUTOMODEL_TYPE, use_accelerator: bool, ds_config)[source]#
__prepare_model_for_vllm_inference(model_args: lmflow.args.ModelArguments, vllm_gpu_memory_utilization: float, vllm_tensor_parallel_size: int)[source]#
__prepare_model_post_process()[source]#
activate_model_for_inference(use_vllm: bool = False, **kwargs)[source]#
deactivate_model_for_inference(use_vllm: bool = False)[source]#

Deactivate the model and release the resources.

NOTE: Currently, vLLM doesn't have an official way to release its resources, and by our observation the implementation below cannot release all GPU resources. This method is therefore a placeholder for a future implementation. See: [GitHub issue](https://github.com/vllm-project/vllm/issues/1908)
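
A usage sketch of the activation lifecycle, assuming a model constructed as in the earlier sketch; pairing the two calls in try/finally keeps the best-effort release on every exit path:

    model.activate_model_for_inference(use_vllm=False)
    try:
        pass  # run generation / scoring with the activated backend
    finally:
        # Best-effort release; for vLLM this is a placeholder (see note above).
        model.deactivate_model_for_inference(use_vllm=False)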

get_max_length()[source]#

Return the maximum acceptable input length, in tokens.

get_tokenizer()[source]#

Return the tokenizer of the model.

get_backend_model()[source]#

Return the backend model.
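
Putting the getters together, a sketch that runs a plain transformers generate() call against the backend model; it assumes a decoder-only backend and the model object from the earlier sketches:

    import torch

    tokenizer = model.get_tokenizer()
    backend = model.get_backend_model()

    inputs = tokenizer("Hello, LMFlow!", return_tensors="pt").to(backend.device)
    # Cap new tokens so the total stays within the model's context window.
    max_new = min(16, model.get_max_length() - inputs["input_ids"].shape[1])
    with torch.no_grad():
        output_ids = backend.generate(**inputs, max_new_tokens=max_new)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))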