lmflow.models.hf_model_mixin#

Attributes#

`logger`
`HF_AUTOMODEL_MAPPING`
`HF_AUTOMODEL_TYPE`
`LORA_TARGET_MODULES_MAPPING`

Classes#

HFModelMixin

Module Contents#

lmflow.models.hf_model_mixin.logger[source]#

lmflow.models.hf_model_mixin.HF_AUTOMODEL_MAPPING[source]#

lmflow.models.hf_model_mixin.HF_AUTOMODEL_TYPE[source]#

lmflow.models.hf_model_mixin.LORA_TARGET_MODULES_MAPPING[source]#

class lmflow.models.hf_model_mixin.HFModelMixin(model_args: lmflow.args.ModelArguments, do_train: bool, device: str | None = 'gpu', hf_auto_model_additional_args: dict | None = None, *args, **kwargs)[source]#

Bases: lmflow.models.base_model.BaseModel

device = 'gpu'[source]#

model_args[source]#

hf_auto_model[source]#

do_train[source]#

tokenizer[source]#

torch_dtype[source]#

hf_model_config[source]#

quant_config = None[source]#

peft_config = None[source]#

_activated = False[source]#

__prepare_tokenizer(model_args: lmflow.args.ModelArguments) → transformers.PreTrainedTokenizer | transformers.PreTrainedTokenizerFast[source]#

__prepare_dtype(model_args: lmflow.args.ModelArguments) → torch.dtype[source]#

__prepare_model_config(model_args: lmflow.args.ModelArguments, hf_auto_model_additional_args: dict | None = None)[source]#

Prepare the model configuration for Hugging Face auto model registration.

Parameters#

model_args : ModelArguments
LMFlow model arguments.

hf_auto_model_additional_args : Optional[dict], optional
Special configurations, such as num_labels in AutoModelForSequenceClassification (commonly used in reward modeling), are not preset in __prepare_model_config, so they should be passed in hf_auto_model_additional_args.

Returns#

config : ModelConfig
The Hugging Face model config.
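As a rough illustration of the hf_auto_model_additional_args parameter, the sketch below shows one way caller-supplied extras such as num_labels might be layered on top of preset config arguments. The helper `build_config_kwargs` and the preset keys are hypothetical and are not taken from LMFlow's source; only the parameter name and the num_labels example come from the documentation above.

```python
from typing import Optional


def build_config_kwargs(
    preset_kwargs: dict,
    hf_auto_model_additional_args: Optional[dict] = None,
) -> dict:
    """Hypothetical helper: merge caller-supplied extras over preset kwargs."""
    merged = dict(preset_kwargs)  # copy so the presets are not mutated
    if hf_auto_model_additional_args:
        # Extras win on key collisions, since they are the more specific request.
        merged.update(hf_auto_model_additional_args)
    return merged


# Assumed preset values, for illustration only.
preset = {"trust_remote_code": False, "revision": "main"}

# Reward modeling often uses a single scalar output, hence num_labels=1.
reward_kwargs = build_config_kwargs(preset, {"num_labels": 1})
```

The merged dictionary could then be forwarded as keyword arguments to whatever call builds the config. Copying the presets first keeps repeated calls independent of each other.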

__prepare_quant_config(model_args: lmflow.args.ModelArguments)[source]#

__prepare_peft_config(model_args: lmflow.args.ModelArguments)[source]#

__model_module_inject(model_args: lmflow.args.ModelArguments) → None[source]#

Override some model modules with custom implementations.

Current implementations:

- Position interpolation (model_args.do_rope_scaling): replace llama embeddings with condense embeddings.
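The idea behind position interpolation can be sketched in a few lines: position indices are divided by a scaling factor before the rotary angles are computed, so a longer sequence is squeezed into the position range the model saw during training. This is a minimal standalone sketch, not LMFlow's condense embedding implementation; the function name `rope_angles` and its parameters are assumptions made for illustration.

```python
import math
from typing import List


def rope_angles(
    position: int,
    head_dim: int,
    base: float = 10000.0,
    scaling_factor: float = 1.0,
) -> List[float]:
    """Rotary angles for one position.

    A scaling_factor greater than 1 applies position interpolation by
    shrinking the position index before the angles are computed.
    """
    scaled_position = position / scaling_factor
    return [
        scaled_position * base ** (-2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


# With a factor of 2, position 4096 maps onto the angles of unscaled position 2048.
extended = rope_angles(4096, head_dim=8, scaling_factor=2.0)
original = rope_angles(2048, head_dim=8)
same = all(math.isclose(a, b) for a, b in zip(extended, original))
```

Here `same` is true because the scaled index equals the unscaled one; intermediate positions land on fractional indices between the trained integer positions.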

__prepare_model_for_training(model_args: lmflow.args.ModelArguments, hf_auto_model: HF_AUTOMODEL_TYPE)[source]#

__prepare_model_for_inference(model_args: lmflow.args.ModelArguments, hf_auto_model: HF_AUTOMODEL_TYPE)[source]#

__prepare_model_for_vllm_inference(model_args: lmflow.args.ModelArguments, vllm_gpu_memory_utilization: float, vllm_tensor_parallel_size: int)[source]#

__fix_special_tokens()[source]#

activate_model_for_inference(use_vllm: bool = False, **kwargs)[source]#

deactivate_model_for_inference(use_vllm: bool = False)[source]#

Deactivate the model and release the resources.

NOTE: vLLM currently has no official way to do this, and by our observation the implementation below cannot release all GPU resources. This method is therefore just a placeholder for a future implementation. See: [GitHub issue](vllm-project/vllm#1908)
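The activate/deactivate pairing can be pictured with a toy class that tracks an `_activated` flag and drops its reference to the backend on deactivation. This is a hypothetical stand-in written for illustration, not the mixin's actual code: `ToyInferenceHolder` and its `backend_model` attribute are assumptions, and the sketch only releases Python-level references, which, as the note above says, is not enough to free all GPU memory held by vLLM.

```python
import gc


class ToyInferenceHolder:
    """Hypothetical stand-in that mimics the activate/deactivate pattern."""

    def __init__(self) -> None:
        self.backend_model = None
        self._activated = False

    def activate_model_for_inference(self) -> None:
        if self._activated:
            return  # already loaded; nothing to do
        self.backend_model = object()  # placeholder for a real inference engine
        self._activated = True

    def deactivate_model_for_inference(self) -> None:
        if not self._activated:
            return  # nothing to release
        self.backend_model = None  # drop the only reference to the engine
        gc.collect()  # ask Python to reclaim unreachable objects
        self._activated = False


holder = ToyInferenceHolder()
holder.activate_model_for_inference()
was_active = holder._activated
holder.deactivate_model_for_inference()
```

Making both methods idempotent means callers can deactivate defensively without first checking the state.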

get_max_length()[source]#: Return the maximum acceptable input length, in tokens.

get_tokenizer()[source]#: Return the tokenizer of the model.

get_backend_model()[source]#: Return the backend model.
