lmflow.pipeline.vllm_inferencer#
Attributes#
Classes#
- InferencerWithOffloading: Helper class that provides a standard way to create an ABC using inheritance.
- VLLMInferencer: Helper class that provides a standard way to create an ABC using inheritance.
- MemorySafeVLLMInferencer: Helper class that provides a standard way to create an ABC using inheritance.
Module Contents#
- class lmflow.pipeline.vllm_inferencer.InferencerWithOffloading(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)[source]#
Bases: lmflow.pipeline.base_pipeline.BasePipeline
Helper class that provides a standard way to create an ABC using inheritance.
- class lmflow.pipeline.vllm_inferencer.VLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)[source]#
Bases: InferencerWithOffloading
Helper class that provides a standard way to create an ABC using inheritance.
- parse_to_sampling_params(inference_args: lmflow.args.InferencerArguments) vllm.SamplingParams [source]#
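A minimal sketch of the kind of mapping this method performs, converting generation settings from InferencerArguments into a vllm.SamplingParams object; the InferencerArguments field names used below (num_output_sequences, temperature, top_p, max_new_tokens) are assumptions, not confirmed attributes.

```python
import vllm
from lmflow.args import InferencerArguments

def parse_to_sampling_params_sketch(inference_args: InferencerArguments) -> vllm.SamplingParams:
    # Copy generation settings onto a vllm.SamplingParams object.
    # The attribute names below are assumptions; getattr() falls back to
    # typical defaults if a field does not exist.
    return vllm.SamplingParams(
        n=getattr(inference_args, "num_output_sequences", 1),
        temperature=getattr(inference_args, "temperature", 1.0),
        top_p=getattr(inference_args, "top_p", 1.0),
        max_tokens=getattr(inference_args, "max_new_tokens", 512),
    )
```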
- inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, dataset: lmflow.datasets.Dataset, enable_decode_inference_result: bool = True, release_gpu: bool = False, inference_args: lmflow.args.InferencerArguments | None = None, enable_distributed_inference: bool = False, **kwargs) List[lmflow.utils.data_utils.VLLMInferenceResultWithInput] [source]#
Perform inference using the provided model and dataset. Will save inference results if save_results is set to True in inferencer_args.
- Parameters:
- model : HFDecoderModel
LMFlow HFDecoderModel object.
- dataset : Dataset
LMFlow Dataset object.
- apply_chat_template : bool, optional
Whether to apply the chat template to the input, by default True.
- enable_decode_inference_result : bool, optional
Whether to decode the outputs after generation, by default True.
- release_gpu : bool, optional
Whether to release GPU resources after inference, by default False.
- inference_args : InferencerArguments, optional
Inference arguments to use, by default None.
- Returns:
- List[VLLMInferenceResultWithInput]
Returns a list of VLLMInferenceResultWithInput, where each element contains the input prompt and the corresponding output.
When enable_decode_inference_result = True, each output is a list of strings containing sampling_params.n samples for the corresponding prompt.
When enable_decode_inference_result = False, each output is a list of token-id lists (list of list of ints, with no decoding after generation).
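A minimal usage sketch, assuming the standard LMFlow argument dataclasses and the constructors shown above; the specific field values (model_name_or_path, dataset_path) are illustrative and all other configuration fields are left at their defaults.

```python
from lmflow.args import ModelArguments, DatasetArguments, InferencerArguments
from lmflow.datasets import Dataset
from lmflow.models.hf_decoder_model import HFDecoderModel
from lmflow.pipeline.vllm_inferencer import VLLMInferencer

# Illustrative argument values; any field not shown in the signatures above is an assumption.
model_args = ModelArguments(model_name_or_path="Qwen/Qwen2-0.5B-Instruct")
data_args = DatasetArguments(dataset_path="data/my_prompts")
inferencer_args = InferencerArguments()

model = HFDecoderModel(model_args)
dataset = Dataset(data_args)

inferencer = VLLMInferencer(model_args, data_args, inferencer_args)

# Returns a list of VLLMInferenceResultWithInput; with
# enable_decode_inference_result=True each output is a list of decoded strings.
results = inferencer.inference(
    model=model,
    dataset=dataset,
    enable_decode_inference_result=True,
    release_gpu=False,
)
```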
- _inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, model_input: List[str], sampling_params: vllm.SamplingParams, release_gpu: bool = False) List[lmflow.utils.data_utils.VLLMInferenceResultWithInput] [source]#
- _distributed_inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, model_input: ray.data.Dataset, sampling_params: vllm.SamplingParams, num_instances: int, batch_size: int = 4, release_gpu: bool = False) List[lmflow.utils.data_utils.VLLMInferenceResultWithInput] [source]#
- class lmflow.pipeline.vllm_inferencer.MemorySafeVLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)[source]#
Bases: VLLMInferencer
Helper class that provides a standard way to create an ABC using inheritance.
- inference() List[lmflow.utils.data_utils.VLLMInferenceResultWithInput] [source]#
Perform inference using the provided model and dataset. Will save inference results if save_results is set to True in inferencer_args.
- Parameters:
- model : HFDecoderModel
LMFlow HFDecoderModel object.
- dataset : Dataset
LMFlow Dataset object.
- apply_chat_template : bool, optional
Whether to apply the chat template to the input, by default True.
- enable_decode_inference_result : bool, optional
Whether to decode the outputs after generation, by default True.
- release_gpu : bool, optional
Whether to release GPU resources after inference, by default False.
- inference_args : InferencerArguments, optional
Inference arguments to use, by default None.
- Returns:
- List[VLLMInferenceResultWithInput]
Returns a list of VLLMInferenceResultWithInput, where each element contains the input prompt and the corresponding output.
When enable_decode_inference_result = True, each output is a list of strings containing sampling_params.n samples for the corresponding prompt.
When enable_decode_inference_result = False, each output is a list of token-id lists (list of list of ints, with no decoding after generation).
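A minimal usage sketch for the memory-safe variant, reusing the argument objects from the previous sketch; note that this inference() takes no arguments, so configuration is presumably drawn from the dataclasses passed at construction.

```python
from lmflow.pipeline.vllm_inferencer import MemorySafeVLLMInferencer

# model_args, data_args, and inferencer_args as in the VLLMInferencer sketch above.
inferencer = MemorySafeVLLMInferencer(model_args, data_args, inferencer_args)

# Takes no arguments; results follow the same VLLMInferenceResultWithInput format.
results = inferencer.inference()
for result in results:
    print(result)
```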