lmflow.pipeline.vllm_inferencer#
Attributes#
Classes#

- VLLMInferencer
- MemorySafeVLLMInferencer: Run VLLM inference in a subprocess for memory safety.
Module Contents#
- class lmflow.pipeline.vllm_inferencer.VLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)[source]#
Bases: lmflow.pipeline.base_pipeline.BasePipeline

- _parse_args_to_sampling_params(inference_args: lmflow.args.InferencerArguments) → dict[source]#
- inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, dataset: lmflow.datasets.Dataset, release_gpu: bool = False, inference_args: lmflow.args.InferencerArguments | None = None) → lmflow.utils.protocol.DataProto[source]#
- save_inference_results(outputs: lmflow.utils.protocol.DataProto, inference_results_path: str)[source]#
- load_inference_results(inference_results_path: str) → lmflow.utils.protocol.DataProto[source]#
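The `_parse_args_to_sampling_params` helper suggests that inferencer arguments are flattened into a plain dict of sampling parameters before being handed to vLLM. A minimal sketch of that pattern, assuming hypothetical field names (`temperature`, `top_p`, `max_new_tokens` are illustrative, not lmflow's actual `InferencerArguments` attributes):

```python
# Hypothetical sketch of mapping an arguments dataclass to a
# sampling-parameter dict, in the spirit of _parse_args_to_sampling_params.
# Field names here are assumptions for illustration only.
from dataclasses import dataclass, asdict


@dataclass
class ToyInferencerArguments:
    temperature: float = 1.0
    top_p: float = 1.0
    max_new_tokens: int = 128
    random_seed: int = 42  # not a sampling knob; filtered out below


# Only these keys are meaningful to the (assumed) sampling backend.
SAMPLING_KEYS = {"temperature", "top_p", "max_new_tokens"}


def parse_args_to_sampling_params(args: ToyInferencerArguments) -> dict:
    """Keep only the fields a sampling backend understands."""
    return {k: v for k, v in asdict(args).items() if k in SAMPLING_KEYS}


params = parse_args_to_sampling_params(ToyInferencerArguments(temperature=0.7))
print(params)  # {'temperature': 0.7, 'top_p': 1.0, 'max_new_tokens': 128}
```

The filtering step matters because argument objects typically carry many fields (paths, seeds, device flags) that a sampling backend would reject as unknown keyword arguments.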
- class lmflow.pipeline.vllm_inferencer.MemorySafeVLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)[source]#
Bases: VLLMInferencer

Run VLLM inference in a subprocess for memory safety.
This is a workaround: vLLM cannot reliably release GPU memory when run in-process. See: https://github.com/vllm-project/vllm/issues/1908
- inference() → lmflow.utils.protocol.DataProto[source]#
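The subprocess workaround can be sketched generically: run the memory-hungry step in a child Python process, so the OS reclaims all of its memory when the process exits, and hand results back through a file. The following self-contained sketch illustrates the mechanism only; the pickle-file handoff and the toy workload are assumptions, not lmflow's actual implementation, and no vLLM or GPU is involved:

```python
# Sketch of the "run in a subprocess for memory safety" pattern: the child
# process does the work and pickles its results to a file; when it exits,
# the OS reclaims all of its memory, which an in-process engine cannot
# guarantee. The file-based pickle handoff is an illustrative assumption.
import pickle
import subprocess
import sys
import tempfile

CHILD_SCRIPT = """
import pickle, sys
# Stand-in for the expensive inference step.
results = [prompt.upper() for prompt in ["hello", "world"]]
with open(sys.argv[1], "wb") as f:
    pickle.dump(results, f)
"""


def memory_safe_run() -> list:
    with tempfile.NamedTemporaryFile(suffix=".pkl") as tmp:
        # All memory the child allocated is released on process exit.
        subprocess.run([sys.executable, "-c", CHILD_SCRIPT, tmp.name], check=True)
        with open(tmp.name, "rb") as f:
            return pickle.load(f)


print(memory_safe_run())  # ['HELLO', 'WORLD']
```

The `check=True` flag makes the parent fail loudly if the child crashes, rather than silently loading a stale or missing results file.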