lmflow.pipeline.vllm_inferencer

Attributes

logger

Classes

VLLMInferencer

MemorySafeVLLMInferencer

Run VLLM inference in a subprocess for memory safety.

Module Contents

lmflow.pipeline.vllm_inferencer.logger
class lmflow.pipeline.vllm_inferencer.VLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)

Bases: lmflow.pipeline.base_pipeline.BasePipeline

model_args
data_args
inferencer_args
eos_token_id
sampling_params
_parse_args_to_sampling_params(inference_args: lmflow.args.InferencerArguments) → dict
inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, dataset: lmflow.datasets.Dataset, release_gpu: bool = False, inference_args: lmflow.args.InferencerArguments | None = None) → lmflow.utils.protocol.DataProto
save_inference_results(outputs: lmflow.utils.protocol.DataProto, inference_results_path: str)
load_inference_results(inference_results_path: str) → lmflow.utils.protocol.DataProto
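
The signatures above can be combined into a short usage sketch. It assumes the caller has already built the argument objects, the HFDecoderModel, and the Dataset; how those are constructed is not covered on this page. The helper name run_vllm_inference and the results_path parameter are hypothetical and exist only for illustration.

```python
def run_vllm_inference(model_args, data_args, inferencer_args, model, dataset, results_path):
    """Hypothetical helper: run in-process vLLM inference and persist the outputs.

    Only the constructor and the two methods documented above are used.
    """
    # Imported lazily so this sketch can be loaded without lmflow installed.
    from lmflow.pipeline.vllm_inferencer import VLLMInferencer

    inferencer = VLLMInferencer(model_args, data_args, inferencer_args)

    # release_gpu defaults to False; True asks the inferencer to free GPU
    # memory once generation is done.
    outputs = inferencer.inference(model, dataset, release_gpu=True)

    # Persist the results so a later step can read them back with
    # load_inference_results(results_path).
    inferencer.save_inference_results(outputs, results_path)
    return outputs
```

Passing release_gpu=True is the in-process path that the deprecation note on MemorySafeVLLMInferencer recommends for the common single-GPU case.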
class lmflow.pipeline.vllm_inferencer.MemorySafeVLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)

Bases: VLLMInferencer

Run VLLM inference in a subprocess for memory safety.

Deprecated: Scheduled for removal in lmflow 1.1.0. Use VLLMInferencer with release_gpu=True for the common single-GPU case, or wait for the sleep-mode-based replacement that will land alongside the vllm>=0.11 pin. This subprocess wrapper was a workaround for vllm’s inability to release GPU memory in-process (vllm-project/vllm#1908); the in-process path is now reliable for most use cases.

inferencer_file_path
inference() → lmflow.utils.protocol.DataProto
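
The general idea behind a subprocess wrapper can be sketched with the standard library alone: the heavy work runs in a child Python process, the results are written to a file, and the operating system reclaims all of the child's memory when it exits. This is not lmflow's implementation. The worker script, the helper name run_in_subprocess, and the JSON result format are all assumptions made for illustration, and upper-casing the prompts stands in for model generation.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical worker script. A real worker would load the model and generate
# text; upper-casing the prompts is a stand-in for that work.
WORKER_SOURCE = """
import json, sys
prompts = json.loads(sys.argv[1])
results = [{"input": p, "output": p.upper()} for p in prompts]
with open(sys.argv[2], "w") as f:
    json.dump(results, f)
"""


def run_in_subprocess(prompts):
    """Run the worker in a child process and read its results back from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        worker_path = Path(tmp) / "worker.py"
        results_path = Path(tmp) / "results.json"
        worker_path.write_text(WORKER_SOURCE)

        # check=True raises CalledProcessError if the child exits non-zero.
        # Once the child exits, every resource it held has been released.
        subprocess.run(
            [sys.executable, str(worker_path), json.dumps(prompts), str(results_path)],
            check=True,
        )
        return json.loads(results_path.read_text())


if __name__ == "__main__":
    print(run_in_subprocess(["hello", "world"]))
```

The trade-off of this pattern is a process launch and a serialization round trip on every call, which is one reason an in-process release path is preferable when it works reliably.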