lmflow.pipeline.vllm_inferencer
===============================

.. py:module:: lmflow.pipeline.vllm_inferencer


Attributes
----------

.. autoapisummary::

   lmflow.pipeline.vllm_inferencer.logger


Classes
-------

.. autoapisummary::

   lmflow.pipeline.vllm_inferencer.InferencerWithOffloading
   lmflow.pipeline.vllm_inferencer.VLLMInferencer
   lmflow.pipeline.vllm_inferencer.MemorySafeVLLMInferencer


Module Contents
---------------

.. py:data:: logger

.. py:class:: InferencerWithOffloading(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)

   Bases: :py:obj:`lmflow.pipeline.base_pipeline.BasePipeline`

   Helper class that provides a standard way to create an ABC using inheritance.

   .. !! processed by numpydoc !!

   .. py:attribute:: model_args

   .. py:attribute:: data_args

   .. py:attribute:: inferencer_args

   .. py:attribute:: eos_token_id

   .. py:method:: inference()
      :abstractmethod:

   .. py:method:: save_inference_results()
      :abstractmethod:

   .. py:method:: load_inference_results()
      :abstractmethod:


.. py:class:: VLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)

   Bases: :py:obj:`InferencerWithOffloading`

   Helper class that provides a standard way to create an ABC using inheritance.

   .. !! processed by numpydoc !!

   .. py:attribute:: sampling_params

   .. py:method:: parse_to_sampling_params(inference_args: lmflow.args.InferencerArguments) -> vllm.SamplingParams

   .. py:method:: inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, dataset: lmflow.datasets.Dataset, enable_decode_inference_result: bool = True, release_gpu: bool = False, inference_args: Optional[lmflow.args.InferencerArguments] = None, enable_distributed_inference: bool = False, **kwargs) -> List[lmflow.utils.data_utils.VLLMInferenceResultWithInput]

      Perform inference using the provided model and dataset. Saves the
      inference results if `save_results` is set to True in `inferencer_args`.

      :Parameters:

          **model** : HFDecoderModel
              LMFlow HFDecoderModel object.

          **dataset** : Dataset
              LMFlow Dataset object.

          **apply_chat_template** : bool, optional
              Whether to apply the chat template to the input, by default True.

          **enable_decode_inference_result** : bool, optional
              Whether to decode the outputs after generation, by default True.

          **release_gpu** : bool, optional
              Whether to release GPU resources, by default False.

          **inference_args** : InferencerArguments, optional
              By default None.

      :Returns:

          List[VLLMInferenceResultWithInput]
              A list of VLLMInferenceResultWithInput, where each element contains
              the input prompt and the corresponding output. When
              `enable_decode_inference_result = True`, each output is a list of
              strings containing `sampling_params.n` samples for the corresponding
              prompt. When `enable_decode_inference_result = False`, each output is
              a list of lists of ints (token ids, with no decoding after
              generation).

      .. !! processed by numpydoc !!

   .. py:method:: _inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, model_input: List[str], sampling_params: vllm.SamplingParams, release_gpu: bool = False) -> List[lmflow.utils.data_utils.VLLMInferenceResultWithInput]

   .. py:method:: _distributed_inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, model_input: ray.data.Dataset, sampling_params: vllm.SamplingParams, num_instances: int, batch_size: int = 4, release_gpu: bool = False) -> List[lmflow.utils.data_utils.VLLMInferenceResultWithInput]

   .. py:method:: save_inference_results(outputs: Union[List[List[str]], List[List[List[int]]]], save_file_path: str)

   .. py:method:: load_inference_results(results_path: str) -> Union[List[List[str]], List[List[List[int]]]]
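
A minimal usage sketch of ``VLLMInferencer.inference``, based only on the signatures
documented above. The argument field names (``model_name_or_path``, ``dataset_path``),
the model and dataset construction, and the dict-style ``"input"``/``"output"`` access
on each result are illustrative assumptions, not guaranteed API.

.. code-block:: python

   # Illustrative sketch only: field names and result layout are assumptions.
   from lmflow.args import ModelArguments, DatasetArguments, InferencerArguments
   from lmflow.datasets import Dataset
   from lmflow.models.hf_decoder_model import HFDecoderModel
   from lmflow.pipeline.vllm_inferencer import VLLMInferencer

   # Hypothetical argument values for illustration.
   model_args = ModelArguments(model_name_or_path="Qwen/Qwen2-0.5B-Instruct")
   data_args = DatasetArguments(dataset_path="data/prompts")
   inferencer_args = InferencerArguments()

   model = HFDecoderModel(model_args)   # assumed constructor usage
   dataset = Dataset(data_args)         # assumed constructor usage

   inferencer = VLLMInferencer(model_args, data_args, inferencer_args)

   # Decoded text outputs; one VLLMInferenceResultWithInput per input prompt.
   results = inferencer.inference(
       model=model,
       dataset=dataset,
       enable_decode_inference_result=True,
       release_gpu=False,
   )

   for res in results:
       # Assumes each result exposes "input" and "output" fields.
       print(res["input"], res["output"][0])

With ``enable_decode_inference_result=False``, each output would instead hold
lists of token ids rather than decoded strings.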
.. py:class:: MemorySafeVLLMInferencer(model_args: lmflow.args.ModelArguments, data_args: lmflow.args.DatasetArguments, inferencer_args: lmflow.args.InferencerArguments)

   Bases: :py:obj:`VLLMInferencer`

   Helper class that provides a standard way to create an ABC using inheritance.

   .. !! processed by numpydoc !!

   .. py:attribute:: inferencer_file_path

   .. py:method:: inference() -> List[lmflow.utils.data_utils.VLLMInferenceResultWithInput]

      Perform inference using the provided model and dataset. Saves the
      inference results if `save_results` is set to True in `inferencer_args`.

      :Parameters:

          **model** : HFDecoderModel
              LMFlow HFDecoderModel object.

          **dataset** : Dataset
              LMFlow Dataset object.

          **apply_chat_template** : bool, optional
              Whether to apply the chat template to the input, by default True.

          **enable_decode_inference_result** : bool, optional
              Whether to decode the outputs after generation, by default True.

          **release_gpu** : bool, optional
              Whether to release GPU resources, by default False.

          **inference_args** : InferencerArguments, optional
              By default None.

      :Returns:

          List[VLLMInferenceResultWithInput]
              A list of VLLMInferenceResultWithInput, where each element contains
              the input prompt and the corresponding output. When
              `enable_decode_inference_result = True`, each output is a list of
              strings containing `sampling_params.n` samples for the corresponding
              prompt. When `enable_decode_inference_result = False`, each output is
              a list of lists of ints (token ids, with no decoding after
              generation).

      .. !! processed by numpydoc !!
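
A hedged sketch of the memory-safe variant. Per the signature above, ``inference()``
takes no arguments and relies on the argument objects supplied at construction.
Extracting an ``"output"`` field before saving and the result file path used here
are assumptions for illustration only.

.. code-block:: python

   # Illustrative sketch only; reuses model_args / data_args / inferencer_args
   # from the previous example.
   from lmflow.pipeline.vllm_inferencer import MemorySafeVLLMInferencer

   inferencer = MemorySafeVLLMInferencer(model_args, data_args, inferencer_args)

   # inference() takes no arguments here; the model and dataset are resolved
   # from the arguments given at construction time.
   results = inferencer.inference()

   # save_inference_results expects the raw outputs (decoded strings or token
   # ids), matching its Union[List[List[str]], List[List[List[int]]]] type.
   # Extracting an "output" field is an assumption about the result layout.
   outputs = [res["output"] for res in results]
   inferencer.save_inference_results(outputs, save_file_path="output/results.json")

   # Reload the saved outputs later.
   reloaded = inferencer.load_inference_results("output/results.json")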