
lmflow.pipeline.inferencer#

The Inferencer class simplifies the process of model inferencing.

Module Contents#

Classes#

Inferencer

Initializes the Inferencer class with given arguments.

SpeculativeInferencer

Ref: [arXiv:2211.17192v2](https://arxiv.org/abs/2211.17192)

ToolInferencer

Initializes the ToolInferencer class with given arguments.

Functions#

rstrip_partial_utf8(string)

Attributes#

supported_dataset_type

logger

lmflow.pipeline.inferencer.rstrip_partial_utf8(string)[source]#
lmflow.pipeline.inferencer.supported_dataset_type = ['text_only', 'image_text'][source]#
lmflow.pipeline.inferencer.logger[source]#
class lmflow.pipeline.inferencer.Inferencer(model_args, data_args, inferencer_args)[source]#

Bases: lmflow.pipeline.base_pipeline.BasePipeline

Initializes the Inferencer class with given arguments.

Parameters:
model_args : ModelArguments object.

Contains the arguments required to load the model.

data_args : DatasetArguments object.

Contains the arguments required to load the dataset.

inferencer_args : InferencerArguments object.

Contains the arguments required to perform inference.
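
A minimal construction sketch is shown below. The argument classes follow the signature above; the loader helpers (lmflow.args, lmflow.models.auto_model.AutoModel, lmflow.datasets.dataset.Dataset) and the example paths are assumptions for illustration rather than part of this page.

```python
# Hypothetical construction sketch; loader helpers and paths below are assumptions.
from lmflow.args import ModelArguments, DatasetArguments, InferencerArguments
from lmflow.datasets.dataset import Dataset
from lmflow.models.auto_model import AutoModel
from lmflow.pipeline.inferencer import Inferencer

model_args = ModelArguments(model_name_or_path="gpt2")              # any causal LM
data_args = DatasetArguments(dataset_path="data/example_dataset")   # hypothetical path
inferencer_args = InferencerArguments()

model = AutoModel.get_model(model_args, tune_strategy="none")        # load for inference only
dataset = Dataset(data_args)

inferencer = Inferencer(model_args, data_args, inferencer_args)
```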

create_dataloader(dataset: lmflow.datasets.dataset.Dataset)[source]#

Batchlize the dataset and format it into a dataloader.

Args:

dataset (Dataset): the dataset object

Output:

dataloader (batchlize): the dataloader object

dataset_size (int): the length of the dataset
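
Given the inferencer and dataset from the sketch above, create_dataloader can be called as follows; the two return values mirror the documented outputs.

```python
# Batchlize the dataset; returns the dataloader and the dataset length as documented above.
dataloader, dataset_size = inferencer.create_dataloader(dataset)
```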

inference(model, dataset: lmflow.datasets.dataset.Dataset, max_new_tokens: int = 100, temperature: float = 0.0, prompt_structure: str = '{input}', remove_image_flag: bool = False, chatbot_type: str = 'mini_gpt')[source]#

Perform inference for a model

Parameters:
model : TunableModel object.

TunableModel to perform inference

dataset : Dataset object.
Returns:
output_dataset: Dataset object.
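
A hedged example of the documented inference call, using the defaults from the signature above:

```python
# Run generation over the dataset; keyword values mirror the signature defaults.
output_dataset = inferencer.inference(
    model=model,
    dataset=dataset,
    max_new_tokens=100,
    temperature=0.0,
    prompt_structure="{input}",
)
```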
stream_inference(context, model, max_new_tokens, token_per_step, temperature, end_string, input_dataset, remove_image_flag: bool = False)[source]#
class lmflow.pipeline.inferencer.SpeculativeInferencer(model_args, draft_model_args, data_args, inferencer_args)[source]#

Bases: Inferencer

Ref: [arXiv:2211.17192v2](https://arxiv.org/abs/2211.17192)

Parameters:
model_args : ModelArguments object.

Contains the arguments required to load the target model.

draft_model_args : ModelArguments object.

Contains the arguments required to load the draft model.

data_args : DatasetArguments object.

Contains the arguments required to load the dataset.

inferencer_args : InferencerArguments object.

Contains the arguments required to perform inference.

static score_to_prob(scores: torch.Tensor, temperature: float = 0.0, top_p: float = 1.0) → torch.Tensor[source]#

Convert scores (NOT softmaxed tensor) to probabilities with support for temperature, top-p sampling, and argmax.

Parameters:
scores : torch.Tensor

Input scores.

temperature : float, optional

Temperature parameter for controlling randomness. Higher values make the distribution more uniform; lower values make it peakier. When temperature <= 1e-6, argmax is used. By default 0.0.

top_p : float, optional

Top-p sampling parameter for controlling the cumulative probability threshold, by default 1.0 (no threshold)

Returns:
torch.Tensor

Probability distribution after adjustments.
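
The following is an illustrative re-implementation of the behavior described above (temperature scaling, top-p filtering, argmax fallback), not the library's actual code:

```python
# Sketch of the documented behavior; not the library's implementation.
import torch

def score_to_prob_sketch(scores: torch.Tensor, temperature: float = 0.0, top_p: float = 1.0) -> torch.Tensor:
    if temperature <= 1e-6:
        # Degenerate case: put all probability mass on the highest-scoring token.
        prob = torch.zeros_like(scores)
        prob.scatter_(-1, scores.argmax(dim=-1, keepdim=True), 1.0)
        return prob

    prob = torch.softmax(scores / temperature, dim=-1)

    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative probability exceeds top_p.
        sorted_prob, sorted_idx = prob.sort(dim=-1, descending=True)
        cumulative = sorted_prob.cumsum(dim=-1)
        mask = cumulative - sorted_prob > top_p  # tokens strictly beyond the threshold
        sorted_prob[mask] = 0.0
        prob = torch.zeros_like(prob).scatter_(-1, sorted_idx, sorted_prob)
        prob = prob / prob.sum(dim=-1, keepdim=True)  # renormalize after filtering

    return prob
```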

static sample(prob: torch.Tensor, num_samples: int = 1) → Dict[source]#

Sample from a tensor of probabilities

static predict_next_token(model: lmflow.models.hf_decoder_model.HFDecoderModel, input_ids: torch.Tensor, num_new_tokens: int = 1)[source]#

Predict the next token given the input_ids.

autoregressive_sampling(input_ids: torch.Tensor, model: lmflow.models.hf_decoder_model.HFDecoderModel, temperature: float = 0.0, num_new_tokens: int = 5) → Dict[source]#

Ref: [arXiv:2211.17192v2](https://arxiv.org/abs/2211.17192) Section 2.2
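
A conceptual sketch of the sampling loop described in the reference, using a plain Hugging Face causal LM rather than the HFDecoderModel wrapper, and the score_to_prob_sketch helper from above:

```python
# Conceptual sketch of autoregressive sampling (arXiv:2211.17192, Section 2.2):
# sample one token at a time, append it, and feed the extended sequence back in.
import torch

@torch.no_grad()
def autoregressive_sampling_sketch(input_ids, hf_model, temperature=0.0, num_new_tokens=5):
    for _ in range(num_new_tokens):
        logits = hf_model(input_ids).logits[:, -1, :]         # scores for the next position
        prob = score_to_prob_sketch(logits, temperature)      # see the sketch above
        next_token = torch.multinomial(prob, num_samples=1)   # sample one token id
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids
```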

inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, draft_model: lmflow.models.hf_decoder_model.HFDecoderModel, input: str, temperature: float = 0.0, gamma: int = 5, max_new_tokens: int = 100)[source]#

Perform inference for a model

Parameters:
model : HFDecoderModel object.

TunableModel to verify tokens generated by the draft model.

draft_model : HFDecoderModel object.

TunableModel that provides approximations of the target model.

input : str.

The input text (i.e., the prompt) for the model.

gamma : int.

The number of tokens to be generated by the draft model within each iteration.

max_new_tokens : int.

The maximum number of tokens to be generated by the target model.

Returns:
output: str.

The output text generated by the model.
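
A hedged usage sketch; model loading follows the same assumptions as the Inferencer example above, and the draft model choice is illustrative only:

```python
# Hypothetical usage; draft_model_args points at a smaller model (assumption).
draft_model_args = ModelArguments(model_name_or_path="gpt2")
draft_model = AutoModel.get_model(draft_model_args, tune_strategy="none")

spec_inferencer = SpeculativeInferencer(model_args, draft_model_args, data_args, inferencer_args)
output = spec_inferencer.inference(
    model=model,               # target model that verifies drafted tokens
    draft_model=draft_model,   # cheaper model proposing gamma tokens per iteration
    input="What is speculative decoding?",
    temperature=0.0,
    gamma=5,
    max_new_tokens=100,
)
```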

abstract stream_inference()[source]#
class lmflow.pipeline.inferencer.ToolInferencer(model_args, data_args, inferencer_args)[source]#

Bases: Inferencer

Initializes the ToolInferencer class with given arguments.

Parameters:
model_args : ModelArguments object.

Contains the arguments required to load the model.

data_args : DatasetArguments object.

Contains the arguments required to load the dataset.

inferencer_args : InferencerArguments object.

Contains the arguments required to perform inference.

inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, input: str, max_new_tokens: int = 1024)[source]#

Perform inference for a model

Parameters:
model : HFDecoderModel object.

TunableModel to perform inference

input : str.

The input text (i.e., the prompt) for the model.

max_new_tokens : int.

The maximum number of tokens to be generated by the model.

Returns:
output : str.

The output text generated by the model.

code_exec(code)[source]#
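
A hedged end-to-end sketch for ToolInferencer; it assumes the generated text is directly executable code and reuses the argument objects from the earlier sketch:

```python
# Hypothetical usage; assumes the model's output is runnable Python code,
# which code_exec then executes (exact output handling is not documented here).
tool_inferencer = ToolInferencer(model_args, data_args, inferencer_args)
generated_code = tool_inferencer.inference(
    model=model,
    input="Write Python code that prints the first 10 Fibonacci numbers.",
    max_new_tokens=1024,
)
tool_inferencer.code_exec(generated_code)
```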