
lmflow.pipeline.inferencer#

The Inferencer class simplifies the process of model inferencing.

Module Contents#

Classes#

Inferencer

Initializes the Inferencer class with given arguments.

SpeculativeInferencer

Ref: [arXiv:2211.17192v2](https://arxiv.org/abs/2211.17192)

ToolInferencer

Initializes the ToolInferencer class with given arguments.

Functions#

rstrip_partial_utf8(string)

Attributes#

supported_dataset_type

logger

lmflow.pipeline.inferencer.rstrip_partial_utf8(string)[source]#
lmflow.pipeline.inferencer.supported_dataset_type = ['text_only', 'image_text'][source]#
lmflow.pipeline.inferencer.logger[source]#
class lmflow.pipeline.inferencer.Inferencer(model_args, data_args, inferencer_args)[source]#

Bases: lmflow.pipeline.base_pipeline.BasePipeline

Initializes the Inferencer class with given arguments.

Parameters:
model_args : ModelArguments object.

Contains the arguments required to load the model.

data_args : DatasetArguments object.

Contains the arguments required to load the dataset.

inferencer_args : InferencerArguments object.

Contains the arguments required to perform inference.
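
A minimal construction sketch is shown below. The argument classes follow the signature above; the loader helpers (lmflow.args, lmflow.models.auto_model.AutoModel, lmflow.datasets.dataset.Dataset) and the example paths are assumptions for illustration rather than part of this page.

```python
# Hypothetical construction sketch; loader helpers and paths below are assumptions.
from lmflow.args import ModelArguments, DatasetArguments, InferencerArguments
from lmflow.datasets.dataset import Dataset
from lmflow.models.auto_model import AutoModel
from lmflow.pipeline.inferencer import Inferencer

model_args = ModelArguments(model_name_or_path="gpt2")              # any causal LM
data_args = DatasetArguments(dataset_path="data/example_dataset")   # hypothetical path
inferencer_args = InferencerArguments()

model = AutoModel.get_model(model_args, tune_strategy="none")        # load for inference only
dataset = Dataset(data_args)

inferencer = Inferencer(model_args, data_args, inferencer_args)
```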

create_dataloader(dataset: lmflow.datasets.dataset.Dataset)[source]#

Batchlize the dataset and format it into a dataloader.

Args:

dataset (Dataset): the dataset object

Output:

dataloader (batchlize): the dataloader object

dataset_size (int): the length of the dataset
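
Given the inferencer and dataset from the sketch above, create_dataloader can be called as follows; the two return values mirror the documented outputs.

```python
# Batchlize the dataset; returns the dataloader and the dataset length as documented above.
dataloader, dataset_size = inferencer.create_dataloader(dataset)
```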

inference(model, dataset: lmflow.datasets.dataset.Dataset, max_new_tokens: int = 100, temperature: float = 0.0, prompt_structure: str = '{input}', remove_image_flag: bool = False, chatbot_type: str = 'mini_gpt')[source]#

Perform inference for a model

Parameters:
model : TunableModel object.

TunableModel to perform inference

dataset : Dataset object.
Returns:
output_dataset: Dataset object.
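
A hedged example of the documented inference call, using the defaults from the signature above:

```python
# Run generation over the dataset; keyword values mirror the signature defaults.
output_dataset = inferencer.inference(
    model=model,
    dataset=dataset,
    max_new_tokens=100,
    temperature=0.0,
    prompt_structure="{input}",
)
```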
stream_inference(context, model, max_new_tokens, token_per_step, temperature, end_string, input_dataset, remove_image_flag: bool = False)[source]#
class lmflow.pipeline.inferencer.SpeculativeInferencer(model_args, draft_model_args, data_args, inferencer_args)[source]#

Bases: Inferencer

Ref: [arXiv:2211.17192v2](https://arxiv.org/abs/2211.17192)

Parameters:
model_args : ModelArguments object.

Contains the arguments required to load the target model.

draft_model_args : ModelArguments object.

Contains the arguments required to load the draft model.

data_args : DatasetArguments object.

Contains the arguments required to load the dataset.

inferencer_args : InferencerArguments object.

Contains the arguments required to perform inference.

static score_to_prob(scores: torch.Tensor, temperature: float = 0.0, top_p: float = 1.0) → torch.Tensor[source]#

Convert scores (NOT softmaxed tensor) to probabilities with support for temperature, top-p sampling, and argmax.

Parameters:
scores : torch.Tensor

Input scores.

temperature : float, optional

Temperature parameter for controlling randomness. Higher values make the distribution more uniform; lower values make it peakier. When temperature <= 1e-6, argmax is used. By default 0.0.

top_p : float, optional

Top-p sampling parameter for controlling the cumulative probability threshold, by default 1.0 (no threshold)

Returns:
torch.Tensor

Probability distribution after adjustments.
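
The following is an illustrative re-implementation of the behavior described above (temperature scaling, top-p filtering, argmax fallback), not the library's actual code:

```python
# Sketch of the documented behavior; not the library's implementation.
import torch

def score_to_prob_sketch(scores: torch.Tensor, temperature: float = 0.0, top_p: float = 1.0) -> torch.Tensor:
    if temperature <= 1e-6:
        # Degenerate case: put all probability mass on the highest-scoring token.
        prob = torch.zeros_like(scores)
        prob.scatter_(-1, scores.argmax(dim=-1, keepdim=True), 1.0)
        return prob

    prob = torch.softmax(scores / temperature, dim=-1)

    if top_p < 1.0:
        # Keep the smallest set of tokens whose cumulative probability exceeds top_p.
        sorted_prob, sorted_idx = prob.sort(dim=-1, descending=True)
        cumulative = sorted_prob.cumsum(dim=-1)
        mask = cumulative - sorted_prob > top_p  # tokens strictly beyond the threshold
        sorted_prob[mask] = 0.0
        prob = torch.zeros_like(prob).scatter_(-1, sorted_idx, sorted_prob)
        prob = prob / prob.sum(dim=-1, keepdim=True)  # renormalize after filtering

    return prob
```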

static sample(prob: torch.Tensor, num_samples: int = 1) → Dict[source]#

Sample from a tensor of probabilities

static predict_next_token(model: lmflow.models.hf_decoder_model.HFDecoderModel, input_ids: torch.Tensor, num_new_tokens: int = 1)[source]#

Predict the next token given the input_ids.

autoregressive_sampling(input_ids: torch.Tensor, model: lmflow.models.hf_decoder_model.HFDecoderModel, temperature: float = 0.0, num_new_tokens: int = 5) → Dict[source]#

Ref: [arXiv:2211.17192v2](https://arxiv.org/abs/2211.17192) Section 2.2
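
A conceptual sketch of the sampling loop described in the reference, using a plain Hugging Face causal LM rather than the HFDecoderModel wrapper, and the score_to_prob_sketch helper from above:

```python
# Conceptual sketch of autoregressive sampling (arXiv:2211.17192, Section 2.2):
# sample one token at a time, append it, and feed the extended sequence back in.
import torch

@torch.no_grad()
def autoregressive_sampling_sketch(input_ids, hf_model, temperature=0.0, num_new_tokens=5):
    for _ in range(num_new_tokens):
        logits = hf_model(input_ids).logits[:, -1, :]         # scores for the next position
        prob = score_to_prob_sketch(logits, temperature)      # see the sketch above
        next_token = torch.multinomial(prob, num_samples=1)   # sample one token id
        input_ids = torch.cat([input_ids, next_token], dim=-1)
    return input_ids
```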

inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, draft_model: lmflow.models.hf_decoder_model.HFDecoderModel, input: str, temperature: float = 0.0, gamma: int = 5, max_new_tokens: int = 100)[source]#

Perform inference for a model

Parameters:
model : HFDecoderModel object.

TunableModel to verify tokens generated by the draft model.

draft_model : HFDecoderModel object.

TunableModel that provides approximations of the target model.

input : str.

The input text (i.e., the prompt) for the model.

gamma : int.

The number of tokens to be generated by the draft model within each iteration.

max_new_tokens : int.

The maximum number of tokens to be generated by the target model.

Returns:
output: str.

The output text generated by the model.
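
A hedged usage sketch; model loading follows the same assumptions as the Inferencer example above, and the draft model choice is illustrative only:

```python
# Hypothetical usage; draft_model_args points at a smaller model (assumption).
draft_model_args = ModelArguments(model_name_or_path="gpt2")
draft_model = AutoModel.get_model(draft_model_args, tune_strategy="none")

spec_inferencer = SpeculativeInferencer(model_args, draft_model_args, data_args, inferencer_args)
output = spec_inferencer.inference(
    model=model,               # target model that verifies drafted tokens
    draft_model=draft_model,   # cheaper model proposing gamma tokens per iteration
    input="What is speculative decoding?",
    temperature=0.0,
    gamma=5,
    max_new_tokens=100,
)
```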

abstract stream_inference()[source]#
class lmflow.pipeline.inferencer.ToolInferencer(model_args, data_args, inferencer_args)[source]#

Bases: Inferencer

Initializes the ToolInferencer class with given arguments.

Parameters:
model_args : ModelArguments object.

Contains the arguments required to load the model.

data_args : DatasetArguments object.

Contains the arguments required to load the dataset.

inferencer_args : InferencerArguments object.

Contains the arguments required to perform inference.

inference(model: lmflow.models.hf_decoder_model.HFDecoderModel, input: str, max_new_tokens: int = 1024)[source]#

Perform inference for a model

Parameters:
model : HFDecoderModel object.

TunableModel to perform inference

input : str.

The input text (i.e., the prompt) for the model.

max_new_tokens : int.

The maximum number of tokens to be generated by the model.

Returns:
output : str.

The output text generated by the model.

code_exec(code)[source]#
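
A hedged end-to-end sketch for ToolInferencer; it assumes the generated text is directly executable code and reuses the argument objects from the earlier sketch:

```python
# Hypothetical usage; assumes the model's output is runnable Python code,
# which code_exec then executes (exact output handling is not documented here).
tool_inferencer = ToolInferencer(model_args, data_args, inferencer_args)
generated_code = tool_inferencer.inference(
    model=model,
    input="Write Python code that prints the first 10 Fibonacci numbers.",
    max_new_tokens=1024,
)
tool_inferencer.code_exec(generated_code)
```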