
lmflow.utils.data_utils

This module provides utility functions for setting a random seed, loading data from a JSON file, batching data, and extracting answers from generated text.

Module Contents

Functions

set_random_seed(seed)

Set the random seed for random, numpy, torch, torch.cuda.

load_data(file_name)

Load data with file name.

batchlize(examples, batch_size, random_shuffle)

Convert examples to a dataloader.

answer_extraction(response[, answer_type])

Use this function to extract answers from generated text.

process_image_flag(text[, image_flag])

lmflow.utils.data_utils.set_random_seed(seed: int)

Set the random seed for random, numpy, torch, torch.cuda.

Parameters:
seed : int

The default seed.
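
To make the seeding behavior concrete, here is a minimal sketch of a helper in the spirit of set_random_seed. It is not LMFlow's source: the name set_random_seed_sketch is hypothetical, and the try/except guards around numpy and torch are an assumption added so the snippet also runs where those packages are not installed.

```python
import random


def set_random_seed_sketch(seed: int) -> None:
    """Seed every random number generator that is available (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            # Seed every visible GPU, not only the current device
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed


# Re-seeding with the same value reproduces the same random draws.
set_random_seed_sketch(42)
first = [random.random() for _ in range(3)]
set_random_seed_sketch(42)
second = [random.random() for _ in range(3)]
```

Calling the helper once at the start of a script, before any data shuffling or model initialization, is the usual way to make runs repeatable.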

lmflow.utils.data_utils.load_data(file_name: str)

Load data with file name.

Parameters:
file_name : str

The dataset file name.

Returns:
inputs : list

The input texts of the dataset.

outputs : list

The output texts of the dataset.

len : int

The length of the dataset.
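
The three return values can be illustrated with a small sketch. The JSON layout used here (a top-level "instances" list whose items carry "input" and "output" keys) is an assumption made for illustration, and load_data_sketch is a hypothetical stand-in rather than the library function.

```python
import json
import os
import tempfile


def load_data_sketch(file_name: str):
    """Return (inputs, outputs, length) from a JSON dataset file (hypothetical helper)."""
    with open(file_name, encoding="utf-8") as f:
        data = json.load(f)
    # Assumed layout: {"instances": [{"input": ..., "output": ...}, ...]}
    inputs = [item["input"] for item in data["instances"]]
    outputs = [item["output"] for item in data["instances"]]
    return inputs, outputs, len(outputs)


# Write a toy dataset to a temporary file, then read it back.
sample = {
    "instances": [
        {"input": "2 + 2 =", "output": "4"},
        {"input": "Capital of France?", "output": "Paris"},
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    inputs, outputs, length = load_data_sketch(path)
```

The inputs and outputs lists are index-aligned, so inputs[i] and outputs[i] belong to the same example.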

lmflow.utils.data_utils.batchlize(examples: list, batch_size: int, random_shuffle: bool)

Convert examples to a dataloader.

Parameters:
examples : list

Data list.

batch_size : int

Number of examples per batch.

random_shuffle : bool

If true, the dataloader shuffles the training data.

Returns:
dataloader:

Dataloader with batch generator.
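
One way to picture the batching step is the sketch below. It assumes the dataloader is a plain list of batches, which may differ from what batchlize actually returns; batchlize_sketch is a hypothetical name, and copying the input before shuffling is a choice made here so the caller's list keeps its order.

```python
import random


def batchlize_sketch(examples: list, batch_size: int, random_shuffle: bool) -> list:
    """Split examples into consecutive batches of at most batch_size items."""
    items = list(examples)  # copy so the caller's list is not reordered
    if random_shuffle:
        random.shuffle(items)
    # The final batch may be shorter when len(items) is not a multiple of batch_size
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


batches = batchlize_sketch([0, 1, 2, 3, 4], batch_size=2, random_shuffle=False)
shuffled = batchlize_sketch([0, 1, 2, 3, 4], batch_size=2, random_shuffle=True)
```

With shuffling disabled, the five examples above are split into batches of sizes 2, 2, and 1 in their original order; with shuffling enabled, the batch sizes stay the same but the membership varies between runs unless a seed is set first.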

lmflow.utils.data_utils.answer_extraction(response, answer_type=None)

Use this function to extract answers from generated text.

Parameters:
args

Arguments.

response : str

Plain string response.

Returns:
answer:

Decoded answer (such as A, B, C, D, E for multiple-choice QA).
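
For the multiple-choice case mentioned above, answer decoding could look roughly like the following sketch. The function extract_choice_sketch and its regular expression are assumptions for illustration only; they do not reproduce how answer_extraction handles answer_type or other answer formats.

```python
import re
from typing import Optional


def extract_choice_sketch(response: str) -> Optional[str]:
    """Return the first standalone option letter A-E in response, or None."""
    # \b on both sides keeps letters inside longer words from matching
    match = re.search(r"\b([A-E])\b", response)
    return match.group(1) if match else None


answer = extract_choice_sketch("The answer is (B).")
missing = extract_choice_sketch("no option given")
```

Taking the first match is a simplification: a response that begins with the article "A" would be misread as option A, so a production decoder would need a stricter pattern or some knowledge of the prompt format.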

lmflow.utils.data_utils.process_image_flag(text, image_flag='<ImageHere>')