lmflow.utils.data_utils
This module provides utility functions for setting the random seed, loading data from a JSON file, batching data, and extracting answers from generated text.
Functions
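| set_random_seed(seed) | Set the random seed for random, numpy, torch, torch.cuda. |
| load_data(file_name) | Load data with file name. |
| batchlize(examples, batch_size, random_shuffle) | Convert examples to a dataloader. |
| answer_extraction(response[, answer_type]) | Use this function to extract answers from generated text. |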
Module Contents
- lmflow.utils.data_utils.set_random_seed(seed: int)
Set the random seed for random, numpy, torch, torch.cuda.
- Parameters:
- seed : int
The default seed.
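As a rough illustration of what seeding these generators involves, here is a minimal sketch. It is not the LMFlow source: `set_random_seed_sketch` is a hypothetical name, and the optional imports are an assumption made so the example also runs where numpy or torch is not installed.

```python
import random


def set_random_seed_sketch(seed: int) -> None:
    """Hypothetical stand-in: seed every RNG available in this environment."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # numpy is not installed; nothing to seed
    try:
        import torch

        torch.manual_seed(seed)
        if torch.cuda.is_available():
            # Seed the generators of all visible CUDA devices.
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # torch is not installed; nothing to seed


# Re-seeding with the same value reproduces the same draw.
set_random_seed_sketch(42)
first = random.random()
set_random_seed_sketch(42)
second = random.random()
print(first == second)  # True
```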
- lmflow.utils.data_utils.load_data(file_name: str)
Load data with file name.
- Parameters:
- file_name : str
The dataset file name.
- Returns:
- inputs : list
The input texts of the dataset.
- outputs : list
The output texts of the dataset.
- len : int
The length of the dataset.
The length of the dataset.
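The three return values can be pictured with the sketch below. The JSON layout used here (a top-level `"instances"` list whose items carry `"input"` and `"output"` keys) is an assumption for illustration, since this page does not specify the file schema, and `load_data_sketch` is a hypothetical name rather than the library function.

```python
import json
import os
import tempfile


def load_data_sketch(file_name: str):
    """Hypothetical stand-in: return (inputs, outputs, length) from a JSON file."""
    with open(file_name, encoding="utf-8") as f:
        json_data = json.load(f)
    # Assumed schema: {"instances": [{"input": ..., "output": ...}, ...]}
    inputs = [item["input"] for item in json_data["instances"]]
    outputs = [item["output"] for item in json_data["instances"]]
    return inputs, outputs, len(outputs)


# Write a tiny sample file, then load it back.
sample = {
    "instances": [
        {"input": "1 + 1 =", "output": "2"},
        {"input": "2 + 2 =", "output": "4"},
    ]
}
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    inputs, outputs, length = load_data_sketch(path)

print(inputs, outputs, length)
```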
- lmflow.utils.data_utils.batchlize(examples: list, batch_size: int, random_shuffle: bool)
Convert examples to a dataloader.
- Parameters:
- examples : list
Data list.
- batch_size : int
The number of examples in each batch.
- random_shuffle : bool
If true, the dataloader shuffles the training data.
- Returns:
- dataloader:
Dataloader with batch generator.
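A minimal sketch of the batching idea follows, assuming the returned dataloader can be modelled as a plain list of batches (the actual return type is not stated on this page). `batchlize_sketch` is a hypothetical name, and copying the list before shuffling is a choice made for this example only.

```python
import random


def batchlize_sketch(examples: list, batch_size: int, random_shuffle: bool) -> list:
    """Hypothetical stand-in: split examples into consecutive batches."""
    if random_shuffle:
        examples = examples[:]  # shuffle a copy so the caller's list is untouched
        random.shuffle(examples)
    # The last batch may be shorter when len(examples) is not a multiple of batch_size.
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


batches = batchlize_sketch(list(range(5)), batch_size=2, random_shuffle=False)
print(batches)  # [[0, 1], [2, 3], [4]]
```

With `random_shuffle=True` the batch sizes stay the same and only the order of the examples changes.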
- lmflow.utils.data_utils.answer_extraction(response, answer_type=None)
Use this function to extract answers from generated text.
- Parameters:
- args
Arguments.
- response : str
Plain string response.
- Returns:
- answer:
Decoded answer (such as A, B, C, D, E for multiple-choice QA).
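For the multiple-choice case mentioned above, one possible approach is a regular expression that picks out a standalone choice letter. The sketch below is only an illustration of that idea: `extract_choice_sketch`, its `choices` parameter, and the rule of taking the last match are all assumptions, and the library's own extraction logic and supported `answer_type` values may differ.

```python
import re


def extract_choice_sketch(response: str, choices: str = "ABCDE"):
    """Hypothetical stand-in: return the last standalone choice letter, or None."""
    # \b ensures the letter is not part of a longer word, e.g. the "B" in "Because".
    matches = re.findall(rf"\b([{choices}])\b", response)
    return matches[-1] if matches else None


print(extract_choice_sketch("The answer is B."))         # B
print(extract_choice_sketch("So the answer is (C)."))    # C
print(extract_choice_sketch("no letter here"))           # None
```

A limitation of this simple rule is that a capitalized article such as "A" at the start of a sentence also counts as a match, so real responses may need a stricter pattern.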