lmflow.utils.data_utils#
This module provides utility functions for setting a random seed, loading data from a JSON file, batching data, and extracting answers from generated text.
Classes#
Functions#
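| set_random_seed(seed) | Set the random seed for random, numpy, torch, torch.cuda. |
| load_data(file_name) | Load data with file name. |
| batchlize(examples, batch_size, random_shuffle) | Convert examples to a dataloader. |
| read_last_n_lines_large_file(file_path[, n]) | |
| read_first_n_lines_large_file(file_path[, n]) | |
| get_dataset_type_fast(file_path[, max_lines]) | Get the type values from the first and last n lines of a large json dataset. |
| check_dataset_instances_key_fast(file_path, instances_key[, max_lines]) | Check if the dataset instances key matches the instances_key. |
| answer_extraction(response[, answer_type]) | Use this function to extract answers from generated text. |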
Module Contents#
- lmflow.utils.data_utils.set_random_seed(seed: int)[source]#
Set the random seed for random, numpy, torch, torch.cuda.
- Parameters:
- seed: int
The default seed.
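As a rough illustration of what seeding these generators involves, here is a minimal sketch. The helper name `seed_everything` is hypothetical and this is not the LMFlow source; numpy and torch are treated as optional so the snippet also runs where they are not installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Hypothetical stand-in that seeds every generator it can find."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency in this sketch
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional dependency in this sketch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same draws.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
```

Calling the function again with the same seed before a run is what makes two runs comparable.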
- lmflow.utils.data_utils.load_data(file_name: str)[source]#
Load data with file name.
- Parameters:
- file_name: str
The dataset file name.
- Returns:
- inputs: list
The input texts of the dataset.
- outputs: list
The output texts of the dataset.
- len: int
The length of the dataset.
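The three return values can be illustrated with a small sketch. It assumes a JSON layout with a top-level `"instances"` list whose items carry `"input"` and `"output"` fields; that layout, the toy `"type"` value, and the helper name `load_pairs` are assumptions made for illustration, not details confirmed by this page.

```python
import json
import os
import tempfile


def load_pairs(file_name: str):
    """Hypothetical loader returning (inputs, outputs, length)."""
    with open(file_name, encoding="utf-8") as f:
        data = json.load(f)
    # Assumed layout: {"type": ..., "instances": [{"input": ..., "output": ...}]}
    inputs = [item["input"] for item in data["instances"]]
    outputs = [item["output"] for item in data["instances"]]
    return inputs, outputs, len(inputs)


sample = {
    "type": "text2text",  # toy value for this example
    "instances": [
        {"input": "1 + 1 =", "output": "2"},
        {"input": "2 + 2 =", "output": "4"},
    ],
}

# Write the toy dataset to a temporary file and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f)
    inputs, outputs, size = load_pairs(path)
```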
- lmflow.utils.data_utils.batchlize(examples: list, batch_size: int, random_shuffle: bool)[source]#
Convert examples to a dataloader.
- Parameters:
- examples: list
Data list.
- batch_size: int
Number of examples in each batch.
- random_shuffle: bool
If true, the dataloader shuffles the training data.
- Returns:
- dataloader:
Dataloader with batch generator.
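A minimal sketch of this kind of batching, assuming the "dataloader" is simply a list of batches (an assumption; the actual return type is not specified here). The name `make_batches` is hypothetical.

```python
import random
from typing import List


def make_batches(examples: list, batch_size: int, random_shuffle: bool) -> List[list]:
    """Split examples into consecutive batches of at most batch_size items."""
    items = list(examples)  # copy so the caller's list is left untouched
    if random_shuffle:
        random.shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


batches = make_batches(list(range(7)), batch_size=3, random_shuffle=False)
shuffled = make_batches(list(range(7)), batch_size=3, random_shuffle=True)
```

With seven examples and a batch size of three, the last batch holds the single leftover example. Shuffling changes the order of the items but not the batch sizes.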
- lmflow.utils.data_utils.read_last_n_lines_large_file(file_path: str, n: int = 10) -> List[str][source]#
- lmflow.utils.data_utils.read_first_n_lines_large_file(file_path: str, n: int = 10) -> List[str][source]#
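These two entries carry no description, so the following is only a sketch of one common way to read the first and last n lines of a large file without loading all of it. The names `head_lines` and `tail_lines`, the `block_size` parameter, and the choice to strip trailing newlines are assumptions, not the library's behavior.

```python
import os
import tempfile
from itertools import islice
from typing import List


def head_lines(file_path: str, n: int = 10) -> List[str]:
    """Return the first n lines without loading the whole file."""
    with open(file_path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in islice(f, n)]


def tail_lines(file_path: str, n: int = 10, block_size: int = 4096) -> List[str]:
    """Return the last n lines by reading fixed-size blocks backwards."""
    if n <= 0:
        return []
    with open(file_path, "rb") as f:
        f.seek(0, os.SEEK_END)
        position = f.tell()
        data = b""
        # Keep reading until more than n newlines are buffered or the start is hit.
        while position > 0 and data.count(b"\n") <= n:
            step = min(block_size, position)
            position -= step
            f.seek(position)
            data = f.read(step) + data
    if position > 0:
        # The buffer starts mid-line; drop the partial first line.
        data = data.split(b"\n", 1)[1]
    lines = data.decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline yields one empty final element
    return lines[-n:]


# Demonstrate on a temporary 1000-line file with a deliberately small block size.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "big.txt")
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.writelines(f"line {i}\n" for i in range(1000))
    first_three = head_lines(path, 3)
    last_three = tail_lines(path, 3, block_size=64)
```

Reading backwards in blocks keeps memory use proportional to the requested lines rather than to the file size.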
- lmflow.utils.data_utils.get_dataset_type_fast(file_path: str, max_lines: int = 100) -> str | None[source]#
Get the type values from the first and last n lines of a large json dataset.
- lmflow.utils.data_utils.check_dataset_instances_key_fast(file_path: str, instances_key: str, max_lines: int = 100) -> bool[source]#
Check if the dataset instances key matches the instances_key.
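Both fast checks can be approximated by scanning a handful of lines with regular expressions instead of parsing the whole JSON file. The sketch below works on lines that have already been read. The helper names, the patterns, and the toy dataset lines are assumptions for illustration and may differ from what LMFlow actually does.

```python
import re
from typing import Iterable, Optional

# Matches a JSON pair such as: "type": "text2text"
TYPE_PATTERN = re.compile(r'"type"\s*:\s*"([^"]+)"')


def find_type(lines: Iterable[str]) -> Optional[str]:
    """Return the first "type" value found in the given lines, else None."""
    for line in lines:
        match = TYPE_PATTERN.search(line)
        if match:
            return match.group(1)
    return None


def has_instances_key(lines: Iterable[str], instances_key: str) -> bool:
    """Report whether some line opens a JSON list under the given key."""
    pattern = re.compile(r'"' + re.escape(instances_key) + r'"\s*:\s*\[')
    return any(pattern.search(line) for line in lines)


# Toy head of a dataset file, split into lines.
head = [
    "{",
    '  "type": "text2text",',
    '  "instances": [',
    '    {"input": "1 + 1 =", "output": "2"},',
]
dataset_type = find_type(head)
found = has_instances_key(head, "instances")
missing = has_instances_key(head, "examples")
```

A scan like this trades accuracy for speed: it can be fooled by the same text appearing inside a string value, which a full JSON parse would not be.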
- lmflow.utils.data_utils.answer_extraction(response, answer_type=None)[source]#
Use this function to extract answers from generated text.
- Parameters:
- args
Arguments.
- response: str
Plain string response.
- Returns:
- answer:
Decoded answer (such as A, B, C, D, E for multiple-choice QA).
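For the multiple-choice case, extraction can be sketched with two regular expressions. The function `extract_choice`, its patterns, and its fallback rule are hypothetical; the handling of `answer_type` in the real function is not described on this page and is not modeled here.

```python
import re
from typing import Optional

# Explicit statements such as "The answer is (C)." take priority.
EXPLICIT = re.compile(r"(?i:answer\s+is)\s*:?\s*\(?([ABCDE])\)?(?![A-Za-z])")
# Fallback: any standalone capital letter among the choices.
STANDALONE = re.compile(r"(?<![A-Za-z])([ABCDE])(?![A-Za-z])")


def extract_choice(response: str) -> Optional[str]:
    """Return a choice letter A-E found in the response, or None."""
    explicit = EXPLICIT.search(response)
    if explicit:
        return explicit.group(1)
    candidates = STANDALONE.findall(response)
    # Use the last standalone letter, since conclusions tend to come last.
    return candidates[-1] if candidates else None


explicit_answer = extract_choice("Let's think step by step. The answer is (C).")
fallback_answer = extract_choice("Options considered: B seems right")
no_answer = extract_choice("no idea")
```

The fallback is deliberately crude: a capitalized article "A" in the response would be picked up as a choice, so a stricter output format is preferable when the prompt can enforce one.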