lmflow.utils.data_utils
=======================

.. py:module:: lmflow.utils.data_utils

.. autoapi-nested-parse::

   This module provides utility functions for setting a random seed, loading
   data from a JSON file, batching data, and extracting answers from generated text.

   .. !! processed by numpydoc !!


Classes
-------

.. autoapisummary::

   lmflow.utils.data_utils.VLLMInferenceResultWithInput
   lmflow.utils.data_utils.RewardModelInferenceResultWithInput


Functions
---------

.. autoapisummary::

   lmflow.utils.data_utils.set_random_seed
   lmflow.utils.data_utils.load_data
   lmflow.utils.data_utils.batchlize
   lmflow.utils.data_utils.preview_file
   lmflow.utils.data_utils.get_dataset_type_fast
   lmflow.utils.data_utils.check_dataset_instances_key_fast
   lmflow.utils.data_utils.answer_extraction
   lmflow.utils.data_utils.process_image_flag


Module Contents
---------------

.. py:function:: set_random_seed(seed: int)

   Set the random seed for `random`, `numpy`, `torch`, and `torch.cuda`.

   :Parameters:

       **seed** : int
           The random seed to use.

   .. !! processed by numpydoc !!

.. py:function:: load_data(file_name: str)

   Load a dataset from the given file.

   :Parameters:

       **file_name** : str
           The dataset file name.

   :Returns:

       **inputs** : list
           The input texts of the dataset.

       **outputs** : list
           The output texts of the dataset.

       **len** : int
           The length of the dataset.

   .. !! processed by numpydoc !!

.. py:function:: batchlize(examples: list, batch_size: int, random_shuffle: bool)

   Convert examples to a dataloader.

   :Parameters:

       **examples** : list
           Data list.

       **batch_size** : int
           ..

       **random_shuffle** : bool
           If true, the dataloader shuffles the training data.

   :Returns:

       dataloader:
           Dataloader with batch generator.

   .. !! processed by numpydoc !!

.. py:function:: preview_file(file_path: str, chars: int = 100)

   Return the first and last ``chars`` characters of a file without loading
   the entire file into memory. Works with any file type.

   :Parameters:

       **file_path** : str
           Path to the file to be previewed.

       **chars** : int, optional
           Number of characters to show from the start and the end. Defaults to 100.

   :Returns:

       tuple
           ``(first_chars, last_chars)``: the first and last characters from the file.

   .. !! processed by numpydoc !!

.. py:function:: get_dataset_type_fast(file_path: str, max_chars: int = 100) -> Union[str, None]

   Get the type values from the first and last n lines of a large JSON dataset.

   .. !! processed by numpydoc !!

.. py:function:: check_dataset_instances_key_fast(file_path: str, instances_key: str, max_lines: int = 100) -> bool

   Check whether the dataset's instances key matches ``instances_key``.

   .. !! processed by numpydoc !!

.. py:function:: answer_extraction(response, answer_type=None)

   Use this function to extract answers from generated text.

   :Parameters:

       **args**
           Arguments.

       **response** : str
           Plain string response.

   :Returns:

       answer:
           Decoded answer (such as A, B, C, D, E for multiple-choice QA).

   .. !! processed by numpydoc !!

.. py:function:: process_image_flag(text, image_flag='')

.. py:class:: VLLMInferenceResultWithInput

   Bases: :py:obj:`TypedDict`

   .. !! processed by numpydoc !!

   .. py:attribute:: input
      :type: str

   .. py:attribute:: output
      :type: Union[List[str], List[List[int]]]

.. py:class:: RewardModelInferenceResultWithInput

   Bases: :py:obj:`TypedDict`

   .. !! processed by numpydoc !!

   .. py:attribute:: input
      :type: str

   .. py:attribute:: output
      :type: List[Dict[str, Union[str, float]]]
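
The descriptions of ``preview_file`` and ``batchlize`` above can be illustrated with a minimal sketch. The functions ``preview_file_sketch`` and ``batchlize_sketch`` below are hypothetical stand-ins written for this page, not the LMFlow implementations. The preview sketch assumes the file is UTF-8 text and counts bytes rather than characters. The batching sketch returns a plain list of lists, whereas the real ``batchlize`` is documented only as returning a "dataloader".

```python
import os
import random
import tempfile
from typing import List, Tuple


def preview_file_sketch(file_path: str, chars: int = 100) -> Tuple[str, str]:
    """Hypothetical stand-in: read only the head and tail of a file."""
    size = os.path.getsize(file_path)
    with open(file_path, "rb") as f:
        first = f.read(chars)
        # Jump to the last `chars` bytes (or to the start for small files)
        # instead of reading everything in between.
        f.seek(max(0, size - chars))
        last = f.read(chars)
    # Decode leniently: a multi-byte character cut at a boundary is dropped.
    return (
        first.decode("utf-8", errors="ignore"),
        last.decode("utf-8", errors="ignore"),
    )


def batchlize_sketch(examples: list, batch_size: int, random_shuffle: bool) -> List[list]:
    """Hypothetical stand-in: split examples into consecutive batches."""
    items = list(examples)  # copy so the caller's list is not reordered
    if random_shuffle:
        random.shuffle(items)
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


if __name__ == "__main__":
    # Write a small JSON file to preview, then clean it up.
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
        tmp.write('{"type": "text_only", "instances": [{"text": "hello"}]}')
        path = tmp.name
    head, tail = preview_file_sketch(path, chars=20)
    print(head)
    print(tail)
    os.remove(path)

    print(batchlize_sketch([1, 2, 3, 4, 5], batch_size=2, random_shuffle=False))
    # prints [[1, 2], [3, 4], [5]]
```

Seeking to the tail keeps memory use bounded by ``chars`` regardless of file size, which is presumably why a fast type check such as ``get_dataset_type_fast`` can work from a preview rather than parsing the whole dataset.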