lmflow.datasets.dataset
This module defines the Dataset class, which provides methods for initializing, loading, and manipulating datasets from different backends such as Hugging Face and JSON.
The class can load a dataset from a dictionary or from a Hugging Face dataset, map a dataset, and return the backend dataset and its arguments.
Module Contents
Classes
Dataset | Initializes the Dataset object with the given parameters.
Attributes
- lmflow.datasets.dataset.DATASET_TYPES = ['text_only', 'text2text', 'float_only', 'image_text']
- class lmflow.datasets.dataset.Dataset(data_args=None, backend: str = 'huggingface', *args, **kwargs)
Initializes the Dataset object with the given parameters.
- Parameters:
- data_args : DatasetArguments object.
Contains the arguments required to load the dataset.
- backend : str, default="huggingface"
A string naming the dataset backend.
- args : Optional.
Positional arguments.
- kwargs : Optional.
Keyword arguments.
- _check_data_format()
Checks whether the data type and the data structure match.
Raises an error with a hint if they do not.
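As a rough illustration of the kind of check described above, the following sketch validates a dict against the type/instances layout used elsewhere on this page. It is not LMFlow's implementation: the function name `check_data_format` and the rule that all instances must share the same keys are assumptions made for the example.

```python
# Local copy of the type list documented on this page.
DATASET_TYPES = ["text_only", "text2text", "float_only", "image_text"]


def check_data_format(dict_obj: dict) -> None:
    """Raise ValueError with a hint if dict_obj does not look like a dataset dict."""
    if "type" not in dict_obj or "instances" not in dict_obj:
        raise ValueError('expected top-level keys "type" and "instances"')
    if dict_obj["type"] not in DATASET_TYPES:
        raise ValueError(
            f'unknown type {dict_obj["type"]!r}; expected one of {DATASET_TYPES}'
        )
    instances = dict_obj["instances"]
    if not instances:
        return  # an empty dataset has nothing further to compare
    # Assumed rule: every instance carries the same keys as the first one.
    expected_keys = set(instances[0].keys())
    for index, instance in enumerate(instances):
        if set(instance.keys()) != expected_keys:
            raise ValueError(
                f"instance {index} has keys {sorted(instance)}, "
                f"expected {sorted(expected_keys)}"
            )
```

Calling the function on a well-formed dict returns silently, while an unknown type or an instance with missing keys raises a `ValueError` whose message names the mismatch.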
- from_dict(dict_obj: dict, *args, **kwargs)
Create a Dataset object from a dictionary.
Returns a Dataset given a dict with the format:
    {
        "type": TYPE,
        "instances": [
            {"key_1": VALUE_1.1, "key_2": VALUE_1.2, …},
            {"key_1": VALUE_2.1, "key_2": VALUE_2.2, …}
        ]
    }
- Parameters:
- dict_obj : dict.
A dictionary containing the dataset information.
- args : Optional.
Positional arguments.
- kwargs : Optional.
Keyword arguments.
- Returns:
- self : Dataset object.
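To make the expected input concrete, the sketch below assembles a dict in the type/instances format from plain Python records. The helper names `make_dataset_dict` and `to_lmflow_dataset` are hypothetical, as is the `text` key used in the usage note. The import path `lmflow.args.DatasetArguments` and its `dataset_path` argument are not documented on this page and are assumed here; the import is deferred so that the first helper works without LMFlow installed.

```python
from typing import Any, Dict, List


def make_dataset_dict(dataset_type: str, records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Wrap a list of per-instance dicts in the {"type", "instances"} layout."""
    return {"type": dataset_type, "instances": [dict(record) for record in records]}


def to_lmflow_dataset(dict_obj: Dict[str, Any]):
    """Pass the dict to Dataset.from_dict (requires LMFlow to be installed)."""
    # Assumed import path and constructor argument; see the note above.
    from lmflow.args import DatasetArguments
    from lmflow.datasets.dataset import Dataset

    dataset = Dataset(DatasetArguments(dataset_path=None))
    return dataset.from_dict(dict_obj)
```

For example, `make_dataset_dict("text_only", [{"text": "hello"}])` builds a dict with type `text_only` and a single instance, which could then be handed to `to_lmflow_dataset`.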
- classmethod create_from_dict(dict_obj, *args, **kwargs)
- Returns:
- A Dataset object created from the given dict.
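The relationship between the instance method and the classmethod can be pictured with a small stand-in class. `MiniDataset` below is a hypothetical toy, not LMFlow's `Dataset`: it only assumes that `from_dict` fills in the object and returns it, and that `create_from_dict` builds an empty object and delegates to `from_dict`.

```python
class MiniDataset:
    """Toy stand-in used only to illustrate the two construction styles."""

    def __init__(self):
        self.type = None
        self.instances = []

    def from_dict(self, dict_obj: dict) -> "MiniDataset":
        # Populate this object in place and return it, so calls can be chained.
        self.type = dict_obj["type"]
        self.instances = list(dict_obj["instances"])
        return self

    @classmethod
    def create_from_dict(cls, dict_obj: dict) -> "MiniDataset":
        # Build an empty object first, then reuse from_dict.
        return cls().from_dict(dict_obj)
```

Under these assumptions, `MiniDataset().from_dict(d)` and `MiniDataset.create_from_dict(d)` end up with the same contents; the classmethod simply saves the caller from constructing the object first.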
- to_dict()
- Returns:
- A dict representing the dataset, with the format:
    {
        "type": TYPE,
        "instances": [
            {"key_1": VALUE_1.1, "key_2": VALUE_1.2, …},
            {"key_1": VALUE_2.1, "key_2": VALUE_2.2, …}
        ]
    }
- A Python dict object representing the content of this dataset.
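Because to_dict() is documented as returning a plain Python dict, its result can presumably be written to disk with the standard `json` module. The sketch below shows such a round trip on a hand-written dict in the same format; `save_dataset_dict` and `load_dataset_dict` are hypothetical helper names, the `input` and `output` keys are illustrative, and the values must be JSON-serializable for this to work.

```python
import json
import os
import tempfile


def save_dataset_dict(dict_obj: dict, path: str) -> None:
    """Write a dataset dict to a JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(dict_obj, handle, ensure_ascii=False, indent=2)


def load_dataset_dict(path: str) -> dict:
    """Read a dataset dict back from a JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Stand-in for the result of Dataset.to_dict(); the keys are illustrative.
example = {
    "type": "text2text",
    "instances": [{"input": "2 + 2 =", "output": "4"}],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.json")
    save_dataset_dict(example, path)
    restored = load_dataset_dict(path)
```

After the round trip, `restored` holds the same type and instances as `example`, so it could be passed back to from_dict.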