lmflow.datasets.multi_modal_dataset#
This module defines the CustomMultiModalDataset class and the preprocessing helpers used for multi-modal (text plus image) data.
Classes#
- CustomMultiModalDataset: Dataset for multi-modal data.
- DataCollatorForSupervisedDataset: Collate examples for supervised fine-tuning.
Functions#
- tokenizer_image_token
- preprocess_llama_from_llava_plain: This function just adds the image in front of the text.
- preprocess_llama_from_llava_v1: This function adds the prompt and then puts the image after the prompt.
Module Contents#
- class lmflow.datasets.multi_modal_dataset.CustomMultiModalDataset(dataset_path: str, data_args: lmflow.args.DatasetArguments)[source]#
Bases: torch.utils.data.Dataset
Dataset for multi-modal data.
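A minimal usage sketch, assuming the dataset is indexed like any torch Dataset; the file paths and the exact lmflow.args.DatasetArguments fields used here are hypothetical and may differ in your LMFlow version:

```python
# Rough usage sketch; paths and DatasetArguments fields are assumptions.
from lmflow.args import DatasetArguments
from lmflow.datasets.multi_modal_dataset import CustomMultiModalDataset

data_args = DatasetArguments(dataset_path="data/llava_instruct")  # hypothetical path
dataset = CustomMultiModalDataset(
    dataset_path="data/llava_instruct/train.json",  # hypothetical annotation file
    data_args=data_args,
)

print(len(dataset))   # number of multi-modal examples
example = dataset[0]  # one preprocessed example (token ids plus image data)
```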
- lmflow.datasets.multi_modal_dataset.tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None)[source]#
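The page gives no docstring for this function. As a hedged sketch of typical LLaVA-style usage (the `<image>` placeholder convention, the choice of tokenizer checkpoint, and the `return_tensors="pt"` option are assumptions, not confirmed by this page):

```python
# Hedged usage sketch; the "<image>" placeholder convention is an assumption
# carried over from LLaVA-style preprocessing.
from transformers import AutoTokenizer
from lmflow.datasets.multi_modal_dataset import tokenizer_image_token

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")  # any LLaMA-family tokenizer

prompt = "USER: <image>\nWhat is shown in this picture? ASSISTANT:"
input_ids = tokenizer_image_token(prompt, tokenizer, return_tensors="pt")
# The text chunks are tokenized normally and the placeholder position is filled
# with IMAGE_TOKEN_INDEX, which the model later replaces with image features.
```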
- lmflow.datasets.multi_modal_dataset.preprocess_llama_from_llava_plain(sources, tokenizer: transformers.PreTrainedTokenizer, has_image: bool = False)[source]#
This function just adds the image in front of the text, without adding any prompt.
- Args:
sources: The input data with text and image.
tokenizer: The tokenizer used to process the text.
has_image: Whether the input data contains an image.
- Returns:
The input_ids and labels for the model.
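A hedged calling sketch: the conversation schema with "from"/"value" keys, the example values, and the dict-shaped return are assumptions based on LLaVA-plain style data; the page only states that input_ids and labels are returned.

```python
from transformers import AutoTokenizer
from lmflow.datasets.multi_modal_dataset import preprocess_llama_from_llava_plain

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# LLaVA-plain style source: the image placeholder comes first and no chat
# prompt is added around it.
sources = [[
    {"from": "human", "value": "<image>"},
    {"from": "gpt", "value": "A dog playing in the snow."},
]]

batch = preprocess_llama_from_llava_plain(sources, tokenizer, has_image=True)
input_ids, labels = batch["input_ids"], batch["labels"]  # assumed dict return
```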
- lmflow.datasets.multi_modal_dataset.preprocess_llama_from_llava_v1(sources, tokenizer: transformers.PreTrainedTokenizer, has_image: bool = False)[source]#
This function adds the prompt and then puts the image after the prompt, so it needs additional code to generate the target labels.
- Args:
sources: The input data with text and image.
tokenizer: The tokenizer used to process the text.
has_image: Whether the input data contains an image.
- Returns:
The input_ids and labels for the model.
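A hedged sketch along the same lines; the vicuna-v1 conversation format, the example values, and the label-masking behavior noted in the comment are assumptions. The page only states that prompt handling requires extra code to build the targets.

```python
from transformers import AutoTokenizer
from lmflow.datasets.multi_modal_dataset import preprocess_llama_from_llava_v1

tokenizer = AutoTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

# Conversation-style source; the system prompt is added by the function itself.
sources = [[
    {"from": "human", "value": "<image>\nWhat is unusual about this image?"},
    {"from": "gpt", "value": "A man is ironing clothes on the roof of a moving taxi."},
]]

batch = preprocess_llama_from_llava_v1(sources, tokenizer, has_image=True)
input_ids, labels = batch["input_ids"], batch["labels"]
# Prompt tokens are expected to be masked out of the labels (e.g. set to an
# ignore index) so the loss only covers the assistant responses.
```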