We've released our memory-efficient finetuning algorithm LISA, check out [Paper][User Guide] for more details!

lmflow.datasets.multi_modal_dataset#

This Python code defines a class Multi Modal Dataset.

Module Contents#

Classes#

CustomMultiModalDataset

Dataset for Multi Modal data

DataCollatorForSupervisedDataset

Collate examples for supervised fine-tuning.

Functions#

preprocess_multimodal_llava(sources, data_args)

tokenizer_image_token(prompt, tokenizer[, ...])

preprocess_llama_from_llava_plain(sources, tokenizer)

This function just add the image in the front of text.

preprocess_llama_from_llava_v1(sources, tokenizer[, ...])

This function add the prompt and then put the image after the prompt.

class lmflow.datasets.multi_modal_dataset.CustomMultiModalDataset(dataset_path: str, data_args: lmflow.args.DatasetArguments)[source]#

Bases: torch.utils.data.Dataset

Dataset for Multi Modal data

__len__()[source]#
register_tokenizer(tokenizer, image_processor=None)[source]#
__getitem__(i)[source]#
lmflow.datasets.multi_modal_dataset.preprocess_multimodal_llava(sources, data_args)[source]#
lmflow.datasets.multi_modal_dataset.tokenizer_image_token(prompt, tokenizer, image_token_index=IMAGE_TOKEN_INDEX, return_tensors=None)[source]#
lmflow.datasets.multi_modal_dataset.preprocess_llama_from_llava_plain(sources, tokenizer: transformers.PreTrainedTokenizer, has_image: bool = False)[source]#

This function just add the image in the front of text. And don’t add any prompt. Args:

sources: The input data with text and image. tokenizer: The tokenizer to process text. has_image: Whether the input data has image.

Returns:

The input_ids and labels for the model.

lmflow.datasets.multi_modal_dataset.preprocess_llama_from_llava_v1(sources, tokenizer: transformers.PreTrainedTokenizer, has_image: bool = False)[source]#

This function add the prompt and then put the image after the prompt. So it needs additional code to generate the target label. Args:

sources: The input data with text and image. tokenizer: The tokenizer to process text. has_image: Whether the input data has image.

Returns:

The input_ids and labels for the model.

class lmflow.datasets.multi_modal_dataset.DataCollatorForSupervisedDataset[source]#

Bases: object

Collate examples for supervised fine-tuning.

tokenizer: transformers.PreTrainedTokenizer[source]#
__call__(instances)[source]#