lmflow.models.vision2seq_model#
Classes#
- CustomAutoVision2SeqModel: Helper class that provides a standard way to create an ABC using inheritance.
Module Contents#
- class lmflow.models.vision2seq_model.CustomAutoVision2SeqModel(config: transformers.Blip2Config, image_encoder_name_or_path=None, qformer_name_or_path=None, language_model_name_or_path=None, low_resource=False)[source]#
Bases: transformers.Blip2ForConditionalGeneration, lmflow.models.base_model.BaseModel
Helper class that provides a standard way to create an ABC using inheritance.
- language_model_from_pretrained(pretrained_path, low_resource=False, use_prompt_cache=False)[source]#
- register_prompt_cache(prompt_ids, prompt_keys_values)[source]#
Update the prompt id and embedding for future reuse.
- Args:
prompt_ids (torch.LongTensor): The id of the prompt.
prompt_keys_values (torch.FloatTensor): The embedding of the prompt.
- Returns:
None
- save_prompt_cache(path)[source]#
Save prompt embedding and id.
- Args:
path: The path to save the prompt embedding and id.
- Returns:
None
- load_prompt_cache(path)[source]#
Load prompt embedding and id.
- Args:
path: The path to load the prompt embedding and id.
- Returns:
None
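The three prompt-cache methods above suggest a register, save, load round trip. The following is a minimal sketch of that pattern using only the Python standard library. `PromptCacheSketch` is a hypothetical stand-in and not LMFlow's class. The use of `pickle`, the single-file layout, and plain lists in place of tensors are all assumptions made so that the example runs without a model.

```python
import os
import pickle
import tempfile


class PromptCacheSketch:
    """Hypothetical stand-in that mirrors the three prompt-cache method names."""

    def __init__(self):
        self.prompt_ids = None
        self.prompt_keys_values = None

    def register_prompt_cache(self, prompt_ids, prompt_keys_values):
        # Keep the prompt id and its cached values for later reuse.
        self.prompt_ids = prompt_ids
        self.prompt_keys_values = prompt_keys_values

    def save_prompt_cache(self, path):
        # Assumption: both fields are serialized together in one file.
        cache = {
            "prompt_ids": self.prompt_ids,
            "prompt_keys_values": self.prompt_keys_values,
        }
        with open(path, "wb") as f:
            pickle.dump(cache, f)

    def load_prompt_cache(self, path):
        # Read the file back and re-register its contents.
        with open(path, "rb") as f:
            cache = pickle.load(f)
        self.register_prompt_cache(cache["prompt_ids"], cache["prompt_keys_values"])


def round_trip_demo():
    """Register a cache, save it, and load it into a fresh object."""
    source = PromptCacheSketch()
    source.register_prompt_cache([1, 2, 3], [[0.1, 0.2], [0.3, 0.4]])
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "prompt_cache.pkl")
        source.save_prompt_cache(path)
        restored = PromptCacheSketch()
        restored.load_prompt_cache(path)
    return restored.prompt_ids, restored.prompt_keys_values
```

With the real model, the same call order would presumably apply, with tensors passed in place of the lists used here.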
- forward(input_ids: torch.LongTensor = None, pixel_values: torch.FloatTensor | None = None, images: torch.FloatTensor | None = None, attention_mask: torch.Tensor | None = None, past_key_values: List[torch.FloatTensor] | None = None, inputs_embeds: torch.FloatTensor | None = None, labels: torch.LongTensor | None = None, use_cache: bool | None = None, output_attentions: bool | None = None, output_hidden_states: bool | None = None, return_dict: bool | None = None, image_token_indexes: List | None = [0], one_sample_multiple_images: bool = False) Tuple | transformers.modeling_outputs.CausalLMOutputWithPast [source]#
- processor_image_token_in_minigpt4(input_ids, language_model_inputs, attention_mask, image_token_indexes, pixel_values, batch_size=1)[source]#
- generate(pixel_values: torch.FloatTensor, input_ids: torch.LongTensor | None = None, attention_mask: torch.LongTensor | None = None, image_token_indexes: List | None = [0], one_sample_multiple_images: bool | None = False, images: torch.LongTensor | None = None, **generate_kwargs) torch.LongTensor [source]#
Overrides generate function to be able to use the model as a conditional generator.
- Args:
- pixel_values (torch.FloatTensor of shape (batch_size, num_channels, height, width)):
Input images to be processed.
- input_ids (torch.LongTensor of shape (batch_size, sequence_length), optional):
The sequence used as a prompt for the generation.
- attention_mask (torch.LongTensor of shape (batch_size, sequence_length), optional):
Mask to avoid performing attention on padding token indices.
- image_token_indexes (List, optional):
The indexes at which the image tokens are inserted. Defaults to [0].
- one_sample_multiple_images (bool, optional):
Inference-time flag indicating that the input batch size is 1 and the single sample contains multiple images.
- Returns:
captions (list): A list of strings of length batch_size * num_captions.
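To illustrate what `image_token_indexes` might control, the sketch below splices a block of image tokens into a text token sequence at the given positions. It is a plain-Python illustration under stated assumptions, not the library's implementation: `insert_image_tokens` is a hypothetical helper, the indexes are assumed to refer to positions in the text sequence, and strings stand in for embeddings. For simplicity the same image block is reused at every index, whereas a multi-image sample would presumably use a different image per index.

```python
from typing import List, Sequence


def insert_image_tokens(
    text_tokens: Sequence[str],
    image_tokens: Sequence[str],
    image_token_indexes: Sequence[int],
) -> List[str]:
    """Insert a copy of image_tokens before each listed position of text_tokens."""
    insert_at = set(image_token_indexes)
    merged: List[str] = []
    for position, token in enumerate(text_tokens):
        if position in insert_at:
            merged.extend(image_tokens)
        merged.append(token)
    # An index equal to len(text_tokens) appends the image tokens at the end.
    if len(text_tokens) in insert_at:
        merged.extend(image_tokens)
    return merged
```

Under this reading, the default `[0]` places the image tokens ahead of the whole prompt, and a list such as `[0, 2]` places one image block at the start and another before the third text token.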