
lmflow.utils.conversation_template#

Submodules#

Package Contents#

Classes#

ConversationTemplate

Attributes#

EMPTY_TEMPLATE

EMPTY_NO_SPECIAL_TOKENS_TEMPLATE

CHATGLM3_TEMPLATE

CHATML_TEMPLATE

DEEPSEEK_TEMPLATE

GEMMA_TEMPLATE

INTERNLM2_TEMPLATE

LLAMA2_TEMPLATE

LLAMA3_TEMPLATE

PHI3_TEMPLATE

QWEN2_TEMPLATE

YI1_5_TEMPLATE

ZEPHYR_TEMPLATE

PRESET_TEMPLATES

lmflow.utils.conversation_template.EMPTY_TEMPLATE[source]#
lmflow.utils.conversation_template.EMPTY_NO_SPECIAL_TOKENS_TEMPLATE[source]#
class lmflow.utils.conversation_template.ConversationTemplate[source]#
user_formatter: Formatter#
assistant_formatter: Formatter#
system_formatter: Formatter | None#
tools_formatter: Formatter | None#
separator: TemplateComponent | None#
special_starter: TemplateComponent | None#
special_stopper: TemplateComponent | None#
template_name: str | None#
__post_init__()[source]#
encode_conversation(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: List[str] | None = None, remove_last_sep: bool = False, **kwargs) Sequence[Tuple[List[int], List[int]]][source]#

Messages here should be guaranteed to be in pairs, with the first message being the user message and the second message being the assistant message. Data example:

```json
{
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {
            "role": "user",
            "content": "hi"
        },
        {
            "role": "assistant",
            "content": "Hello!"
        }
    ]
}
```
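A minimal usage sketch based on the signature above. The model name and preset key are illustrative assumptions, not prescribed by this module; any chat model and its matching preset work the same way:

```python
from transformers import AutoTokenizer

from lmflow.utils.conversation_template import PRESET_TEMPLATES

# Illustrative choices (assumed preset key and model).
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat")
template = PRESET_TEMPLATES["qwen2"]

messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello!"},
]

# Returns one (user_token_ids, assistant_token_ids) tuple per user/assistant pair.
encoded_pairs = template.encode_conversation(
    tokenizer=tokenizer,
    messages=messages,
    system="sysinfo1",
    tools=["tool_1_desc"],
)
for user_ids, assistant_ids in encoded_pairs:
    print(len(user_ids), len(assistant_ids))
```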

_encode(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: str | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]][source]#
_encode_template(template: List[TemplateComponent], tokenizer: transformers.PreTrainedTokenizer, **kwargs) List[int][source]#

Encode template components into token ids.

Parameters:

template : List[TemplateComponent]
    Formatted template components.

tokenizer : PreTrainedTokenizer
    Tokenizer to convert tokens into token ids.

Returns:

List[int]
    Encoded token ids.
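As a hedged illustration of the inputs, assuming `TemplateComponent` exposes `type` and `content` fields and lives under the `base` submodule (both are assumptions about this package, not documented above):

```python
from transformers import AutoTokenizer

from lmflow.utils.conversation_template import PRESET_TEMPLATES
from lmflow.utils.conversation_template.base import TemplateComponent  # assumed import path

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer
template = PRESET_TEMPLATES["chatml"]  # assumed preset key

# Hypothetical components: a literal string plus a special token referenced by name.
components = [
    TemplateComponent(type="string", content="Hello, world!"),
    TemplateComponent(type="token", content="eos_token"),
]
token_ids = template._encode_template(components, tokenizer=tokenizer)
print(token_ids)
```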

remove_last_separator(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]][source]#
add_special_starter(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]][source]#
add_special_stopper(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]][source]#
_ensure_id_list(obj: int | List[int]) List[int][source]#

Make sure the object is a list of integers. Useful for handling token ids.
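The intended contract, shown as a tiny illustrative re-implementation (a sketch, not the library's code):

```python
from typing import List, Union

def ensure_id_list(obj: Union[int, List[int]]) -> List[int]:
    # Wrap a lone token id in a list; pass an existing list through.
    return [obj] if isinstance(obj, int) else list(obj)

assert ensure_id_list(42) == [42]
assert ensure_id_list([1, 2, 3]) == [1, 2, 3]
```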

lmflow.utils.conversation_template.CHATGLM3_TEMPLATE[source]#
lmflow.utils.conversation_template.CHATML_TEMPLATE[source]#
lmflow.utils.conversation_template.DEEPSEEK_TEMPLATE[source]#
lmflow.utils.conversation_template.GEMMA_TEMPLATE[source]#
lmflow.utils.conversation_template.INTERNLM2_TEMPLATE[source]#
lmflow.utils.conversation_template.LLAMA2_TEMPLATE[source]#
lmflow.utils.conversation_template.LLAMA3_TEMPLATE[source]#
lmflow.utils.conversation_template.PHI3_TEMPLATE[source]#
lmflow.utils.conversation_template.QWEN2_TEMPLATE[source]#
lmflow.utils.conversation_template.YI1_5_TEMPLATE[source]#
lmflow.utils.conversation_template.ZEPHYR_TEMPLATE[source]#
lmflow.utils.conversation_template.PRESET_TEMPLATES[source]#
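A lookup sketch, assuming `PRESET_TEMPLATES` is a dict-like mapping from preset names to the template instances above (the key string is an assumption):

```python
from lmflow.utils.conversation_template import PRESET_TEMPLATES

print(sorted(PRESET_TEMPLATES))        # assumed dict-like: lists available preset names
template = PRESET_TEMPLATES["llama3"]  # assumed key; pairs with LLAMA3_TEMPLATE above
```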