lmflow.utils.conversation_template#

Submodules#

Attributes#

Classes#

Package Contents#

lmflow.utils.conversation_template.EMPTY_TEMPLATE[source]#
lmflow.utils.conversation_template.EMPTY_NO_SPECIAL_TOKENS_TEMPLATE[source]#
class lmflow.utils.conversation_template.ConversationTemplate[source]#
user_formatter: Formatter#
assistant_formatter: Formatter#
function_formatter: Formatter | None = None#
observation_formatter: Formatter | None = None#
system_formatter: Formatter | None = None#
force_system: bool = False#
tools_formatter: Formatter | None = None#
separator: TemplateComponent | None = None#
remove_last_sep: bool = False#
special_starter: TemplateComponent | None = None#
special_stopper: TemplateComponent | None = None#
template_name: str | None = None#
system_default: str | None = None#
__post_init__()[source]#
encode_conversation(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: List[str] | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]][source]#

Messages here should be guaranteed to be in pairs, with the first message in each pair being the user message and the second being the assistant message. Data example:

```json
{
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {
            "role": "user",
            "content": "hi"
        },
        {
            "role": "assistant",
            "content": "Hello!"
        }
    ]
}
```
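The pairing requirement can be illustrated with a short sketch. `pair_messages` below is a hypothetical helper, not part of the package, that groups a `messages` list into the (user, assistant) pairs `encode_conversation` assumes:

```python
# Hypothetical helper illustrating the user/assistant pairing that
# encode_conversation expects; not part of lmflow itself.
def pair_messages(messages):
    if len(messages) % 2 != 0:
        raise ValueError("messages must come in user/assistant pairs")
    pairs = []
    for user_msg, assistant_msg in zip(messages[0::2], messages[1::2]):
        if user_msg["role"] != "user" or assistant_msg["role"] != "assistant":
            raise ValueError("each pair must be (user, assistant)")
        pairs.append((user_msg["content"], assistant_msg["content"]))
    return pairs

conversation = {
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "Hello!"},
    ],
}
print(pair_messages(conversation["messages"]))  # one (user, assistant) pair
```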

_encode(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: str | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]][source]#
_encode_template(template: List[TemplateComponent], tokenizer: transformers.PreTrainedTokenizer, **kwargs) List[int][source]#

Encode template components into token ids.

Parameters:

template : List[TemplateComponent]
    Formatted template components.

tokenizer : PreTrainedTokenizer
    Tokenizer to convert tokens into token ids.

Returns:

List[int]
    Encoded token ids.

post_process_pairs(encoded_pairs, tokenizer)[source]#
remove_last_separator(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]][source]#
add_special_starter(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]][source]#
add_special_stopper(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]][source]#
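The three post-processing helpers above operate on the encoded `(user_ids, assistant_ids)` pairs. A minimal sketch of the idea, assuming that layout and using made-up placeholder token ids (the real methods read the separator, starter, and stopper from the template's components):

```python
# Hedged sketch of the pair post-processing, assuming each encoded
# pair is (user_token_ids, assistant_token_ids); all ids are made up.
def remove_last_separator(pairs, sep_ids):
    # Drop the trailing separator from the final assistant turn.
    last_user, last_assistant = pairs[-1]
    if last_assistant[-len(sep_ids):] == sep_ids:
        pairs = pairs[:-1] + [(last_user, last_assistant[:-len(sep_ids)])]
    return pairs

def add_special_starter(pairs, starter_ids):
    # Prepend special starter ids to the very first user turn.
    first_user, first_assistant = pairs[0]
    return [(starter_ids + first_user, first_assistant)] + pairs[1:]

def add_special_stopper(pairs, stopper_ids):
    # Append special stopper ids after the very last assistant turn.
    last_user, last_assistant = pairs[-1]
    return pairs[:-1] + [(last_user, last_assistant + stopper_ids)]

pairs = [([1, 2], [3, 4, 99])]           # 99 = pretend separator id
pairs = remove_last_separator(pairs, [99])
pairs = add_special_starter(pairs, [0])  # 0 = pretend BOS id
pairs = add_special_stopper(pairs, [7])  # 7 = pretend EOS id
```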
_ensure_id_list(obj: int | List[int]) List[int][source]#

Make sure the object is a list of integers. Useful for handling token ids.
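A sketch of the normalization this performs, assuming a bare `int` is wrapped in a list and a list of ints passes through unchanged:

```python
from typing import List, Union

# Hedged reimplementation of the _ensure_id_list idea: normalize a
# token id or list of token ids into a list of token ids.
def ensure_id_list(obj: Union[int, List[int]]) -> List[int]:
    if isinstance(obj, int):
        return [obj]
    if isinstance(obj, list):
        return obj
    raise TypeError(f"Object type {type(obj)} not supported")

print(ensure_id_list(5))       # a bare id becomes a one-element list
print(ensure_id_list([1, 2]))  # a list is returned unchanged
```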

class lmflow.utils.conversation_template.ConversationTemplateForTool[source]#

Bases: ConversationTemplate

encode_conversation(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: List[str] | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]][source]#

Messages here should be guaranteed to be in pairs, with the first message in each pair being the user message and the second being the assistant message. Data example:

```json
{
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {
            "role": "user",
            "content": "hi"
        },
        {
            "role": "assistant",
            "content": "Hello!"
        }
    ]
}
```

_encode(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: str | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]][source]#
_encode_template(template: List[TemplateComponent], tokenizer: transformers.PreTrainedTokenizer, **kwargs) List[int][source]#

Encode template components into token ids.

Parameters:

template : List[TemplateComponent]
    Formatted template components.

tokenizer : PreTrainedTokenizer
    Tokenizer to convert tokens into token ids.

Returns:

List[int]
    Encoded token ids.

_handle_tools(tools: List[str] | None) str[source]#
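One plausible reading of `_handle_tools` is that it collapses the list of tool descriptions into the single string the tools formatter consumes, returning an empty string when no tools are given. The sketch below is a guess at that shape, not the package's actual joining convention, which is template-specific:

```python
from typing import List, Optional

# Hypothetical sketch: flatten an optional list of tool descriptions
# into one string for the tools formatter; "" when tools are absent.
def handle_tools(tools: Optional[List[str]]) -> str:
    if not tools:
        return ""
    return ",".join(tools)

print(handle_tools(["tool_1_desc", "tool_2_desc"]))
print(repr(handle_tools(None)))
```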
lmflow.utils.conversation_template.CHATGLM3_TEMPLATE[source]#
lmflow.utils.conversation_template.CHATML_TEMPLATE[source]#
lmflow.utils.conversation_template.DEEPSEEK_TEMPLATE[source]#
lmflow.utils.conversation_template.GEMMA_TEMPLATE[source]#
lmflow.utils.conversation_template.HYMBA_TEMPLATE[source]#
lmflow.utils.conversation_template.INTERNLM2_TEMPLATE[source]#
lmflow.utils.conversation_template.LLAMA2_TEMPLATE[source]#
lmflow.utils.conversation_template.LLAMA3_TEMPLATE[source]#
lmflow.utils.conversation_template.LLAMA3_TEMPLATE_FOR_TOOL[source]#
lmflow.utils.conversation_template.PHI3_TEMPLATE[source]#
lmflow.utils.conversation_template.QWEN2_TEMPLATE[source]#
lmflow.utils.conversation_template.QWEN2_TEMPLATE_FOR_TOOL[source]#
lmflow.utils.conversation_template.QWEN2_5_TEMPLATE[source]#
lmflow.utils.conversation_template.YI1_5_TEMPLATE[source]#
lmflow.utils.conversation_template.ZEPHYR_TEMPLATE[source]#
lmflow.utils.conversation_template.PRESET_TEMPLATES[source]#