lmflow.utils.conversation_template#
Submodules#
- lmflow.utils.conversation_template.base
- lmflow.utils.conversation_template.chatglm
- lmflow.utils.conversation_template.chatml
- lmflow.utils.conversation_template.deepseek
- lmflow.utils.conversation_template.gemma
- lmflow.utils.conversation_template.hymba
- lmflow.utils.conversation_template.internlm
- lmflow.utils.conversation_template.llama
- lmflow.utils.conversation_template.phi
- lmflow.utils.conversation_template.qwen
- lmflow.utils.conversation_template.yi
- lmflow.utils.conversation_template.zephyr
Attributes#
- lmflow.utils.conversation_template.DEEPSEEK_V3_TEMPLATE
- lmflow.utils.conversation_template.DEEPSEEK_R1_TEMPLATE
- lmflow.utils.conversation_template.DEEPSEEK_R1_DISTILL_TEMPLATE
- lmflow.utils.conversation_template.QWEN2_5_TEMPLATE
- lmflow.utils.conversation_template.QWEN2_5_1M_TEMPLATE
- lmflow.utils.conversation_template.QWEN2_5_MATH_TEMPLATE
Classes#
- lmflow.utils.conversation_template.ConversationTemplate
- lmflow.utils.conversation_template.ConversationTemplateForTool
Package Contents#
- class lmflow.utils.conversation_template.ConversationTemplate[source]#
- force_system: bool = False#
- separator: TemplateComponent | None = None#
- remove_last_sep: bool = False#
- special_starter: TemplateComponent | None = None#
- special_stopper: TemplateComponent | None = None#
- template_name: str | None = None#
- system_default: str | None = None#
- encode_conversation(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: List[str] | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
Messages here should be guaranteed to come in user-assistant pairs, with the first message of each pair being the user message and the second being the assistant message. Data example:
```json
{
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "Hello!"}
    ]
}
```
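For orientation, here is a minimal usage sketch. The checkpoint name and the PRESET_TEMPLATES lookup are assumptions for illustration; use whichever preset accessor your version of LMFlow exposes for your model.
```python
from transformers import AutoTokenizer

from lmflow.utils.conversation_template import PRESET_TEMPLATES  # assumed accessor

# Placeholder checkpoint; any HF tokenizer matching the template works.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
template = PRESET_TEMPLATES["qwen2_5"]  # assumed preset key

messages = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello!"},
]

# One (prompt_token_ids, response_token_ids) tuple per user-assistant pair.
encoded_pairs = template.encode_conversation(
    tokenizer=tokenizer,
    messages=messages,
    system="sysinfo1",
)
for prompt_ids, response_ids in encoded_pairs:
    print(len(prompt_ids), len(response_ids))
```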
- _encode(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: str | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
- _encode_template(template: List[TemplateComponent], tokenizer: transformers.PreTrainedTokenizer, **kwargs) List[int] [source]#
Encode template components into token ids.
- Parameters:
  - template : List[TemplateComponent]
    Formatted template components.
  - tokenizer : PreTrainedTokenizer
    Tokenizer to convert tokens into token ids.
- Returns:
  - List[int]
    Encoded token ids.
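To make the input concrete, a short sketch of a formatted component list, continuing from the example above. The TemplateComponent field names follow the base submodule and the comment on their semantics is an assumption:
```python
from lmflow.utils.conversation_template.base import TemplateComponent

# Assumed semantics: 'token' components are resolved through the tokenizer's
# special-token map, while 'string' components are tokenized as plain text.
components = [
    TemplateComponent(type="token", content="bos_token"),
    TemplateComponent(type="string", content="You are a helpful assistant."),
]
token_ids = template._encode_template(components, tokenizer=tokenizer)
```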
- remove_last_separator(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]] [source]#
- add_special_starter(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]] [source]#
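These two helpers post-process the encoded pairs as a whole, presumably backing the remove_last_sep and special_starter attributes: the first strips the trailing separator tokens, the second prepends the special starter tokens. A hedged sketch, continuing from the example above:
```python
# Both return a new sequence of (prompt_ids, response_ids) pairs.
encoded_pairs = template.remove_last_separator(encoded_pairs, tokenizer)
encoded_pairs = template.add_special_starter(encoded_pairs, tokenizer)
```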
- class lmflow.utils.conversation_template.ConversationTemplateForTool[source]#
Bases: ConversationTemplate
- encode_conversation(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: List[str] | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
Messages here should be guaranteed to come in user-assistant pairs, with the first message of each pair being the user message and the second being the assistant message. Data example:
```json
{
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "Hello!"}
    ]
}
```
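The tool-aware variant is invoked the same way, with tool descriptions passed through tools, which the template typically renders into the system turn. A sketch reusing the tokenizer and messages from the earlier example; the preset key is again an assumption:
```python
from lmflow.utils.conversation_template import PRESET_TEMPLATES  # assumed accessor

tool_template = PRESET_TEMPLATES["qwen2_5"]  # assumed tool-capable preset
encoded_pairs = tool_template.encode_conversation(
    tokenizer=tokenizer,
    messages=messages,
    system="sysinfo1",
    tools=["tool_1_desc"],
)
```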
- _encode(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: str | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
- _encode_template(template: List[TemplateComponent], tokenizer: transformers.PreTrainedTokenizer, **kwargs) List[int] [source]#
Encode template components into token ids.
- Parameters:
  - template : List[TemplateComponent]
    Formatted template components.
  - tokenizer : PreTrainedTokenizer
    Tokenizer to convert tokens into token ids.
- Returns:
  - List[int]
    Encoded token ids.
- lmflow.utils.conversation_template.DEEPSEEK_V3_TEMPLATE = Multiline-String[source]#
"""{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set ns = namespace(is_first=false, is_tool=false, is_output_first=true, system_prompt='', is_first_sp=true) %}{%- for message in messages %}{%- if message['role'] == 'system' %}{%- if ns.is_first_sp %}{% set ns.system_prompt = ns.system_prompt + message['content'] %}{% set ns.is_first_sp = false %}{%- else %}{% set ns.system_prompt = ns.system_prompt + ' ' + message['content'] %}{%- endif %}{%- endif %}{%- endfor %}{{bos_token}}{{ns.system_prompt}}{%- for message in messages %}{%- if message['role'] == 'user' %}{%- set ns.is_tool = false -%}{{'<|User|>' + message['content']}}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is none %}{%- set ns.is_tool = false -%}{%- for tool in message['tool_calls']%}{%- if not ns.is_first %}{{'<|Assistant|>'}}{% generation %}{{'<|tool▁calls▁begin|><|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + ' ' + '```json' + ' ' + tool['function']['arguments'] + ' ' + '```' + '<|tool▁call▁end|>'}}{% endgeneration %}{%- set ns.is_first = true -%}{%- else %}{% generation %}{{' ' + '<|tool▁call▁begin|>' + tool['type'] + '<|tool▁sep|>' + tool['function']['name'] + ' ' + '```json' + ' ' + tool['function']['arguments'] + ' ' + '```' + '<|tool▁call▁end|>'}}{{'<|tool▁calls▁end|><|end▁of▁sentence|>'}}{% endgeneration %}{%- endif %}{%- endfor %}{%- endif %}{%- if message['role'] == 'assistant' and message['content'] is not none %}{%- if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% generation %}{{ message['content'] + '<|end▁of▁sentence|>'}}{%- set ns.is_tool = false -%}{% endgeneration %}{%- else %}{{'<|Assistant|>'}}{% generation %}{{ message['content'] + '<|end▁of▁sentence|>'}}{% endgeneration %}{%- endif %}{%- endif %}{%- if message['role'] == 'tool' %}{%- set ns.is_tool = true -%}{%- if ns.is_output_first %}{{'<|tool▁outputs▁begin|><|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- set ns.is_output_first = false %}{%- else %}{{' <|tool▁output▁begin|>' + message['content'] + '<|tool▁output▁end|>'}}{%- endif %}{%- endif %}{%- endfor -%}{% if ns.is_tool %}{{'<|tool▁outputs▁end|>'}}{% endif %}{% if add_generation_prompt and not ns.is_tool %}{{'<|Assistant|>'}}{% endif %}"""
- lmflow.utils.conversation_template.DEEPSEEK_R1_TEMPLATE = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif...[source]#
- lmflow.utils.conversation_template.DEEPSEEK_R1_DISTILL_TEMPLATE = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif...[source]#
- lmflow.utils.conversation_template.QWEN2_5_TEMPLATE = '{%- if tools %}{{- \'<|im_start|>system\\n\' }}{%- if messages[0][\'role\'] == \'system\' %}{{-...[source]#
- lmflow.utils.conversation_template.QWEN2_5_1M_TEMPLATE = '{%- if tools %}{{- \'<|im_start|>system\\n\' }}{%- if messages[0][\'role\'] == \'system\' %}{{-...[source]#
- lmflow.utils.conversation_template.QWEN2_5_MATH_TEMPLATE = '{%- if tools %}{{- \'<|im_start|>system\\n\' }}{%- if messages[0][\'role\'] == \'system\' %}{{-...[source]#