lmflow.utils.conversation_template#
Submodules#
- lmflow.utils.conversation_template.base
- lmflow.utils.conversation_template.chatglm
- lmflow.utils.conversation_template.chatml
- lmflow.utils.conversation_template.deepseek
- lmflow.utils.conversation_template.gemma
- lmflow.utils.conversation_template.hymba
- lmflow.utils.conversation_template.internlm
- lmflow.utils.conversation_template.llama
- lmflow.utils.conversation_template.phi
- lmflow.utils.conversation_template.qwen
- lmflow.utils.conversation_template.yi
- lmflow.utils.conversation_template.zephyr
Package Contents#
- class lmflow.utils.conversation_template.ConversationTemplate[source]#
- force_system: bool = False#
- separator: TemplateComponent | None = None#
- remove_last_sep: bool = False#
- special_starter: TemplateComponent | None = None#
- special_stopper: TemplateComponent | None = None#
- template_name: str | None = None#
- system_default: str | None = None#
- encode_conversation(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: List[str] | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
Messages here should be guaranteed to come in user/assistant pairs, with the first message in each pair being the user message and the second being the assistant message. Data example:
```json
{
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {
            "role": "user",
            "content": "hi"
        },
        {
            "role": "assistant",
            "content": "Hello!"
        }
    ]
}
```
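As a rough sketch of this pairing contract (a hypothetical illustration, not the library's actual implementation; a toy tokenizer stands in for a real `transformers.PreTrainedTokenizer`), the encoder walks the messages two at a time and emits one `(user_ids, assistant_ids)` tuple per turn:

```python
from typing import Dict, List, Sequence, Tuple

def toy_tokenize(text: str) -> List[int]:
    # Stand-in for a real tokenizer: map each whitespace token to a fake id.
    return [hash(tok) % 1000 for tok in text.split()]

def encode_conversation_sketch(
    messages: List[Dict[str, str]],
) -> Sequence[Tuple[List[int], List[int]]]:
    # Messages must come in (user, assistant) pairs, as the docstring requires.
    assert len(messages) % 2 == 0, "messages must be in user/assistant pairs"
    pairs = []
    for i in range(0, len(messages), 2):
        user, assistant = messages[i], messages[i + 1]
        assert user["role"] == "user" and assistant["role"] == "assistant"
        pairs.append(
            (toy_tokenize(user["content"]), toy_tokenize(assistant["content"]))
        )
    return pairs

pairs = encode_conversation_sketch([
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "Hello!"},
])
print(len(pairs))  # one (user_ids, assistant_ids) tuple per turn
```

In the real method, the system prompt and tool descriptions would additionally be formatted into the first turn according to the model-specific template.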
- _encode(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: str | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
- _encode_template(template: List[TemplateComponent], tokenizer: transformers.PreTrainedTokenizer, **kwargs) List[int] [source]#
Encode template components into token ids.
- Parameters:
  - template : List[TemplateComponent]
    Formatted template components.
  - tokenizer : PreTrainedTokenizer
    Tokenizer to convert tokens into token ids.
- Returns:
  - List[int]
    Encoded token ids.
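A minimal sketch of what encoding a component list might look like (hypothetical: the `type`/`content` fields and the "token" vs. "string" variants are assumptions, and a toy character-level tokenizer replaces the real one):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TemplateComponentSketch:
    # Hypothetical component: either a literal special token or a plain string.
    type: str      # "token" or "string" (assumed variants)
    content: str

SPECIAL_TOKEN_IDS = {"<bos>": 1, "<eos>": 2}  # toy vocabulary

def encode_template_sketch(template: List[TemplateComponentSketch]) -> List[int]:
    ids: List[int] = []
    for comp in template:
        if comp.type == "token":
            # Special tokens map directly to a single reserved id.
            ids.append(SPECIAL_TOKEN_IDS[comp.content])
        else:
            # Plain strings go through the (toy) tokenizer.
            ids.extend(ord(ch) for ch in comp.content)
    return ids

print(encode_template_sketch([
    TemplateComponentSketch("token", "<bos>"),
    TemplateComponentSketch("string", "hi"),
]))  # [1, 104, 105]
```

The key design point the sketch preserves is that special tokens bypass normal tokenization, so template control tokens are never split into subwords.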
- remove_last_separator(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]] [source]#
- add_special_starter(encoded_pairs: Sequence[Tuple[List[int], List[int]]], tokenizer: transformers.PreTrainedTokenizer) Sequence[Tuple[List[int], List[int]]] [source]#
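These post-processing helpers presumably splice special tokens around the encoded pairs; a minimal sketch of the starter case, assuming the starter ids are prepended to the user segment of the first pair only (a guess from the name, not the library's confirmed behavior):

```python
from typing import List, Sequence, Tuple

def add_special_starter_sketch(
    encoded_pairs: Sequence[Tuple[List[int], List[int]]],
    starter_ids: List[int],
) -> Sequence[Tuple[List[int], List[int]]]:
    # Copy so the caller's pairs are not mutated in place.
    pairs = [[list(seg) for seg in pair] for pair in encoded_pairs]
    if pairs:
        # Prepend the starter (e.g. a BOS token) to the first user segment.
        pairs[0][0] = starter_ids + pairs[0][0]
    return [tuple(pair) for pair in pairs]

print(add_special_starter_sketch([([10, 11], [12]), ([13], [14])], [1]))
# [([1, 10, 11], [12]), ([13], [14])]
```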
- class lmflow.utils.conversation_template.ConversationTemplateForTool[source]#
Bases: ConversationTemplate
- encode_conversation(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: List[str] | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
Messages here should be guaranteed to come in user/assistant pairs, with the first message in each pair being the user message and the second being the assistant message. Data example:
```json
{
    "conversation_id": 2,
    "system": "sysinfo1",
    "tools": ["tool_1_desc"],
    "messages": [
        {
            "role": "user",
            "content": "hi"
        },
        {
            "role": "assistant",
            "content": "Hello!"
        }
    ]
}
```
- _encode(tokenizer: transformers.PreTrainedTokenizer, messages: List[Dict[str, str]], system: str | None = None, tools: str | None = None, **kwargs) Sequence[Tuple[List[int], List[int]]] [source]#
- _encode_template(template: List[TemplateComponent], tokenizer: transformers.PreTrainedTokenizer, **kwargs) List[int] [source]#
Encode template components into token ids.
- Parameters:
  - template : List[TemplateComponent]
    Formatted template components.
  - tokenizer : PreTrainedTokenizer
    Tokenizer to convert tokens into token ids.
- Returns:
  - List[int]
    Encoded token ids.
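The tool-aware subclass accepts a `tools` list alongside `system`; one plausible way such templates combine them (a hypothetical sketch, the real formatting is template-specific) is to fold the tool descriptions into the system prompt before encoding:

```python
from typing import List, Optional

def build_system_with_tools_sketch(
    system: Optional[str], tools: Optional[List[str]]
) -> str:
    # Hypothetical: fold tool descriptions into the system prompt text.
    parts = [system] if system else []
    if tools:
        parts.append("Available tools: " + "; ".join(tools))
    return "\n".join(parts)

print(build_system_with_tools_sketch("sysinfo1", ["tool_1_desc"]))
```

With no system prompt and no tools, the sketch simply yields an empty string, mirroring the optional `system` and `tools` parameters in the signature above.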