lmflow.pipeline.dpo_aligner

Classes

DPOAligner

Functions

get_paired_dataset(...) → datasets.Dataset

Load dataset and convert it to the necessary format.

Module Contents

lmflow.pipeline.dpo_aligner.get_paired_dataset(data_root: str, data_dir: str, sanity_check: bool = False, cache_dir: str | None = None, num_proc=24) → datasets.Dataset

Load dataset and convert it to the necessary format.

The dataset is converted to a dictionary with the following structure:

{
    'prompt': List[str],
    'chosen': List[str],
    'rejected': List[str],
}

Prompts are structured as follows:

"Question: " + <prompt> + "\n\nAnswer: "
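The conversion described above can be sketched in plain Python. This is a minimal illustration and not the LMFlow implementation: the helper name `to_paired_batch` and the input column names `question`, `response_j`, and `response_k` are assumptions made for the example. Only the output keys and the prompt template come from the docstring.

```python
from typing import Dict, List


def to_paired_batch(samples: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Convert a batch of raw records into the prompt/chosen/rejected layout.

    The input column names used here are hypothetical; substitute the
    column names of the actual dataset.
    """
    return {
        # Prompt template from the docstring: "Question: " + <prompt> + "\n\nAnswer: "
        "prompt": ["Question: " + q + "\n\nAnswer: " for q in samples["question"]],
        # Preferred response for each prompt
        "chosen": list(samples["response_j"]),
        # Dispreferred response for each prompt
        "rejected": list(samples["response_k"]),
    }


# A tiny hand-written batch, for illustration only
batch = {
    "question": ["What is DPO?"],
    "response_j": ["Direct Preference Optimization."],
    "response_k": ["I don't know."],
}
paired = to_paired_batch(batch)
print(repr(paired["prompt"][0]))
```

A function of this shape takes and returns columns as lists, so it could plausibly be passed to `datasets.Dataset.map` with `batched=True`, which is likely where the `num_proc` argument would apply.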

class lmflow.pipeline.dpo_aligner.DPOAligner(model_args, data_args, aligner_args)

Bases: lmflow.pipeline.base_aligner.BaseAligner

model_args
data_args
aligner_args
train_dataset = None
eval_dataset = None
_initialize_trainer(model, tokenizer)
_load_dataset()
align(model, dataset, reward_model)