lmflow.pipeline.dpo_aligner

Module Contents

Classes

DPOAligner

Functions

get_paired_dataset(...) → datasets.Dataset

Load dataset and convert it to the necessary format.

lmflow.pipeline.dpo_aligner.get_paired_dataset(data_root: str, data_dir: str, sanity_check: bool = False, cache_dir: str | None = None, num_proc=24) → datasets.Dataset

Load dataset and convert it to the necessary format.

The dataset is converted to a dictionary with the following structure:

{
    'prompt': List[str],
    'chosen': List[str],
    'rejected': List[str],
}

Prompts are structured as follows:

"Question: " + <prompt> + "\n\nAnswer: "
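
To make the target format concrete, here is a small sketch of the kind of mapping described above, built with the `datasets` library. The source column names (`question`, `response_j`, `response_k`) are assumptions for illustration only; `get_paired_dataset` itself reads whatever dataset lives under `data_root`/`data_dir`.

```python
from datasets import Dataset

def to_paired_format(samples):
    # Hypothetical source columns: 'question', 'response_j' (preferred answer),
    # 'response_k' (rejected answer). The real column names depend on the
    # dataset under data_root/data_dir.
    return {
        "prompt": ["Question: " + q + "\n\nAnswer: " for q in samples["question"]],
        "chosen": samples["response_j"],
        "rejected": samples["response_k"],
    }

raw = Dataset.from_dict({
    "question": ["What does DPO stand for?"],
    "response_j": ["Direct Preference Optimization."],
    "response_k": ["I am not sure."],
})

paired = raw.map(to_paired_format, batched=True, remove_columns=raw.column_names)
print(paired[0]["prompt"])  # Question: What does DPO stand for?\n\nAnswer:
```

Each row thus pairs one prompt with one preferred and one rejected completion, which is the per-example structure that DPO-style preference training consumes.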

class lmflow.pipeline.dpo_aligner.DPOAligner(model_args, data_args, aligner_args)

Bases: lmflow.pipeline.base_aligner.BaseAligner

_initialize_trainer(model, tokenizer)
_load_dataset()
align(model, dataset, reward_model)
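
A minimal usage sketch, assuming the three argument objects have already been built (for example via LMFlow's argument-parsing flow). The stand-in objects and their field names below are placeholders, not the real LMFlow argument classes, and the required fields depend on the LMFlow version.

```python
from types import SimpleNamespace
from lmflow.pipeline.dpo_aligner import DPOAligner

# Stand-ins for illustration only; in practice these are the dataclasses
# produced by LMFlow's argument parser, and they carry many more fields.
model_args = SimpleNamespace(model_name_or_path="facebook/opt-125m")
data_args = SimpleNamespace(dataset_path="data/dpo")
aligner_args = SimpleNamespace(output_dir="output_models/dpo")

aligner = DPOAligner(
    model_args=model_args,
    data_args=data_args,
    aligner_args=aligner_args,
)

# align() is the entry point and follows the BaseAligner interface above;
# the private helpers (_load_dataset, _initialize_trainer) suggest the
# aligner prepares its own paired dataset and trainer internally.
aligner.align(model=None, dataset=None, reward_model=None)
```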