DPO #30

shahbuland · 2024-01-26T00:22:58Z

Add DPO LoRA support.
Non-LoRA support should be added as well, but this is WIP for now.

tmabraham · 2024-01-28T07:19:29Z

src/drlx/trainer/dpo_trainer.py

+        def time_per_1k(n_samples : int):
+            total_time = timer.hit()
+            return total_time * 1000 / n_samples


this func should probably be in utils
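A minimal sketch of what moving this helper into a utils module could look like. The `Timer` class here is a hypothetical stand-in: the diff only shows `timer.hit()` being called, so the assumption that `hit()` returns the seconds elapsed since the previous hit is mine, as is passing the timer in explicitly instead of closing over it.

```python
import time


class Timer:
    """Hypothetical stand-in for the trainer's `timer` object (assumed API)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self._last = clock()

    def hit(self) -> float:
        """Return seconds elapsed since the previous hit (or construction), then reset."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        return elapsed


def time_per_1k(timer: Timer, n_samples: int) -> float:
    """Seconds per 1000 samples, measured since the timer was last hit."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return timer.hit() * 1000 / n_samples
```

Taking the timer as an argument keeps the function free of trainer state, so it can be imported from anywhere and tested with a fake clock.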

tmabraham · 2024-01-28T07:20:16Z

src/drlx/trainer/dpo_trainer.py

+        if not do_lora:
+            ref_model = deepcopy(self.accelerator.unwrap_model(self.model).unet)
+            ref_model.requires_grad = False
+            if self.config.method.ref_mem_strategy == "half": ref_model = ref_model.half()


Hmm, is it a problem that the ref model is in half precision? Does the original DPO do this?

shahbuland · 2024-01-30

I'm not aware of any way it would cause issues; I figured it was just a nice addition for memory-saving purposes. It's not necessary if you match the original paper's hyperparameters, though.
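For reference, a rough sketch of the frozen reference copy being discussed, using a tiny `nn.Linear` as a stand-in for the UNet. The helper name `make_reference_model` and its `half` flag are illustrative and not part of this PR. One thing worth flagging: assigning `ref_model.requires_grad = False` on an `nn.Module`, as in the diff above, only sets a plain Python attribute; `requires_grad_(False)` is the call that actually freezes the parameters.

```python
from copy import deepcopy

import torch
from torch import nn


def make_reference_model(model: nn.Module, half: bool = False) -> nn.Module:
    """Return a frozen copy of `model`, optionally cast to float16."""
    ref = deepcopy(model)
    ref.requires_grad_(False)  # in-place: sets requires_grad=False on every parameter
    if half:
        ref = ref.half()  # casts floating-point parameters and buffers to float16
    return ref


policy = nn.Linear(4, 2)  # tiny stand-in for the UNet (assumption for illustration)
ref = make_reference_model(policy, half=True)
```

If the reference copy is kept in float16, its inputs need to be cast to float16 before the forward pass, and it is probably safest to cast its outputs back to float32 before they enter the loss.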

shahbuland and others added 19 commits September 17, 2023 20:39

Add base trainer for any accelerate model

71c3014

Add PickaPic pipeline for DPO

af358b7

add skeleton for DPO trainer

b140623

Pipeline for DPO

704a85c

Allow for list of images instead of just list of np arrays for sample…

68ec789

… drawing

Add sampler for DPO

e70e537

Add method config for DPO

2202d9f

Add DPO trainer initial version

9304bac

basic debugs

1752621

Remove streaming

b11e873

minor bug fixes

9201d6a

Moved saving from DDPO trainer to base accelerate

35dd03d

LoRA, refactorings, quick bug fixes

e16526f

small bug fixes

14fe254

bug fixes

e121257

Fix import errors and checkpointing

6d9e03d

Add base model loss deviation to sampling as metric

c2350cb

Add base model loss deviation to trainer logging as metric

765b9f6

Add non-lora training with memory saving options in config

74012cc

tmabraham reviewed Jan 28, 2024

shahbuland and others added 9 commits January 28, 2024 17:33

some refactorings to sampling, add rmsprop

ef91f92

Delete old DPO example, push new one

38847c5

Rename DPO2 to DPO

be05515

Move DPO and DDPO sampler to their own files for better organization

e6023a3

prepare for adding SDXL

5253473

Fix issue with modularizing samplers

54f6ec1

Add SDXL support and reorganize config for model

44f163d

Remove mandatory gradient clipping and fix model saving with new config

565efe6

Update dpo_pickapic.yml

4324932

shahbuland added 2 commits February 12, 2024 23:13

Update dpo_pickapic.yml

dde1265

Update dpo_pickapic.yml

70f1827

shahbuland Jan 30, 2024
