zero3 DPO starcoder OOM #161

Open
oo0-0-0oo opened this issue Jun 26, 2024 · 0 comments

@oo0-0-0oo
When I use DPO to train the 7B StarCoder model, an OOM occurs. I am using 16 A100s with ZeRO-3, TRL, and transformers. The OOM happens when the code reaches AutoModelForCausalLM.from_pretrained, but QwenCoder does not have this problem. Are there any special settings in this model's structure that make it unsuitable for DPO (Direct Preference Optimization)?
The code is:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser

parser = HfArgumentParser(DPOTrainingArguments)
args = parser.parse_args_into_dataclasses()[0]
down_file(args.model_path, args.pretrained_model)

# OOM happens here, while loading the policy and reference models
model = AutoModelForCausalLM.from_pretrained(args.model_path)
model_ref = AutoModelForCausalLM.from_pretrained(args.model_path)
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

dpo_trainer = MyDPOTrainer(
    model,
    model_ref,
    args=args,
    ....
)

dpo_trainer.train()
```
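
For reference, a minimal sketch of two things worth checking, under the assumption that the OOM comes from each rank materializing a full fp32 copy of the 7B model at `from_pretrained` time. First, ZeRO-3 can only shard weights during loading if the DeepSpeed config has already been processed (for example via the `deepspeed` field of a `TrainingArguments` subclass) before `from_pretrained` is called; otherwise every rank holds an unsharded copy. Second, loading in half precision halves the per-copy footprint. The `args.model_path` name is taken from the snippet above; bf16 is an assumed precision choice, not something the issue specifies.

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch, not a confirmed fix: load both the policy and the reference
# model in bf16 so each unsharded copy is half the size of fp32.
# torch_dtype is a standard from_pretrained argument; args.model_path
# comes from the snippet above.
model = AutoModelForCausalLM.from_pretrained(
    args.model_path, torch_dtype=torch.bfloat16
)
model_ref = AutoModelForCausalLM.from_pretrained(
    args.model_path, torch_dtype=torch.bfloat16
)
```

If QwenCoder loads fine under the same config, comparing the two checkpoints' on-disk dtype and size may also help narrow down why only StarCoder triggers the OOM.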
