zero3 DPO starcoder OOM #161

Open
oo0-0-0oo opened this issue Jun 26, 2024 · 0 comments

@oo0-0-0oo
When I use DPO to train the 7B StarCoder model, an OOM occurs. I am using 16 A100s with ZeRO-3, TRL, and transformers. The OOM happens when the code reaches AutoModelForCausalLM.from_pretrained, but QwenCoder does not have this problem. Are there any special settings in this model's structure that make it unsuitable for DPO (Direct Preference Optimization)?
The code is:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser

parser = HfArgumentParser(DPOTrainingArguments)
args = parser.parse_args_into_dataclasses()[0]
down_file(args.model_path, args.pretrained_model)

# OOM happens here, while loading the policy and reference models
model = AutoModelForCausalLM.from_pretrained(args.model_path)
model_ref = AutoModelForCausalLM.from_pretrained(args.model_path)
tokenizer = AutoTokenizer.from_pretrained(args.model_path)

dpo_trainer = MyDPOTrainer(
    model,
    model_ref,
    args=args,
    ....
)

dpo_trainer.train()
```
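
For reference, a minimal sketch of two things worth checking, under the assumption that the OOM comes from each rank materializing a full fp32 copy of the 7B model at `from_pretrained` time. First, ZeRO-3 can only shard weights during loading if the DeepSpeed config has already been processed (for example via the `deepspeed` field of a `TrainingArguments` subclass) before `from_pretrained` is called; otherwise every rank holds an unsharded copy. Second, loading in half precision halves the per-copy footprint. The `args.model_path` name is taken from the snippet above; bf16 is an assumed precision choice, not something the issue specifies.

```python
import torch
from transformers import AutoModelForCausalLM

# Sketch, not a confirmed fix: load both the policy and the reference
# model in bf16 so each unsharded copy is half the size of fp32.
# torch_dtype is a standard from_pretrained argument; args.model_path
# comes from the snippet above.
model = AutoModelForCausalLM.from_pretrained(
    args.model_path, torch_dtype=torch.bfloat16
)
model_ref = AutoModelForCausalLM.from_pretrained(
    args.model_path, torch_dtype=torch.bfloat16
)
```

If QwenCoder loads fine under the same config, comparing the two checkpoints' on-disk dtype and size may also help narrow down why only StarCoder triggers the OOM.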
