# Unable to Run PPO Training Using HuggingFace Path of SFT'd language model #10
Comments
Oh, there's already an open issue with `trlx`.
Hi, I ran into the same problem when running the PPO experiment with this code, mainly when using reward-model-human to evaluate the generated answers.

Any ideas?
Hey, just to close out this issue: indeed, I believe this is something that changed in transformers. As Rylan pointed out, there is an open issue in trlx, but downgrading your transformers version should fix this. The pyproject file in this project should install the correct versions of things, and trlx itself pins `transformers==4.32.0`.

Some people have also reported issues with their accelerate or deepspeed versions, in which case a fairly extreme downgrade to `accelerate==0.22.0` and `deepspeed==0.10.1` should definitely work.

As for tsWeen0309's issue, this seems to stem from your alpaca_farm setup rather than from this project, so I would recommend looking there. It seems you may not have the model downloaded.
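Since the suggested fix comes down to version pins, here is a minimal stdlib sketch for checking an environment against the pins mentioned in this thread. The function names (`find_mismatches`, `installed_versions`) are mine for illustration, not part of this repo:

```python
from importlib import metadata

# Pins suggested in this thread: transformers per trlx's requirement,
# accelerate/deepspeed as the "extreme downgrade" fallback.
PINS = {
    "transformers": "4.32.0",
    "accelerate": "0.22.0",
    "deepspeed": "0.10.1",
}


def find_mismatches(installed, pins):
    """Return {name: (installed, pinned)} for every package whose installed
    version differs from its pin; missing packages map to (None, pin)."""
    out = {}
    for name, want in pins.items():
        have = installed.get(name)
        if have != want:
            out[name] = (have, want)
    return out


def installed_versions(names):
    """Look up installed versions via importlib.metadata; None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, (have, want) in find_mismatches(
        installed_versions(PINS), PINS
    ).items():
        print(f"{name}: installed {have}, thread suggests {want}")
```

Running it in a broken environment prints one line per package to downgrade (or install).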
I'm trying to run vanilla PPO against either a single reward model or an ensemble of 5 reward models.
Command:

```shell
accelerate launch --main_process_port=29503 --config_file configs/accelerate_config.yaml src/ppo/trainer_rl.py --configs defaults defaults_rlhf pythia_44m_rlhf_ensemble_mean
```
My config is here:
However, the SFT config's `model_name` throws this error:

Comparing RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0 against tlc4418/pythia_1.4b_sft_policy/tree/main, I see that the SFT'd models I created have [model.safetensors](https://huggingface.co/RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0/blob/main/model.safetensors), whereas your SFT'd models have [pytorch_model.bin](https://huggingface.co/tlc4418/pythia_1.4b_sft_policy/blob/main/pytorch_model.bin).

I suspect that something changed in `transformers` in the intervening time. I'm going to go open an issue with `trlx`, but can you suggest any workarounds? Perhaps it would be helpful to specify the exact library versions you used for your experiments :)
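The safetensors-vs-bin mismatch described above can be spotted before training starts. A minimal stdlib sketch (the helper name `weight_format` is mine, not from this repo) that reports which serialization format(s) a locally downloaded checkpoint directory contains:

```python
import os

# Newer transformers releases save model.safetensors by default, while
# older loading paths may expect pytorch_model.bin.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def weight_format(checkpoint_dir):
    """Return the weight files present in a local checkpoint directory,
    in WEIGHT_FILES order. An empty list means no recognized weights."""
    return [
        f
        for f in WEIGHT_FILES
        if os.path.isfile(os.path.join(checkpoint_dir, f))
    ]
```

If this reports only `model.safetensors` for a checkpoint that an older transformers/trlx stack refuses to load, that is consistent with the version mismatch suspected in this issue.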