Unable to Run PPO Training Using HuggingFace Path of SFT'd language model #10

Closed
RylanSchaeffer opened this issue Jul 17, 2024 · 3 comments

@RylanSchaeffer

I'm trying to run vanilla PPO against either a single reward model or an ensemble of 5 reward models.

Command: accelerate launch --main_process_port=29503 --config_file configs/accelerate_config.yaml src/ppo/trainer_rl.py --configs defaults defaults_rlhf pythia_44m_rlhf_ensemble_mean

My config is here:

pythia_44m_rlhf_ensemble_mean:
  output_dir: runs/ppo_ensemble
  datasets:
    - alpaca_farm

  gold_config:
    model_name: alpaca_farm_models/reward-model-human
    is_alpacafarm_rm: true
    batch_size: 32

  rank_config:
    is_reward_model: true
    model_names: 
      - models/rm/switching_rms_pythia_rm_44m_sftseed0_seed0

    objective_name: mean # Change objective (mean, random, WCO, or UWO)
    uwo_weight: 0.1 # Change UWO weight (only for UWO)
    cache_dir: .cache
    pooling: last
    residual_dropout: 0.01
    use_flash_attention: false
    dtype: bf16
    batch_size: 128

  sft_config:
    is_reward_model: false
    model_name: RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0
    cache_dir: .cache
    quantization: false
    seq2seqmodel: false
    freeze_layer:
    num_layers_unfrozen: 2 
    residual_dropout: 0.2
    use_flash_attention: false
    dtype: bf16
    batch_size: 32

However, loading the SFT config's model_name throws this error:

[rank0]:     index_file_name = hf_hub_download(
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1221, in hf_hub_download
[rank0]:     return _hf_hub_download_to_cache_dir(
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1282, in _hf_hub_download_to_cache_dir
[rank0]:     (url_to_download, etag, commit_hash, expected_size, head_call_error) = _get_metadata_or_catch_error(
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1722, in _get_metadata_or_catch_error
[rank0]:     metadata = get_hf_file_metadata(url=url, proxies=proxies, timeout=etag_timeout, headers=headers)
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 1645, in get_hf_file_metadata
[rank0]:     r = _request_wrapper(
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 372, in _request_wrapper
[rank0]:     response = _request_wrapper(
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/file_download.py", line 396, in _request_wrapper
[rank0]:     hf_raise_for_status(response)
[rank0]:   File "/lfs/ampere8/0/rschaef/miniconda3/envs/reward_modeling_env/lib/python3.10/site-packages/huggingface_hub/utils/_errors.py", line 315, in hf_raise_for_status
[rank0]:     raise EntryNotFoundError(message, response) from e
[rank0]: huggingface_hub.utils._errors.EntryNotFoundError: 404 Client Error. (Request ID: Root=1-6697d66d-22a2a9095f1788b949b35ebc;622b8343-28b0-4609-aeeb-c5fbba1f0c30)

[rank0]: Entry Not Found for url: https://huggingface.co/RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0/resolve/main/pytorch_model.bin.index.json.
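
A quick way to confirm which weight files each repo actually publishes is huggingface_hub's list_repo_files (the two repo IDs below are the ones compared next):

    from huggingface_hub import list_repo_files

    # List the files in each Hub repo to see which weight format is present
    print(list_repo_files("RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0"))
    print(list_repo_files("tlc4418/pythia_1.4b_sft_policy"))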

Comparing RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0 against tlc4418/pythia_1.4b_sft_policy/tree/main, I see that the SFT'd models I created have [model.safetensors](https://huggingface.co/RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0/blob/main/model.safetensors):

[screenshot: repo file listing showing model.safetensors]

whereas your SFT'd models have [pytorch_model.bin](https://huggingface.co/tlc4418/pythia_1.4b_sft_policy/blob/main/pytorch_model.bin):

[screenshot: repo file listing showing pytorch_model.bin]

I suspect that something changed in transformers in the intervening time.

I'm going to go open an issue with trlx but can you suggest any workarounds?
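
One workaround I'm considering (an untested sketch; the output directory name is a hypothetical local path): load the checkpoint with a current transformers and re-save it without safetensors, so that pytorch_model.bin exists for the older loading code to find:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo = "RylanSchaeffer/switching_rms_pythia_sft_1p4b_seed0"
    model = AutoModelForCausalLM.from_pretrained(repo)
    tokenizer = AutoTokenizer.from_pretrained(repo)

    # safe_serialization=False writes pytorch_model.bin instead of model.safetensors
    out_dir = "models/switching_rms_pythia_sft_1p4b_seed0_bin"  # hypothetical local path
    model.save_pretrained(out_dir, safe_serialization=False)
    tokenizer.save_pretrained(out_dir)

The sft_config model_name could then point at the local directory instead of the Hub repo.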

Perhaps it would be helpful to specify the exact library versions you used for your experiments :)

@RylanSchaeffer (Author)

Oh, there's already an open issue with trlx here! CarperAI/trlx#580

@neilwen987

neilwen987 commented Dec 10, 2024

Hi, I met the same problem when running the PPO experiment with this code, mainly when using reward-model-human to evaluate the generated answers. I'm using this config:

pythia_rlhf_individual:
  output_dir: runs/ppo_individual
  datasets:
    - alpaca_farm

  gold_config:
    model_name: tatsu-lab/alpaca-farm-reward-model-human-wdiff
    is_alpacafarm_rm: true
    batch_size: 32

  rank_config:
    is_reward_model: true
    model_names:
      - models/rm-pythia-44m_seed1
    cache_dir: .cache
    pooling: last
    residual_dropout: 0.01
    use_flash_attention: false
    dtype: bf16
    batch_size: 128

  sft_config:
    is_reward_model: false
    model_name: tlc4418/pythia_1.4b_sft_policy
    pretrained_path: models/hf_pythia_1.4b_sft
    cache_dir: .cache
    quantization: false
    seq2seqmodel: false
    freeze_layer:
    num_layers_unfrozen: 2
    residual_dropout: 0.2
    use_flash_attention: false
    dtype: bf16
    batch_size: 32

and ran into the following problem:

[error screenshot]

Any ideas?

@tlc4418 (Owner)

tlc4418 commented Jan 16, 2025

Hey, just to close out this issue: indeed, I believe this is something that changed in transformers. As Rylan pointed out, there is an open issue in trlx, but downgrading your transformers version should fix this. The pyproject file in this project should install the correct versions of everything, and trlx itself pins transformers==4.32.0. Some people have also reported issues with their accelerate or deepspeed versions, in which case a fairly extreme downgrade to accelerate==0.22.0 and deepspeed==0.10.1 should definitely work.
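
For reference, those pins can be installed directly (assuming a plain pip environment):

    pip install transformers==4.32.0 accelerate==0.22.0 deepspeed==0.10.1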

As for tsWeen0309's issue, this looks like a problem with your alpaca_farm setup rather than with this project, so I would recommend looking there. It may be that you don't have the model downloaded.

@tlc4418 tlc4418 closed this as completed Jan 16, 2025