GRPO OOM #475

Open
samma1570 opened this issue Mar 5, 2025 · 7 comments
Comments

@samma1570

config:

# Model arguments
model_name_or_path: "/ossfs/workspace/Logic-RL/Qwen2.5-7B-Instruct"
model_revision: main
torch_dtype: bfloat16
attn_implementation: flash_attention_2

# Data training arguments
dataset_name: DigitalLearningGmbH/MATH-lighteval
dataset_config: default
system_prompt: "You are a helpful AI Assistant, designed to provide well-reasoned and detailed responses. You FIRST think about the reasoning process as an internal monologue and then provide the user with the answer. The reasoning process MUST BE enclosed within <think> and </think> tags."

# GRPO trainer config
bf16: true
use_vllm: true
vllm_device: auto
vllm_gpu_memory_utilization: 0.7
do_eval: true
eval_strategy: steps
eval_steps: 100
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
hub_model_id: Qwen-2.5-7B-Simple-RL
hub_strategy: every_save
learning_rate: 3.0e-06
log_completions: true
log_level: info
logging_first_step: true
logging_steps: 5
logging_strategy: steps
lr_scheduler_type: cosine
max_prompt_length: 512
max_completion_length: 1024
max_steps: -1
num_generations: 3
num_train_epochs: 1
output_dir: data/Qwen-2.5-7B-Simple-RL
overwrite_output_dir: true
per_device_eval_batch_size: 1
per_device_train_batch_size: 1
push_to_hub: false
report_to:
- wandb
reward_funcs:
- accuracy
- format
reward_weights:
- 1.0
- 1.0
save_strategy: "no"
seed: 42
warmup_ratio: 0.1

script:

ACCELERATE_LOG_LEVEL=info accelerate launch --config_file recipes/accelerate_configs/zero2.yaml \
    --num_processes=3 src/open_r1/grpo.py \
    --config recipes/Qwen2.5-7B-Instruct/grpo/config_simple_rl.yaml

error:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 9.46 GiB. GPU 0 has a total capacity of 79.35 GiB of which 1.41 GiB is free. Process 29996 has 77.92 GiB memory in use. Of the allocated memory 71.05 GiB is allocated by PyTorch, and 4.77 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting [...]
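The message is cut off above; at this point recent PyTorch builds suggest the caching allocator's expandable_segments option. A minimal sketch of that workaround, which only helps when the failure is fragmentation (a large "reserved but unallocated" figure) rather than genuinely exhausted memory:

# Let PyTorch's caching allocator grow segments instead of fragmenting
# fixed-size blocks; mitigates "reserved but unallocated" waste,
# but does not add any actual capacity.
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

Then rerun the launch command above unchanged.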

@samma1570
Author

4× A100 (80 GB), CUDA 12.1

@KennyShang

Try using DeepSpeed ZeRO-3: recipes/accelerate_configs/zero3.yaml
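For reference, a ZeRO-3 accelerate config would look roughly like the sketch below (field values other than zero_stage: 3 are assumptions; prefer the repo's actual recipes/accelerate_configs/zero3.yaml). Unlike ZeRO-2, stage 3 also shards the model parameters across ranks, which is what frees memory here.

# Sketch of a ZeRO-3 accelerate config; values are assumptions, use the
# repo's own file. zero_stage: 3 shards parameters as well as gradients
# and optimizer states across the 3 training processes.
compute_environment: LOCAL_MACHINE
deepspeed_config:
  offload_optimizer_device: none
  offload_param_device: none
  zero3_init_flag: true
  zero_stage: 3
distributed_type: DEEPSPEED
mixed_precision: bf16
num_machines: 1
num_processes: 3

The launch command stays the same except for swapping --config_file to recipes/accelerate_configs/zero3.yaml.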

@crownwang13

I ran into a similar issue with almost the same model and environment. I tried changing mbs (micro-batch size), max_completion_length, and vllm_gpu_memory_utilization, and switched to DeepSpeed ZeRO-3, but none of that solved the problem. The main issue seems to be that vLLM's KV-cache initialization takes up too much memory. After switching to a smaller model, R1-distilled Qwen-1.5B, I was able to train without any problems.
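If the OOM does happen while vLLM builds its KV cache, the usual levers in this recipe are the ones below. This is a sketch against the config above; vllm_max_model_len is an assumption and requires a TRL version that exposes it:

# Assumed tweaks to config_simple_rl.yaml to shrink vLLM's footprint.
vllm_gpu_memory_utilization: 0.5   # was 0.7; fraction of the GPU vLLM pre-allocates
vllm_max_model_len: 1536           # cap the KV cache at prompt + completion length
max_prompt_length: 512
max_completion_length: 1024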

@qgallouedec
Member

Can you provide the full traceback? The solution depends on when the OOM occurs.

@samma1570
Author

Right, it works.

@mrb957600057

"Right, it works."

How did you handle this problem?

@greatxue

Does anybody know how to handle it?
