As reported in GRPO OOM #475, the vLLM KV-cache initialization is so large that a single A100 80GB cannot hold it, even though I have 8×A100 in total.
However, only 1 GPU can be assigned to vLLM, per `vllm_device: auto` in `ib/python3.10/site-packages/trl/trainer/grpo_trainer.py`.
How should I solve this issue? Does anybody know?
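For context, here is a minimal sketch of the settings involved, assuming a TRL version whose `GRPOConfig` exposes `use_vllm`, `vllm_device`, and `vllm_gpu_memory_utilization` (field names and defaults may differ in your installed release, so check your version's docs):

```python
# Hedged sketch, not a confirmed fix: pin vLLM generation to one dedicated GPU
# and shrink the fraction of its memory that vLLM pre-allocates for the KV cache.
from trl import GRPOConfig

config = GRPOConfig(
    output_dir="grpo-out",
    use_vllm=True,
    vllm_device="cuda:7",             # dedicate a specific GPU instead of "auto"
    vllm_gpu_memory_utilization=0.3,  # lower the KV-cache reservation on that GPU
)
```

Lowering the memory-utilization fraction reduces the KV-cache reservation that triggers the OOM at init, at the cost of fewer cached tokens during generation.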