contiguous_gradients is a DeepSpeed memory optimization that defaults to True. I am curious why it is set to False in the CogVideoX SFT procedure. We also accidentally discovered that when it is set to True, the loss becomes abnormally large when training on more than 128 GPUs. What was your motivation for disabling it? Could you please share it?
For the code:
CogVideo/sat/configs/sft.yaml
Line 39 in 2fdc59c
Reference for the default value: https://www.deepspeed.ai/docs/config-json/#bfloat16-training-options
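
To make the question concrete, here is a minimal sketch of the toggle being discussed, written as a Python dict rather than copied from sft.yaml. Only the contiguous_gradients key and its False value come from this issue; the ZeRO stage, the bf16 entry, and the helper name with_contiguous_gradients are assumptions for illustration.

```python
import copy

# Illustrative DeepSpeed-style config. Only "contiguous_gradients": False
# reflects the setting asked about; every other value is an assumption.
base_config = {
    "zero_optimization": {
        "stage": 2,                     # assumed ZeRO stage
        "contiguous_gradients": False,  # the setting in question
    },
    "bf16": {"enabled": True},          # assumed precision setting
}


def with_contiguous_gradients(config, enabled):
    """Return a copy of config with contiguous_gradients set to `enabled`.

    Hypothetical helper: it does not call DeepSpeed, it only edits the dict.
    """
    new_config = copy.deepcopy(config)
    new_config.setdefault("zero_optimization", {})["contiguous_gradients"] = enabled
    return new_config


# The variant that showed the abnormally large loss on more than 128 GPUs.
enabled_config = with_contiguous_gradients(base_config, True)
```

Running the same job once with base_config and once with enabled_config, with everything else held fixed, would isolate whether this flag alone accounts for the loss difference.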