Hi @XiangLi1999,

Thanks for the amazing work! I have run into some questions while training Diffusion-LM:
During my experiments, I noticed that decoder_nll (essentially a cross-entropy loss) stays at zero for a period of training (about 8k steps), and then begins to take increasing values. Is this behavior normal for Diffusion-LM training? How should decoder_nll behave if training is implemented correctly?
My second question is about tT_loss. It stays at a constant value (about 1.3e-7) throughout training when I apply cosine annealing with warmup to the learning rate. However, with a constant learning rate or linear decay, tT_loss starts decreasing. I am now confused about which curve is correct for training Diffusion-LM. Could you explain a little about how the tT_loss curve should look if Diffusion-LM is trained correctly?
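For context, here is a minimal sketch of how I understand the two terms, assuming the usual decomposition of the Diffusion-LM objective: decoder_nll as the rounding cross-entropy that maps the continuous latent back to discrete tokens, and tT_loss as the KL between q(x_T | x_0) and a standard normal. The function names and shapes below are illustrative, not taken from the repo:

```python
import numpy as np

def decoder_nll(logits, targets):
    """Rounding cross-entropy: how well x_0 decodes back to discrete tokens.
    logits: (seq_len, vocab_size); targets: (seq_len,) integer token ids."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def tT_loss(x_start, alpha_bar_T):
    """Prior term: per-dimension KL(q(x_T | x_0) || N(0, I)), assuming the
    variance-preserving forward process
    q(x_T | x_0) = N(sqrt(alpha_bar_T) * x_0, (1 - alpha_bar_T) * I)."""
    mu = np.sqrt(alpha_bar_T) * x_start
    var = 1.0 - alpha_bar_T
    # Standard Gaussian KL formula, averaged over all dimensions.
    return 0.5 * (mu ** 2 + var - 1.0 - np.log(var)).mean()
```

If this reading is right, tT_loss has no learnable parameters beyond the embedding x_start itself: it depends only on the noise schedule (alpha_bar_T) and the scale of the learned embeddings, which might explain why its curve changes with the learning-rate schedule while staying tiny in absolute value.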
Thank you in advance for taking time out of your busy schedule for this issue. It would be a big help if you could clarify the questions above.
Best,