Hi @XiangLi1999,

Thanks for the amazing work! I have run into some questions while training Diffusion-LM:
During my experiments, I noticed that decoder_nll (essentially a cross-entropy loss) stays at zero for a period of training (about 8k steps), and then begins to take increasing values. Is this behavior normal for Diffusion-LM training? How should decoder_nll behave if training is implemented correctly?
My second question is about tT_loss. It stays at a constant value (about 1.3e-7) throughout training when I apply cosine annealing with warmup to the learning rate. However, with a constant learning rate or linear decay, tT_loss starts decreasing. I am now confused about which curve is correct for training Diffusion-LM. Could you explain a little about how the tT_loss curve should look if Diffusion-LM is trained correctly?
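For context, here is a minimal sketch of how I understand the two terms, assuming the usual decomposition of the Diffusion-LM objective: decoder_nll as the rounding cross-entropy that maps the continuous latent back to discrete tokens, and tT_loss as the KL between q(x_T | x_0) and a standard normal. The function names and shapes below are illustrative, not taken from the repo:

```python
import numpy as np

def decoder_nll(logits, targets):
    """Rounding cross-entropy: how well x_0 decodes back to discrete tokens.
    logits: (seq_len, vocab_size); targets: (seq_len,) integer token ids."""
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

def tT_loss(x_start, alpha_bar_T):
    """Prior term: per-dimension KL(q(x_T | x_0) || N(0, I)), assuming the
    variance-preserving forward process
    q(x_T | x_0) = N(sqrt(alpha_bar_T) * x_0, (1 - alpha_bar_T) * I)."""
    mu = np.sqrt(alpha_bar_T) * x_start
    var = 1.0 - alpha_bar_T
    # Standard Gaussian KL formula, averaged over all dimensions.
    return 0.5 * (mu ** 2 + var - 1.0 - np.log(var)).mean()
```

If this reading is right, tT_loss has no learnable parameters beyond the embedding x_start itself: it depends only on the noise schedule (alpha_bar_T) and the scale of the learned embeddings, which might explain why its curve changes with the learning-rate schedule while staying tiny in absolute value.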
Thank you in advance for taking time out of your busy schedule for this issue. It would be a big help if you could clarify the questions above.
Best,