
Questions about the NLL loss #66

Open
AlonzoLeeeooo opened this issue Jun 20, 2023 · 0 comments

Hi @XiangLi1999 ,

Thanks for the amazing work! I have encountered some questions while implementing Diffusion-LM:

  1. During my experiments, I notice that decoder_nll (essentially a cross-entropy loss) stays at zero for a period of training (about 8k steps), and only afterwards does it start taking increasing values. Is this behavior normal when training Diffusion-LM? How should decoder_nll behave if the training is implemented correctly?
  2. The second question is about tT_loss. tT_loss stays at a constant value (about 1.3e-7) throughout training. This happens when I apply cosine annealing with warmup to the learning rate; with a constant learning rate or linear decay, tT_loss starts decreasing instead. I am now confused about which curve is correct for training Diffusion-LM. Could you explain a bit what the tT_loss curve should look like if Diffusion-LM is trained correctly? (My current understanding of how both terms are computed is sketched below.)

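For context, here is a minimal sketch of how I currently understand the two logged terms, assuming the standard improved-diffusion-style formulation; only the names decoder_nll and tT_loss come from the codebase, while the function signatures and the scalar alpha_bar_T are my own illustration:

```python
import math

import torch.nn.functional as F


def decoder_nll(logits, input_ids):
    """Per-sequence cross-entropy of rounding the embedding x_0 back to
    discrete tokens (the quantity logged as decoder_nll, in nats).
    logits: (batch, seq_len, vocab); input_ids: (batch, seq_len)."""
    return F.cross_entropy(logits.transpose(1, 2), input_ids,
                           reduction="none").mean(dim=-1)


def tT_loss(x_start, alpha_bar_T):
    """Per-sequence KL( q(x_T | x_0) || N(0, I) ) (the quantity logged as
    tT_loss). Under the standard forward process,
    q(x_T | x_0) = N(sqrt(alpha_bar_T) * x_0, (1 - alpha_bar_T) * I),
    so this term depends only on the embeddings x_0 and the fixed noise
    schedule, not on the denoising network."""
    mu_sq = alpha_bar_T * x_start.pow(2)            # squared mean, per dim
    var = 1.0 - alpha_bar_T                         # scalar variance
    kl = 0.5 * (mu_sq + var - 1.0 - math.log(var))  # KL to N(0, 1), per dim
    return kl.mean(dim=list(range(1, x_start.dim())))
```

If this reading is correct, tT_loss only moves when the word embeddings themselves move, which might explain why it stays constant under one learning-rate schedule and decreases under another.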
Thank you in advance for taking the time out of your busy schedule to look at this issue. It would be a big help if you could advise on the questions above.

Best,
