loss calculation #50

sudongwang-upc · 2024-04-18T03:10:41Z

During training, the loss is computed from the probability density of all targets in both the context and the prediction window, rather than of the prediction window alone. Why?

ashok-arjun · 2024-04-21T23:40:27Z

In all decoder-only models, the "predict" part has no special role during training, because every point in the sequence is used for the loss. This holds for any decoder-only model, such as GPT.

In the code, the context and the prediction part are kept separate to support inference for a specific prediction length, given a fixed-length context.
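To make the point above concrete, here is a minimal sketch of the two ways the loss could be averaged. It is not the repository's code: the function names, the `all_positions` flag, and the toy numbers are hypothetical, and a Gaussian density stands in for whatever output distribution the model actually uses. It assumes a causal model, so the parameters at step `t` depend only on values before `t`, which is why every step is a valid training target.

```python
import math

def gaussian_nll(y, mu, sigma):
    """Negative log-density of y under a normal distribution N(mu, sigma^2)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)

def training_loss(targets, mus, sigmas, context_length, all_positions=True):
    """Return (mean NLL, number of loss terms) for one training window.

    mus[t] and sigmas[t] are assumed to be the model's predicted parameters
    for targets[t], computed from targets[:t] only (causal masking).
    """
    # Decoder-only training: score every step. The alternative scores only
    # the steps after the context.
    start = 0 if all_positions else context_length
    terms = [
        gaussian_nll(y, m, s)
        for y, m, s in zip(targets[start:], mus[start:], sigmas[start:])
    ]
    return sum(terms) / len(terms), len(terms)

# Hypothetical window: 4 context steps followed by 2 prediction steps.
targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mus = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # stand-in for model outputs
sigmas = [1.0] * 6

full_loss, n_full = training_loss(targets, mus, sigmas, context_length=4)
pred_loss, n_pred = training_loss(
    targets, mus, sigmas, context_length=4, all_positions=False
)
print(n_full, n_pred)  # number of loss terms in each variant
```

With all positions scored, the same window contributes six loss terms instead of two, so each training sample gives the model more supervision. At inference time the split matters again, because only the steps after the context are unknown.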

sudongwang-upc · 2024-04-22T03:24:17Z

Thank you very much for your reply. I got it!
