In any decoder-only model, the "predict" part has no special meaning during training, because every target position contributes to the loss. This applies to all decoder-only models, GPT included.
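To illustrate why, here is a minimal sketch, assuming a Gaussian output distribution and a hypothetical toy predictor `causal_predict` standing in for the model. Because of causal masking, each position's target is scored using only the values before it. A "context" target is therefore just as valid a training example as a "predict" target, and splitting the series into the two parts does not change what the model sees at any position. The names `gaussian_nll`, `per_step_nll`, `training_loss` and the `prediction_length` argument are illustrative, not taken from the code under discussion.

```python
import math


def gaussian_nll(y, mu, sigma):
    """Negative log-likelihood of y under Normal(mu, sigma)."""
    return 0.5 * math.log(2 * math.pi * sigma ** 2) + (y - mu) ** 2 / (2 * sigma ** 2)


def causal_predict(history):
    """Hypothetical stand-in for a decoder-only model: returns (mu, sigma) for
    the next value using only the causal prefix (here, the last observed value)."""
    return history[-1], 1.0


def per_step_nll(series):
    """One-step-ahead NLL at every position t >= 1, each conditioned on series[:t] only."""
    losses = []
    for t in range(1, len(series)):
        mu, sigma = causal_predict(series[:t])
        losses.append(gaussian_nll(series[t], mu, sigma))
    return losses


def training_loss(series, prediction_length=None):
    """Mean NLL over all targets (default) or only the last `prediction_length` targets."""
    losses = per_step_nll(series)
    if prediction_length is not None:
        losses = losses[-prediction_length:]
    return sum(losses) / len(losses)


series = [1.0, 2.0, 2.0, 3.0, 5.0]  # e.g. 3 "context" + 2 "predict" points
print(len(per_step_nll(series)))  # 4 targets scored from a single series
print(training_loss(series))  # loss over context + predict targets
print(training_loss(series, prediction_length=2))  # predict-only loss, for comparison
```

In a real model all positions come out of a single forward pass under a causal mask, so scoring the context targets costs nothing extra and yields more training signal per series. The explicit loop here only makes the "prefix-only" conditioning visible.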
In the code, context and predict are kept separate to support inference: forecasting a specific prediction length given a fixed-length context.
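At inference time the split does matter, because the model is rolled out autoregressively. The sketch below shows this loop under stated assumptions. `causal_predict` is a hypothetical toy model that does linear extrapolation from the last two values. `forecast`, `context_length` and `prediction_length` are illustrative names. Each prediction is fed back as input, and the window is trimmed to the fixed context length. A probabilistic model would typically sample from its predicted distribution at each step. This sketch simplifies that by using a point prediction.

```python
def causal_predict(history):
    """Hypothetical toy model: linear extrapolation from the last two values."""
    if len(history) < 2:
        return history[-1]
    return 2 * history[-1] - history[-2]


def forecast(series, context_length, prediction_length):
    """Autoregressively generate `prediction_length` values from a fixed-length context."""
    window = list(series[-context_length:])  # keep only the last context_length points
    out = []
    for _ in range(prediction_length):
        nxt = causal_predict(window)
        out.append(nxt)
        # Feed the prediction back in and slide the window to stay at context_length.
        window = (window + [nxt])[-context_length:]
    return out


print(forecast([0, 1, 2, 3], context_length=3, prediction_length=3))  # [4, 5, 6]
```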
During training, the loss is computed from the predicted probability density of every target in both the context and predict windows, rather than only the targets in the predict window. Why?