The benefit of replacing the cross-entropy loss with a loss function that considers the distance between tokens. #210
ChernovAndrey started this conversation in Show and tell
Replies: 1 comment
-
Looks promising, thank you for sharing, @ChernovAndrey!
-
Hello everyone,
I would like to share my research paper, where I replaced the cross-entropy loss with the Wasserstein loss to provide the model with information about the distance between tokens. Here is the link: https://arxiv.org/abs/2409.15367
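For readers who want a quick feel for the idea, here is a minimal sketch (my own illustration, not the paper's implementation) of a 1-D Wasserstein-1 loss between a predicted distribution over ordered token bins and a one-hot target. Unlike cross-entropy, which only looks at the probability assigned to the correct bin, this loss grows with how far the predicted mass sits from the true bin:

```python
import numpy as np

def wasserstein_1d_loss(probs, target_index, bin_values=None):
    """Wasserstein-1 distance between a predicted distribution over ordered
    token bins and a one-hot target (illustrative helper, not the paper's
    exact code).

    probs        : 1-D array of predicted probabilities, one per bin
    target_index : index of the true bin
    bin_values   : optional real values of the bins; defaults to 0..n-1
                   (i.e. unit-spaced bins)
    """
    probs = np.asarray(probs, dtype=float)
    n = probs.shape[-1]
    if bin_values is None:
        bin_values = np.arange(n, dtype=float)  # assume unit spacing
    target = np.zeros(n)
    target[target_index] = 1.0
    # For 1-D distributions, W1 equals the integral of |CDF_p - CDF_q|,
    # which on a discrete grid is a spacing-weighted sum of CDF gaps.
    cdf_diff = np.cumsum(probs) - np.cumsum(target)
    spacings = np.diff(bin_values, append=bin_values[-1])  # last gap is 0
    return float(np.sum(np.abs(cdf_diff) * spacings))
```

With unit-spaced bins and a true bin at index 2, putting all mass on bin 1 gives a loss of 1, while putting it on bin 0 gives 2 — the loss reflects the distance between tokens, whereas cross-entropy would penalize both mistakes identically.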
Unfortunately, I do not have the resources to train a model from scratch with the Wasserstein loss. Instead, to validate the idea, I fine-tuned a model on zero-shot datasets using both the cross-entropy loss and the Wasserstein loss.
If anyone has the resources to train a model from scratch or ideas on how to improve this approach, I would be happy to hear from you and collaborate.
P.S.
The code is publicly available, so feel free to reuse it: https://github.com/ChernovAndrey/chronos-forecasting-wasserstein
Best regards,
Andrei Chernov