The benefit of replacing the cross-entropy loss with a loss function that considers the distance between tokens. #210
ChernovAndrey started this conversation in Show and tell
Replies: 1 comment
-
Looks promising, thank you for sharing, @ChernovAndrey!
-
Hello everyone,
I would like to share my research paper, where I replaced the cross-entropy loss with the Wasserstein loss to provide the model with information about the distance between tokens. Here is the link: https://arxiv.org/abs/2409.15367
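For readers who want a quick feel for the idea, here is a minimal sketch (my own illustration, not the paper's implementation) of a 1-D Wasserstein-1 loss between a predicted distribution over ordered token bins and a one-hot target. Unlike cross-entropy, which only looks at the probability assigned to the correct bin, this loss grows with how far the predicted mass sits from the true bin:

```python
import numpy as np

def wasserstein_1d_loss(probs, target_index, bin_values=None):
    """Wasserstein-1 distance between a predicted distribution over ordered
    token bins and a one-hot target (illustrative helper, not the paper's
    exact code).

    probs        : 1-D array of predicted probabilities, one per bin
    target_index : index of the true bin
    bin_values   : optional real values of the bins; defaults to 0..n-1
                   (i.e. unit-spaced bins)
    """
    probs = np.asarray(probs, dtype=float)
    n = probs.shape[-1]
    if bin_values is None:
        bin_values = np.arange(n, dtype=float)  # assume unit spacing
    target = np.zeros(n)
    target[target_index] = 1.0
    # For 1-D distributions, W1 equals the integral of |CDF_p - CDF_q|,
    # which on a discrete grid is a spacing-weighted sum of CDF gaps.
    cdf_diff = np.cumsum(probs) - np.cumsum(target)
    spacings = np.diff(bin_values, append=bin_values[-1])  # last gap is 0
    return float(np.sum(np.abs(cdf_diff) * spacings))
```

With unit-spaced bins and a true bin at index 2, putting all mass on bin 1 gives a loss of 1, while putting it on bin 0 gives 2 — the loss reflects the distance between tokens, whereas cross-entropy would penalize both mistakes identically.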
Unfortunately, I do not have the resources to train a model from scratch with the Wasserstein loss. Instead, to validate the idea, I fine-tuned a model on zero-shot datasets using both the cross-entropy loss and the Wasserstein loss.
If anyone has the resources to train a model from scratch or ideas on how to improve this approach, I would be happy to hear from you and collaborate.
P.S.
The code is publicly available, so feel free to reuse it: https://github.com/ChernovAndrey/chronos-forecasting-wasserstein
Best regards,
Andrei Chernov