Link for downloading the back translation code is not working #108
Comments
Hello, I am experiencing the same issue and I hope it will be resolved soon!
Hi, I have the same problem. Has anybody managed to get the checkpoints?
Same issue. Have you solved that problem?
Maybe this could be of help: I wrote a small script that does the back-translation with HuggingFace models. I have not tested the quality of the generated data, whether it performs well with UDA, or how long it would take to translate the whole dataset, but on visual inspection the outputs look good.

    import torch
    from transformers import MarianMTModel, MarianTokenizer

    torch.cuda.empty_cache()

    # English -> French and French -> English models (this script assumes a CUDA GPU)
    en_fr_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    en_fr_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr").cuda()
    fr_en_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
    fr_en_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-fr-en").cuda()

    src_text = [
        "Hi how are you?",
    ]

    # Translate to French; sampling is used to get more diverse outputs.
    # truncation=True is needed for max_length to actually take effect.
    translated_tokens = en_fr_model.generate(
        **{k: v.cuda() for k, v in en_fr_tokenizer(src_text, return_tensors="pt", padding=True, truncation=True, max_length=512).items()},
        do_sample=True,
        top_k=10,
        temperature=2.0,
    )
    in_fr = [en_fr_tokenizer.decode(t, skip_special_tokens=True) for t in translated_tokens]

    # Translate the French text back to English
    bt_tokens = fr_en_model.generate(
        **{k: v.cuda() for k, v in fr_en_tokenizer(in_fr, return_tensors="pt", padding=True, truncation=True, max_length=512).items()},
        do_sample=True,
        top_k=10,
        temperature=2.0,
    )
    in_en = [fr_en_tokenizer.decode(t, skip_special_tokens=True) for t in bt_tokens]

For the arguments passed to generate, please refer to https://huggingface.co/blog/how-to-generate.

[Image: example of input data and its back-translation]
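The comment above raises the question of how long it would take to translate the whole dataset. One common way to approach that is to feed the models fixed-size batches rather than one sentence at a time. The following is only a minimal sketch of that idea, not part of the UDA code base: `batched`, `back_translate_corpus`, and the `forward`/`backward` callables are hypothetical names introduced here, and each callable is assumed to map a list of strings to a list of translated strings of the same length.

```python
from typing import Callable, Iterable, Iterator, List

# Hypothetical type: a function that translates a batch of sentences.
Translator = Callable[[List[str]], List[str]]


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch


def back_translate_corpus(
    sentences: Iterable[str],
    forward: Translator,
    backward: Translator,
    batch_size: int = 16,
) -> List[str]:
    """Back-translate `sentences` batch by batch, preserving input order.

    `forward` translates source -> pivot language, `backward` translates
    pivot -> source language. Both are assumptions supplied by the caller.
    """
    results: List[str] = []
    for batch in batched(sentences, batch_size):
        pivot = forward(batch)
        results.extend(backward(pivot))
    return results
```

To use this with the script in the comment, `forward` could wrap the `en_fr_tokenizer` / `en_fr_model.generate` / decode steps and `backward` the corresponding `fr_en` steps. A suitable `batch_size` depends on GPU memory and sentence length, so it would need to be tuned.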
Thanks! I'll try it as a substitute for the source code.
While trying to run back_translate/download.sh, I get the following error:
It seems that the storage.googleapis.com/uda_model bucket is no longer valid. Is there an alternate link that I can use to download the back_translate code?