
Link for downloading the back translation code is not working #108

Open
sgmoo opened this issue Jun 19, 2021 · 5 comments

Comments

@sgmoo

sgmoo commented Jun 19, 2021

While trying to run back_translate/download.sh, I get the following error:

> bash download.sh

--2021-06-19 12:36:11--  https://storage.googleapis.com/uda_model/text/back_trans_checkpoints.zip 
Resolving storage.googleapis.com (storage.googleapis.com)... 172.217.8.16, 172.217.9.208, 172.217.12.240, ...
Connecting to storage.googleapis.com (storage.googleapis.com)|172.217.8.16|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2021-06-19 12:36:11 ERROR 404: Not Found.
unzip:  cannot find or open back_trans_checkpoints.zip, back_trans_checkpoints.zip.zip or back_trans_checkpoints.zip.ZIP.

It seems that the storage.googleapis.com/uda_model bucket is no longer valid. Is there an alternate link I can use to download the back-translation checkpoints?

@JosephElHachem

Hello, I am experiencing the same issue and I hope it will be resolved soon!

@sebamenabar

Hi, I have the same problem. Has anybody managed to get the checkpoints?

@YuandZhang

Same issue. Have you solved that problem?

@sebamenabar

sebamenabar commented Sep 23, 2021

Maybe this could be of help: I wrote a small script that produces back-translations with HuggingFace. I have not tested the quality of the generated data, whether it performs well with UDA, or how long it would take to translate the whole dataset, but visually the outputs look good. It works with transformers==4.4.2 and may require some modifications on newer versions.

```python
import torch
from transformers import MarianMTModel, MarianTokenizer

torch.cuda.empty_cache()

# English -> French and French -> English Marian models for the round trip.
en_fr_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
en_fr_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr").cuda()

fr_en_tokenizer = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
fr_en_model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-fr-en").cuda()

src_text = [
    "Hi how are you?",
]

# English -> French, sampling with top-k and a high temperature for diversity.
translated_tokens = en_fr_model.generate(
    **{
        k: v.cuda()
        for k, v in en_fr_tokenizer(
            src_text, return_tensors="pt", padding=True, truncation=True, max_length=512
        ).items()
    },
    do_sample=True,
    top_k=10,
    temperature=2.0,
)
in_fr = [en_fr_tokenizer.decode(t, skip_special_tokens=True) for t in translated_tokens]

# French -> English: the round trip yields the paraphrased (back-translated) text.
bt_tokens = fr_en_model.generate(
    **{
        k: v.cuda()
        for k, v in fr_en_tokenizer(
            in_fr, return_tensors="pt", padding=True, truncation=True, max_length=512
        ).items()
    },
    do_sample=True,
    top_k=10,
    temperature=2.0,
)
in_en = [fr_en_tokenizer.decode(t, skip_special_tokens=True) for t in bt_tokens]
```

For the arguments passed to generate, please refer to https://huggingface.co/blog/how-to-generate.
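As a rough intuition for those flags, here is a toy, pure-Python sketch of what top-k sampling with temperature does to a single vector of logits (the function name and implementation are illustrative only, not part of transformers):

```python
import math
import random

def top_k_temperature_sample(logits, k=10, temperature=2.0, rng=None):
    """Sample one token index from `logits`, mimicking the effect of
    do_sample=True, top_k=k, temperature=temperature in generate()."""
    rng = rng or random.Random()
    # Keep only the k highest-scoring token indices.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Temperature > 1 flattens the distribution (more diverse paraphrases);
    # temperature < 1 sharpens it toward greedy decoding.
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    # Draw one index in proportion to the softmax weights.
    return rng.choices(top, weights=weights, k=1)[0]
```

With k=1 this reduces to greedy decoding; raising k and temperature trades fidelity for diversity, which is what back-translation-based augmentation wants.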

Example of input data and backtranslation:

Input: I lived in Tokyo for 7 months. Knowing the reality of long train commutes, bike rides from the train station, soup stands, and other typical scenes depicted so well, certainly added to my own appreciation for this film which I really, really liked. There are aspects of Japanese life in this film painted with vivid colors but you don't have to speak Japanese to enjoy this movie. Director Suo's tricks were subtle for the most part; I found his highlighting the character called Tamako Tamura with a soft filter, making her sublime, a tiny bit contrived but most of the directors tricks were so gentle that I was fully pulled in and just danced with his characters. Or cried. Or laughed aloud. Wonderful. A+.
---
Output: I lived in Tokyo for seven months. I know the reality of train rides, bike rides from the train station, soup stands, and other typical scenes shown so nicely, probably added to my own appreciation of this film I really, really loved. There are aspects of Japanese life in this film painted with vivid colors but you don't have to speak Japanese to enjoy this movie. The pieces of the director Suo have been subtle to most, I found that he highlights the character called Tamaki Tamura with a sweet filter, which makes her sublime, a bit confused but most of the movie-makers' tricks were so soft that I was completely shot in it and just dancing with his characters. Or wept. or laughed aloud. Wonderful. A+.
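On the open question of how long translating a whole dataset would take: calling generate() on batches rather than one sentence at a time helps a lot. A minimal, model-agnostic batching sketch (batched and back_translate are hypothetical helper names; translate_fn stands in for either Marian round-trip step above):

```python
def batched(items, batch_size):
    """Yield successive fixed-size slices of `items`."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def back_translate(texts, translate_fn, batch_size=32):
    """Run `translate_fn` over `texts` batch by batch and flatten the
    results; `translate_fn` maps a list of sentences to a list of
    translations (e.g. the en->fr then fr->en round trip above)."""
    out = []
    for batch in batched(texts, batch_size):
        out.extend(translate_fn(batch))
    return out
```

Batch size is bounded by GPU memory, since padding makes every batch as long as its longest sentence; sorting the dataset by length first reduces wasted padding.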

@Liu-Jingyao

> Maybe this could be of help, I made a small code to make the backtranslations with HuggingFace […]

Thanks! I'll try it as a substitute for the source code.
