Add MT parallel dataset to mbaza-nlp Hugging Face platform #2

Open
2 of 3 tasks
rutsam opened this issue Oct 22, 2022 · 3 comments

rutsam commented Oct 22, 2022

Goal: add the machine translation (MT) parallel datasets to the mbaza-nlp Hugging Face platform
Definition of done

  • Add the dataset done by Arnaud
  • Add the dataset done by Rene
  • Add the dataset done by Kefas

rutsam commented Oct 22, 2022

@rutsam will upload the dataset


rutsam commented Nov 5, 2022

Only the datasets from Arnaud and Kefas were uploaded, since they are more accurate. Rene's dataset is not clear, but @IMdtman will ask Rene to scrape more data (500 Wikipedia articles).


rutsam commented Nov 12, 2022

@agent87 will help @renepromesse with cleaning the dataset


2 participants