Codebase for multilingual neural machine translation

ZhaoCinyu/nmt-multi

 
 

  1. Download raw data
wget https://object.pouta.csc.fi/OPUS-100/v1.0/opus-100-corpus-v1.0.tar.gz && tar -xzf opus-100-corpus-v1.0.tar.gz
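The archive is large, so an interrupted download can leave a truncated file behind. A minimal sketch of an integrity check before moving on, assuming the archive sits in the current directory; the helper name `archive_ok` is hypothetical and not part of this repository:

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if the file is a readable gzip-compressed tarball.
# Listing the archive forces tar to read it to the end, which exposes truncation.
archive_ok() {
    tar -tzf "$1" >/dev/null 2>&1
}

archive="opus-100-corpus-v1.0.tar.gz"
if [ -f "$archive" ] && archive_ok "$archive"; then
    echo "$archive looks intact"
else
    # wget -c resumes a partial download instead of starting over.
    echo "$archive is missing or damaged; try: wget -c https://object.pouta.csc.fi/OPUS-100/v1.0/$archive"
fi
```

This only confirms that the file can be read as a tarball; it does not compare it against a published checksum.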
  2. Build the environment
cd nmt-multi/fairseq_dir
pip install -e ./

git clone https://github.com/google/sentencepiece.git 
cd sentencepiece
mkdir build
cd build
cmake ..
make -j $(nproc)
make install
ldconfig -v
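After the build it can help to confirm that the command-line tools are on `PATH` before starting preprocessing. The sketch below assumes the usual tool names: `spm_train` and `spm_encode` from the sentencepiece build and `fairseq-preprocess` from the fairseq install. Whether the preprocessing script calls exactly these tools is an assumption, and `missing_tools` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each command that is not found on PATH, one per line.
# Returns 0 when everything was found and 1 otherwise.
missing_tools() {
    local rc=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Assumed tool names; adjust the list if your setup differs.
if missing_tools spm_train spm_encode fairseq-preprocess; then
    echo "all tools found"
else
    echo "the tools listed above are missing; re-check the build steps" >&2
fi
```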
  3. Run preprocessing

Edit scripts/opus-100/data_process/multilingual_preprocess.sh and replace every /path/to/ placeholder with your local directory.

Then run multilingual_preprocess.sh, which outputs the preprocessed data to nmt-multi/data/.
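The `/path/to/` placeholder can also be filled in without hand-editing. The following is a sketch only: `fill_paths` is a hypothetical helper, and it assumes that every `/path/to/` occurrence in the script should point at the same base directory, which may not match how the script is laid out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a copy of a script with every /path/to/ placeholder
# replaced by a real base directory. The original file is left untouched.
# Assumes the directory name contains no '|', '&' or backslash characters,
# since those are special inside the sed expression.
fill_paths() {
    local script="$1"
    local base="${2%/}/"   # normalise to exactly one trailing slash
    sed "s|/path/to/|${base}|g" "$script"
}

# Example usage (the directory and output file name are placeholders):
# fill_paths scripts/opus-100/data_process/multilingual_preprocess.sh "$HOME/work" \
#     > multilingual_preprocess.local.sh
# bash multilingual_preprocess.local.sh
```

Writing to a separate file keeps the tracked script unchanged, so the working copy stays clean under version control.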
