Prepare tokenizer of LibriSpeech by using sentencepiece #227

Open
tuandattt opened this issue Aug 18, 2024 · 1 comment

Comments

@tuandattt

❓ Questions & Help

Hi, I cannot save the model and vocab when using spm.SentencePieceTrainer.Train.

Details

This is the command I ran (my config is passed as Hydra overrides):
python3 ./openspeech_cli/hydra_train.py dataset=librispeech dataset.dataset_download=False dataset.dataset_path=/home/stud_dat/openspeech/openspeech/datasets/librispeech dataset.manifest_file_path=$MANIFEST_FILE_PATH tokenizer=libri_subword model=conformer_lstm audio=fbank lr_scheduler=warmup_reduce_lr_on_plateau trainer=gpu criterion=cross_entropy

@tuandattt
Author

And this is the error:
trainer_interface.cc(605) LOG(INFO) Saving model: sp.model
trainer_interface.cc(616) LOG(INFO) Saving vocabs: sp.vocab
Error executing job with overrides: ['dataset=librispeech', 'dataset.dataset_download=False', 'dataset.dataset_path=/home/stud_dat/openspeech/openspeech/datasets/librispeech', 'dataset.manifest_file_path=', 'tokenizer=libri_subword', 'model=conformer_lstm', 'audio=fbank', 'lr_scheduler=warmup_reduce_lr_on_plateau', 'trainer=gpu', 'criterion=cross_entropy']
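
One detail in the override list above is worth checking: the command passes dataset.manifest_file_path=$MANIFEST_FILE_PATH, but the logged value is 'dataset.manifest_file_path=', which suggests the shell variable was unset or empty when the command ran. The traceback is not included here, so whether that is the actual cause is unknown. The following is a minimal stdlib-only sketch of two pre-flight checks, not part of openspeech. The helper names require_env_path and missing_outputs are hypothetical, and the "sp" prefix is assumed from the sp.model / sp.vocab names in the log.

```python
import os
from pathlib import Path
from typing import List


def require_env_path(name: str) -> Path:
    """Return the path held in environment variable `name`.

    Raises RuntimeError if the variable is unset or empty, which is what an
    override logged as 'dataset.manifest_file_path=' would indicate.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is unset or empty; export it before launching training"
        )
    return Path(value)


def missing_outputs(model_prefix: str = "sp", directory: Path = Path(".")) -> List[str]:
    """List which of <prefix>.model and <prefix>.vocab are absent from `directory`.

    The "sp" default is an assumption taken from the file names in the log.
    """
    expected = [f"{model_prefix}.model", f"{model_prefix}.vocab"]
    return [name for name in expected if not (directory / name).is_file()]
```

Calling require_env_path("MANIFEST_FILE_PATH") before the training command fails early if the variable is empty, and missing_outputs("sp", Path.cwd()) after the tokenizer step reports whether the two files were actually written to the working directory. Hydra may change the working directory per run, so the directory to inspect is itself an assumption.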
