This repository contains the implementation of the SBRT 2017 paper "Towards an end-to-end speech recognizer for Portuguese using deep neural networks".
The model was trained on four datasets: CSLU Spoltech (LDC2006S16), Sid, VoxForge, and LapsBM1.4. Only CSLU Spoltech is a paid dataset (distributed by the LDC); the other three are freely available.
You can download the freely available datasets with the provided script (it may take a while):
$ cd data; sh download_datasets.sh
Next, preprocess the downloaded datasets into an HDF5 file:
$ python -m extras.make_dataset --parser brsd
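The layout of the generated data.h5 file is not documented here, so the following sketch makes no assumptions about it. It walks the file with h5py (assumed to be the HDF5 binding installed alongside the requirements) and lists every dataset's name, shape, and dtype. `list_datasets` is a hypothetical helper, not part of this repository.

```python
import h5py


def list_datasets(path):
    """Return (name, shape, dtype) for every dataset stored in an HDF5 file."""
    found = []

    def visitor(name, obj):
        # visititems walks all groups recursively; only record leaf datasets.
        if isinstance(obj, h5py.Dataset):
            found.append((name, obj.shape, str(obj.dtype)))
        # Returning None tells h5py to keep walking.

    with h5py.File(path, "r") as f:
        f.visititems(visitor)
    return found
```

For example, after preprocessing, `for entry in list_datasets(".datasets/brsd/data.h5"): print(entry)` prints one line per stored array, which is a quick way to confirm the step produced the splits you expect.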
You can train the network with the main.py script. To train with the default parameters:
$ python main.py train --dataset .datasets/brsd/data.h5
You can download an sbrt2017 model pre-trained on the full brsd dataset (including CSLU Spoltech):
$ cd data; sh download_model.sh
You can also evaluate the model on the brsd test set:
$ python main.py eval --model data/models/sbrt2017.h5 --dataset .datasets/brsd/data.h5
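The metrics printed by `main.py eval` are not described here. Speech recognizers are commonly scored by a label error rate: the edit (Levenshtein) distance between the reference and predicted transcriptions, divided by the reference length, computed over words (WER) or characters (CER). The sketch below illustrates that computation; `edit_distance` and `error_rate` are hypothetical helpers, and this is not necessarily the exact metric main.py reports.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution (0 if equal)
        prev = cur
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Word error rate (unit="word") or character error rate (unit="char")."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return float(edit_distance(ref, hyp)) / len(ref)
```

For instance, `error_rate("o gato preto", "o gato branco")` has one substituted word out of three, giving a WER of 1/3.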
Requirements:
- Python 2.7
- NumPy
- SciPy
- PyYAML
- HDF5
- Unidecode
- librosa
- TensorFlow
- Keras
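To confirm that the Python packages listed above are importable, a small check like the following can help. The import names are assumptions (PyYAML is imported as `yaml`, and the HDF5 requirement is assumed to mean the `h5py` binding), and `check_modules` is a hypothetical helper. The code runs under both Python 2.7 and Python 3.

```python
import importlib

# Assumed import names for the requirements listed above.
REQUIRED_MODULES = ["numpy", "scipy", "yaml", "h5py", "unidecode",
                    "librosa", "tensorflow", "keras"]


def check_modules(names):
    """Map each module name to True if it imports cleanly, else False."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status
```

Calling `print(check_modules(REQUIRED_MODULES))` shows which packages still need to be installed.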
Acknowledgements:
- python_speech_features for the audio preprocessing
- Google Magenta for the hparams
- @robertomest for helping me with everything
Reference:
- SANTOS, S. C. B.; ALCAIM, A. "Reduced Sets of Subword Units for Continuous Speech Recognition of Portuguese". Electronics Letters, v. 36, pp. 586-588, 2000.
License: see LICENSE for more information.