TensorFlowASR implements some automatic speech recognition architectures such as DeepSpeech2, Jasper, RNN Transducer, ContextNet, Conformer, etc. These models can be converted to TFLite to reduce memory and computation for deployment.
- (12/17/2020) Supported ContextNet http://arxiv.org/abs/2005.03191
- (12/12/2020) Added support for using masking
- (11/14/2020) Supported Gradient Accumulation for Training in Larger Batch Size
- (11/3/2020) Reduced differences between `librosa.stft` and `tf.signal.stft`
- (10/31/2020) Update DeepSpeech2 and Supported Jasper https://arxiv.org/abs/1904.03288
- (10/18/2020) Supported Streaming Transducer https://arxiv.org/abs/1811.06621
- What's New?
- Table of Contents
- Supported Models
- Installation
- Setup training and testing
- TFLite Conversion
- Features Extraction
- Augmentations
- Training & Testing
- Corpus Sources and Pretrained Models
- References & Credits
- Contact
- CTCModel (End-to-end models using CTC Loss for training)
  - Deep Speech 2 (Reference: https://arxiv.org/abs/1512.02595) See examples/deepspeech2
  - Jasper (Reference: https://arxiv.org/abs/1904.03288) See examples/jasper
- Transducer Models (End-to-end models using RNNT Loss for training)
  - Conformer Transducer (Reference: https://arxiv.org/abs/2005.08100) See examples/conformer
  - Streaming Transducer (Reference: https://arxiv.org/abs/1811.06621) See examples/streaming_transducer
  - ContextNet (Reference: http://arxiv.org/abs/2005.03191) See examples/contextnet
Install `tensorflow>=2.3.0` or `tf-nightly`.
For training and testing, you should install from a `git clone` so that the necessary packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.) can be installed.
To install the released package only, run `pip3 install -U TensorFlowASR`. To install from source:

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install .
```
For anaconda3:

```bash
conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip install .
```
- For datasets, see datasets
- For training, testing and using CTC Models, run `./scripts/install_ctc_decoders.sh`
- For training Transducer Models, run `export CUDA_HOME=/usr/local/cuda && ./scripts/install_rnnt_loss.sh` (Note: only `export CUDA_HOME` when you have CUDA)
- For mixed precision training, use the flag `--mxp` when running the python scripts from examples
- For enabling XLA, run `TF_XLA_FLAGS=--tf_xla_auto_jit=2 python3 $path_to_py_script`
- For hiding warnings, run `export TF_CPP_MIN_LOG_LEVEL=2` before running any examples (a Python equivalent of the last two settings is sketched below)
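For convenience, here is a minimal sketch of applying those last two settings from inside a Python script instead of the shell; note that `TF_CPP_MIN_LOG_LEVEL` must be set before `tensorflow` is imported to take effect.

```python
import os

# Hide TensorFlow C++ warnings; must be set before importing tensorflow
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf

# Enable XLA auto-jit (roughly the TF_XLA_FLAGS setting above)
tf.config.optimizer.set_jit(True)
```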
After converting to TFLite, the model behaves like a function that maps an audio signal directly to a sequence of Unicode code points, which can then be converted to a string.
- Install `tf-nightly` using `pip install tf-nightly`
- Build a model with the same architecture as the trained model (if the model has a `tflite` argument, you must set it to `True`), then load the weights from the trained model into the built model
- Load `TFSpeechFeaturizer` and `TextFeaturizer` into the model using the function `add_featurizers`
- Convert the model's function to tflite as follows:
```python
import tensorflow as tf

# Wrap the model's recognition logic in a concrete function
func = model.make_tflite_function(greedy=True)  # or greedy=False
concrete_func = func.get_concrete_function()

# Convert, allowing select TF ops for kernels TFLite does not support natively
converter = tf.lite.TFLiteConverter.from_concrete_functions([concrete_func])
converter.experimental_new_converter = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,
    tf.lite.OpsSet.SELECT_TF_OPS,
]
tflite_model = converter.convert()
```
- Save the converted tflite model as follows:
```python
import os

# Create the output directory if needed, then write the converted flatbuffer
if not os.path.exists(os.path.dirname(tflite_path)):
    os.makedirs(os.path.dirname(tflite_path))
with open(tflite_path, "wb") as tflite_out:
    tflite_out.write(tflite_model)
```
- Then the `.tflite` model is ready to be deployed
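For reference, below is a minimal sketch of running the converted model with the TFLite interpreter. It assumes a model exported with `greedy=True` whose single input is the raw audio signal and whose output is the sequence of Unicode code points described above (transducer exports may expose additional state inputs); `model.tflite` and the 16 kHz signal length are placeholders.

```python
import numpy as np
import tensorflow as tf

# Load the converted model
interpreter = tf.lite.Interpreter(model_path="model.tflite")
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Stand-in for 1 second of 16 kHz mono audio; resize since signal length varies
signal = np.zeros([16000], dtype=np.float32)
interpreter.resize_tensor_input(input_details[0]["index"], signal.shape)
interpreter.allocate_tensors()
interpreter.set_tensor(input_details[0]["index"], signal)
interpreter.invoke()

# Convert the predicted Unicode code points to a string
code_points = interpreter.get_tensor(output_details[0]["index"])
transcript = "".join(chr(int(c)) for c in np.reshape(code_points, [-1]))
print(transcript)
```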
See augmentations
Example YAML Config Structure

```yaml
speech_config: ...
model_config: ...
decoder_config: ...
learning_config:
  augmentations: ...
  dataset_config:
    train_paths: ...
    eval_paths: ...
    test_paths: ...
    tfrecords_dir: ...
  optimizer_config: ...
  running_config:
    batch_size: 8
    num_epochs: 20
    outdir: ...
    log_interval_steps: 500
```
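As an illustration, a minimal sketch of reading such a file with PyYAML (this is not the library's own config loader; `config.yml` is a placeholder path):

```python
import yaml

# Parse the YAML config and read values from the running_config section
with open("config.yml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

running = config["learning_config"]["running_config"]
print(running["batch_size"], running["num_epochs"])
```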
See examples for some predefined ASR models and results
For pretrained models, go to drive
English:

Name | Source | Hours |
---|---|---|
LibriSpeech | LibriSpeech | 970h |
Common Voice | https://commonvoice.mozilla.org | 1932h |
Vietnamese:

Name | Source | Hours |
---|---|---|
Vivos | https://ailab.hcmus.edu.vn/vivos | 15h |
InfoRe Technology 1 | InfoRe1 (passwd: BroughtToYouByInfoRe) | 25h |
InfoRe Technology 2 (used in VLSP2019) | InfoRe2 (passwd: BroughtToYouByInfoRe) | 415h |
German:

Name | Source | Hours |
---|---|---|
Common Voice | https://commonvoice.mozilla.org/ | 750h |
- NVIDIA OpenSeq2Seq Toolkit
- https://github.com/noahchalifour/warp-transducer
- Sequence Transduction with Recurrent Neural Networks
- End-to-End Speech Processing Toolkit in PyTorch
- https://github.com/iankur/ContextNet
Huy Le Nguyen
Email: [email protected]