Based on https://deepmind.com/blog/wavenet-generative-model-raw-audio/ and https://arxiv.org/pdf/1609.03499.pdf. Forked from https://github.com/basveeling/wavenet on 14/08/2017
Please see the description at the original repo.
Changes:
- Adjusted to work with Python3
- Extended the Theano-only backend support to also work with TensorFlow
- Added Nadam optimiser
- Fixed WAV file import
- Added a workaround so that os.listdir does not pick up ".DS_Store" (a hidden macOS Finder metadata file) as a directory name on macOS
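A minimal sketch of that workaround (the helper name is illustrative; the actual code in the repo may differ):

    import os

    def list_subdirs(path):
        # ".DS_Store" is a hidden metadata file created by the macOS Finder, not a
        # directory; skip hidden entries and anything that is not a real directory.
        return [name for name in os.listdir(path)
                if not name.startswith('.') and os.path.isdir(os.path.join(path, name))]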
Train with different configurations:
$ KERAS_BACKEND=tensorflow python3 wavenet.py with 'option=value' 'option2=value'
Available options:
batch_size = 16
data_dir = 'data'
data_dir_structure = 'flat'
debug = False
desired_sample_rate = 4410
dilation_depth = 9
early_stopping_patience = 20
fragment_length = 1152
fragment_stride = 128
keras_verbose = 1
learn_all_outputs = True
nb_epoch = 1000
nb_filters = 256
nb_output_bins = 256
nb_stacks = 1
predict_initial_input = ''
predict_seconds = 1
predict_use_softmax_as_input = False
random_train_batches = False
randomize_batch_order = True
run_dir = None
sample_argmax = False
sample_temperature = 1
seed = 173213366
test_factor = 0.1
train_only_in_receptive_field = True
use_bias = False
use_skip_connections = True
use_ulaw = True
optimizer:
  decay = 0.0
  epsilon = None
  lr = 0.001
  momentum = 0.9
  nesterov = True
  optimizer = 'sgd'
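Any of the options above can be overridden on the command line, e.g. (values are illustrative):
$ KERAS_BACKEND=tensorflow python3 wavenet.py with 'batch_size=8' 'dilation_depth=10' 'nb_epoch=500'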
- Create a new data directory with a train and test folder in it. All WAV files in these folders will be used as data.
- Caveat: Make sure your WAV files are supported by scipy.io.wavfile.read(): e.g. don't use 24-bit WAVs, and strip any metadata chunks. A quick sanity check is sketched after this list.
- Run with:
$ python3 wavenet.py with 'data_dir=your_data_dir_name'
- Test preprocessing results with:
$ python3 wavenet.py test_preprocess with 'data_dir=your_data_dir_name'
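A minimal sanity check for a data folder (the directory path is illustrative; files that fail here should be converted to plain 16-bit PCM first, e.g. with sox or ffmpeg):

    import os
    from scipy.io import wavfile

    data_dir = 'your_data_dir_name/train'
    for fname in sorted(os.listdir(data_dir)):
        if not fname.endswith('.wav'):
            continue
        try:
            rate, samples = wavfile.read(os.path.join(data_dir, fname))
            print(fname, rate, samples.dtype, samples.shape)
        except ValueError as err:
            # 24-bit WAVs or files with unusual metadata chunks typically fail here.
            print('unsupported file:', fname, err)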
- It's unclear whether the model is trained to predict the t+1 sample for every input sample, or only for the outputs for which $t-receptive_field$ was inside the input fragment. Right now the code does the latter; a sketch of that masking is shown below.
- There is no mention of weight decay or batch normalization in the paper. Perhaps these are not needed given enough data?
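A minimal sketch of that masking, assuming the standard WaveNet dilation scheme (dilations 1, 2, ..., 2**dilation_depth, repeated nb_stacks times); the repo's actual receptive-field computation and loss masking may differ:

    import numpy as np

    def receptive_field(dilation_depth, nb_stacks):
        # Assumed formula for the dilation scheme described above, not necessarily
        # the repo's exact computation.
        return nb_stacks * 2 ** (dilation_depth + 1) - (nb_stacks - 1)

    def mask_targets(y, dilation_depth, nb_stacks):
        # y: (batch, fragment_length, nb_output_bins) one-hot targets.
        # Drop the positions whose receptive field extends before the start of the
        # fragment, so the loss only covers outputs that saw their full history.
        rf = receptive_field(dilation_depth, nb_stacks)
        return y[:, rf - 1:, :]

    # With the defaults above (dilation_depth=9, nb_stacks=1) the receptive field is
    # 1024 samples, so a fragment of length 1152 contributes 129 training targets.
    y = np.zeros((16, 1152, 256))
    print(mask_targets(y, dilation_depth=9, nb_stacks=1).shape)  # (16, 129, 256)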
The WaveNet model is quite expensive to train and sample from. We can, however, trade computational cost for accuracy and fidelity by lowering the sampling rate, the number of stacks, and the number of channels per layer.
For a downsampled model (4000 Hz sampling rate, 256 filters, 1 stack):
- A GTX 1080 Ti needs around 50 seconds to generate one second of audio.
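That configuration corresponds roughly to the following override (option names are taken from the list above):
$ KERAS_BACKEND=tensorflow python3 wavenet.py with 'desired_sample_rate=4000' 'nb_filters=256' 'nb_stacks=1'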
Kirill Makukhin