
Experimental: Mixture of logistic distributions #5

Merged
merged 21 commits into from
Jan 23, 2018

Conversation

r9y9
Owner

@r9y9 r9y9 commented Jan 13, 2018

  • Add mixture module, which implements discretized_mix_logistic_loss and sample_from_discretized_mix_logistic.
  • Mu-law quantization (8-bit) and 16-bit linear PCM are now supported.

The code was adapted from https://github.com/openai/pixel-cnn and https://github.com/pclucas14/pixel-cnn-pp.
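The mu-law helpers themselves aren't shown in this thread; as a reference for what the 8-bit mu-law path does, here is a minimal NumPy sketch of the standard mu-law companding transform and its inverse (function names are illustrative, not necessarily those used in the repository):

```python
import numpy as np

def mulaw(x, mu=255):
    """Mu-law companding transform: [-1, 1] -> [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mulaw_quantize(x, mu=255):
    """Mu-law companding + quantization: [-1, 1] -> integer classes {0, ..., mu}."""
    y = mulaw(x, mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def inv_mulaw(y, mu=255):
    """Inverse mu-law transform: [-1, 1] -> [-1, 1]."""
    return np.sign(y) * (1.0 / mu) * ((1.0 + mu) ** np.abs(y) - 1.0)

def inv_mulaw_quantize(q, mu=255):
    """Map an 8-bit class index back to a waveform sample in [-1, 1]."""
    y = 2 * q.astype(np.float64) / mu - 1
    return inv_mulaw(y, mu)
```

Companding before quantization allocates more levels to small amplitudes, so an 8-bit representation retains intelligible speech; the roundtrip error stays small across the full amplitude range.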

@r9y9 r9y9 mentioned this pull request Jan 13, 2018
28 tasks
@r9y9
Owner Author

r9y9 commented Jan 23, 2018

I haven't got really good speech quality yet, but I will merge this as an experimental feature.

@r9y9
Owner Author

r9y9 commented Jan 23, 2018

step460000.zip

@r9y9 r9y9 changed the title WIP: Mixture of logistic distributions Experimental: Mixture of logistic distributions Jan 23, 2018
@r9y9 r9y9 merged commit 4e1a8ee into master Jan 23, 2018
@r9y9 r9y9 deleted the mixture branch January 23, 2018 07:47
@mfkfge

mfkfge commented Jan 23, 2018

Do you remember how many steps you had trained when you first got recognizable generated speech?

@r9y9
Owner Author

r9y9 commented Jan 23, 2018

Yeah, I got recognizable speech at around 10k steps. After that, I trained for over 30k steps.

@npuichigo

@r9y9 Can you provide the loss you got after 30k steps?

@r9y9
Owner Author

r9y9 commented Jan 24, 2018

@npuichigo Sure, I got a loss value of around 56~57.

@azraelkuan
Contributor

@r9y9 Hi, I am converting your code to ibab's version. The mu-law version works, but with the mixture of logistics version the generated wav contains only a little noise and no recognizable speech. Did you ever run into this?
LJ050-0269_gen.wav.zip

Below is my mixture sampling code:

def sample_from_discretized_mix_logistic(y, log_scale_min=-7.0):
    """Sample from a discretized mixture of logistic distributions.

    :param y: network output, shape [B, T, C] with C = 3 * nr_mix
        (mixture logits, means, and log scales concatenated on the last axis)
    :param log_scale_min: lower bound on the log scales
    :return: samples of shape [B, T], clipped to [-1, 1]
    """
    y_shape = y.get_shape().as_list()

    assert y_shape[2] % 3 == 0
    nr_mix = y_shape[2] // 3

    # mixture component logits
    logit_probs = y[:, :, :nr_mix]

    # select a mixture component with the Gumbel-max trick
    one_hot = tf.one_hot(tf.argmax(
        logit_probs - tf.log(-tf.log(tf.random_uniform(tf.shape(logit_probs), minval=1e-5, maxval=1. - 1e-5))), 2),
                     depth=nr_mix, dtype=tf.float32)

    # pick out the mean and log scale of the selected component
    means = tf.reduce_sum(y[:, :, nr_mix:nr_mix*2] * one_hot, axis=-1)
    log_scales = tf.maximum(tf.reduce_sum(y[:, :, nr_mix*2:nr_mix*3] * one_hot, axis=-1), log_scale_min)

    # inverse-CDF sample from the logistic: x = mu + s * (log(u) - log(1 - u))
    u = tf.random_uniform(tf.shape(means), minval=1e-5, maxval=1. - 1e-5)
    x = means + tf.exp(log_scales) * (tf.log(u) - tf.log(1. - u))
    # clip to the valid waveform range
    x0 = tf.minimum(tf.maximum(x, -1.), 1.)
    return x0

I take the x0 and use it to construct the wav directly.
Thanks!
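For cross-checking the sampling logic outside TensorFlow, here is a NumPy mirror of the same procedure (an illustrative sketch, not code from the thread; the function name and defaults are assumptions):

```python
import numpy as np

def sample_from_discretized_mix_logistic_np(y, log_scale_min=-7.0, rng=None):
    """NumPy version of the mixture-of-logistics sampler: y has shape [B, T, 3 * nr_mix]."""
    rng = rng or np.random.RandomState(0)
    nr_mix = y.shape[2] // 3
    logit_probs = y[:, :, :nr_mix]

    # Gumbel-max trick: argmax(logits + Gumbel noise) samples a mixture component
    u = rng.uniform(1e-5, 1 - 1e-5, logit_probs.shape)
    argmax = np.argmax(logit_probs - np.log(-np.log(u)), axis=-1)
    one_hot = np.eye(nr_mix)[argmax]

    # parameters of the selected component
    means = np.sum(y[:, :, nr_mix:2 * nr_mix] * one_hot, axis=-1)
    log_scales = np.maximum(
        np.sum(y[:, :, 2 * nr_mix:3 * nr_mix] * one_hot, axis=-1), log_scale_min)

    # inverse-CDF sample from the selected logistic, clipped to [-1, 1]
    u = rng.uniform(1e-5, 1 - 1e-5, means.shape)
    x = means + np.exp(log_scales) * (np.log(u) - np.log(1 - u))
    return np.clip(x, -1.0, 1.0)
```

Feeding it a synthetic output tensor whose logits strongly prefer one component makes it easy to verify that the sampled values cluster around that component's mean.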

@r9y9
Owner Author

r9y9 commented Jan 31, 2018

I never got noise like that. If you are working with TensorFlow, it's probably better to use https://github.com/openai/pixel-cnn/blob/2b03725126c580a07af47c498d456cec17a9735e/pixel_cnn_pp/nn.py rather than mine.

As I noted in #1 (comment), the generated speech tends to be noisy if I use a high lower bound on the variance (e.g., 1e-4) for the logistic distributions. log_scale_min=np.log(1e-14) works best for me for now.
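A quick NumPy sketch of why that lower bound matters: when the network predicts a very confident (tiny) scale, the clamp decides how much sampling noise survives. The helper below is illustrative, not code from the repository:

```python
import numpy as np

rng = np.random.RandomState(0)

def logistic_sample(mean, log_scale, log_scale_min, size=10000):
    """Draw logistic samples with the log scale clamped from below."""
    s = np.exp(max(log_scale, log_scale_min))
    u = rng.uniform(1e-5, 1 - 1e-5, size)
    # inverse-CDF sampling from the logistic distribution
    return mean + s * (np.log(u) - np.log(1 - u))

# The network predicts a very small scale (log_scale = -30); the clamp
# determines the residual noise floor of the generated samples.
noisy = logistic_sample(0.0, -30.0, log_scale_min=np.log(1e-4))
tight = logistic_sample(0.0, -30.0, log_scale_min=np.log(1e-14))
```

With the 1e-4 bound the samples carry a noise floor on the order of 1e-4 per step, which accumulates audibly in autoregressive generation; with 1e-14 the samples are effectively the predicted means.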


4 participants