
Experimental: Mixture of logistic distributions #5

Merged
merged 21 commits into from
Jan 23, 2018

Conversation

r9y9
Owner

@r9y9 r9y9 commented Jan 13, 2018

  • Add mixture module, which implements discretized_mix_logistic_loss and sample_from_discretized_mix_logistic.
  • Mu-law quantization (8-bit) and 16-bit linear PCM are now supported.

The code was adapted from https://github.com/openai/pixel-cnn and https://github.com/pclucas14/pixel-cnn-pp.
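The mu-law helpers themselves aren't shown in this thread; as a reference for what the 8-bit mu-law path does, here is a minimal NumPy sketch of the standard mu-law companding transform and its inverse (function names are illustrative, not necessarily those used in the repository):

```python
import numpy as np

def mulaw(x, mu=255):
    """Mu-law companding transform: [-1, 1] -> [-1, 1]."""
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mulaw_quantize(x, mu=255):
    """Mu-law companding + quantization: [-1, 1] -> integer classes {0, ..., mu}."""
    y = mulaw(x, mu)
    return ((y + 1) / 2 * mu + 0.5).astype(np.int64)

def inv_mulaw(y, mu=255):
    """Inverse mu-law transform: [-1, 1] -> [-1, 1]."""
    return np.sign(y) * (1.0 / mu) * ((1.0 + mu) ** np.abs(y) - 1.0)

def inv_mulaw_quantize(q, mu=255):
    """Map an 8-bit class index back to a waveform sample in [-1, 1]."""
    y = 2 * q.astype(np.float64) / mu - 1
    return inv_mulaw(y, mu)
```

Companding before quantization allocates more levels to small amplitudes, so an 8-bit representation retains intelligible speech; the roundtrip error stays small across the full amplitude range.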

@r9y9 r9y9 mentioned this pull request Jan 13, 2018
28 tasks
@r9y9
Owner Author

r9y9 commented Jan 23, 2018

I haven't got really good speech quality yet, but I will merge this as an experimental feature.

@r9y9
Owner Author

r9y9 commented Jan 23, 2018

step460000.zip

@r9y9 r9y9 changed the title WIP: Mixture of logistic distributions Experimental: Mixture of logistic distributions Jan 23, 2018
@r9y9 r9y9 merged commit 4e1a8ee into master Jan 23, 2018
@r9y9 r9y9 deleted the mixture branch January 23, 2018 07:47
@mfkfge

mfkfge commented Jan 23, 2018

Do you remember how many steps you had trained when you first got recognizable generated speech?

@r9y9
Owner Author

r9y9 commented Jan 23, 2018

Yeah, I got recognizable speech at around 10k steps. After that, I trained for over 30k steps.

@npuichigo

@r9y9 Can you provide the loss you got after 30k steps?

@r9y9
Owner Author

r9y9 commented Jan 24, 2018

@npuichigo Sure, I got a loss value of around 56~57.

@azraelkuan
Contributor

@r9y9 Hi, I am converting your code to ibab's version. The mu-law version works, but with the mixture of logistics version the generated wav contains only a little noise and no recognizable speech. Did you ever run into this?
LJ050-0269_gen.wav.zip

Below is my mixture sampling code:

def sample_from_discretized_mix_logistic(y, log_scale_min=-7.0):
    """Sample from a discretized mixture of logistic distributions.

    :param y: network output, shape [B, T, C] with C = 3 * nr_mix
        (mixture logits, means, and log scales concatenated on the last axis)
    :param log_scale_min: lower bound on the log scales
    :return: samples of shape [B, T], clipped to [-1, 1]
    """
    y_shape = y.get_shape().as_list()

    assert y_shape[2] % 3 == 0
    nr_mix = y_shape[2] // 3

    # mixture component logits
    logit_probs = y[:, :, :nr_mix]

    # select a mixture component with the Gumbel-max trick
    one_hot = tf.one_hot(tf.argmax(
        logit_probs - tf.log(-tf.log(tf.random_uniform(tf.shape(logit_probs), minval=1e-5, maxval=1. - 1e-5))), 2),
                     depth=nr_mix, dtype=tf.float32)

    # pick out the mean and log scale of the selected component
    means = tf.reduce_sum(y[:, :, nr_mix:nr_mix*2] * one_hot, axis=-1)
    log_scales = tf.maximum(tf.reduce_sum(y[:, :, nr_mix*2:nr_mix*3] * one_hot, axis=-1), log_scale_min)

    # inverse-CDF sample from the logistic: x = mu + s * (log(u) - log(1 - u))
    u = tf.random_uniform(tf.shape(means), minval=1e-5, maxval=1. - 1e-5)
    x = means + tf.exp(log_scales) * (tf.log(u) - tf.log(1. - u))
    # clip to the valid waveform range
    x0 = tf.minimum(tf.maximum(x, -1.), 1.)
    return x0

I take the x0 and use it to construct the wav directly.
Thanks!
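For cross-checking the sampling logic outside TensorFlow, here is a NumPy mirror of the same procedure (an illustrative sketch, not code from the thread; the function name and defaults are assumptions):

```python
import numpy as np

def sample_from_discretized_mix_logistic_np(y, log_scale_min=-7.0, rng=None):
    """NumPy version of the mixture-of-logistics sampler: y has shape [B, T, 3 * nr_mix]."""
    rng = rng or np.random.RandomState(0)
    nr_mix = y.shape[2] // 3
    logit_probs = y[:, :, :nr_mix]

    # Gumbel-max trick: argmax(logits + Gumbel noise) samples a mixture component
    u = rng.uniform(1e-5, 1 - 1e-5, logit_probs.shape)
    argmax = np.argmax(logit_probs - np.log(-np.log(u)), axis=-1)
    one_hot = np.eye(nr_mix)[argmax]

    # parameters of the selected component
    means = np.sum(y[:, :, nr_mix:2 * nr_mix] * one_hot, axis=-1)
    log_scales = np.maximum(
        np.sum(y[:, :, 2 * nr_mix:3 * nr_mix] * one_hot, axis=-1), log_scale_min)

    # inverse-CDF sample from the selected logistic, clipped to [-1, 1]
    u = rng.uniform(1e-5, 1 - 1e-5, means.shape)
    x = means + np.exp(log_scales) * (np.log(u) - np.log(1 - u))
    return np.clip(x, -1.0, 1.0)
```

Feeding it a synthetic output tensor whose logits strongly prefer one component makes it easy to verify that the sampled values cluster around that component's mean.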

@r9y9
Owner Author

r9y9 commented Jan 31, 2018

I never got noise like that. If you are working with TensorFlow, it's probably better to use https://github.com/openai/pixel-cnn/blob/2b03725126c580a07af47c498d456cec17a9735e/pixel_cnn_pp/nn.py rather than mine.

As I noted in #1 (comment), the generated speech tends to be noisy if I use a high lower bound on the variance (e.g., 1e-4) for the logistic distributions. log_scale_min=np.log(1e-14) works best for me for now.
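A quick NumPy sketch of why that lower bound matters: when the network predicts a very confident (tiny) scale, the clamp decides how much sampling noise survives. The helper below is illustrative, not code from the repository:

```python
import numpy as np

rng = np.random.RandomState(0)

def logistic_sample(mean, log_scale, log_scale_min, size=10000):
    """Draw logistic samples with the log scale clamped from below."""
    s = np.exp(max(log_scale, log_scale_min))
    u = rng.uniform(1e-5, 1 - 1e-5, size)
    # inverse-CDF sampling from the logistic distribution
    return mean + s * (np.log(u) - np.log(1 - u))

# The network predicts a very small scale (log_scale = -30); the clamp
# determines the residual noise floor of the generated samples.
noisy = logistic_sample(0.0, -30.0, log_scale_min=np.log(1e-4))
tight = logistic_sample(0.0, -30.0, log_scale_min=np.log(1e-14))
```

With the 1e-4 bound the samples carry a noise floor on the order of 1e-4 per step, which accumulates audibly in autoregressive generation; with 1e-14 the samples are effectively the predicted means.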


4 participants