Experimental: Mixture of logistic distributions #5
Conversation
support mulaw (not quantized) as well as raw and mulaw-quantize
This reverts commit db60410.
disable scalar input for now
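The commits above distinguish three input modes: raw, mu-law companded (not quantized), and mu-law quantized. As a point of reference, here is a minimal numpy sketch of mu-law companding and quantization; the function names are illustrative, not necessarily those used in the repository.

```python
import numpy as np

def mulaw(x, mu=255):
    # Mu-law companding (no quantization): maps [-1, 1] -> [-1, 1].
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def inv_mulaw(y, mu=255):
    # Inverse mu-law companding.
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

def mulaw_quantize(x, mu=255):
    # Mu-law companding followed by quantization to integers in [0, mu].
    y = mulaw(np.asarray(x, dtype=np.float64), mu)
    return np.rint((y + 1) / 2 * mu).astype(np.int64)
```

Companding before quantization spends more of the 256 levels on small amplitudes, which is why 8-bit mu-law audio sounds much better than 8-bit linear PCM.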
I haven't got really good speech quality yet, but I will merge this as an experimental feature.
Do you still remember how many steps you had trained when you first got recognizable speech?
Yeah, I got recognizable speech at around 10k steps. After that, I trained for over 30k steps.
@r9y9 Can you provide the loss you got after 30k steps?
@npuichigo Sure, I got a loss value of around 56 ~ 57.
@r9y9 Hi, I am converting your code to ibab's version, but with the mixture logistic version (the mu-law one is OK), do you ever get the case where the generated wav contains only a little noise and no recognizable speech? Below is my mixture sample code
and I get the
I never got such noise. If you are working with TensorFlow, then it's probably better to use https://github.com/openai/pixel-cnn/blob/2b03725126c580a07af47c498d456cec17a9735e/pixel_cnn_pp/nn.py rather than mine. As I noted in #1 (comment), generated speech tends to be noisy if I use a high variance lower bound (e.g., 1e-4) for the logistic distributions.
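To make the variance lower bound remark concrete, here is a minimal numpy sketch of sampling from a mixture of logistics, in the style of the PixelCNN++ sampler; the function name and signature are illustrative, not the repository's actual API. The log-scale clamp (log_scale_min) is the lower bound being discussed: clamping too aggressively inflates the variance of every component and can bury the speech in noise.

```python
import numpy as np

def sample_from_mix_logistic(logit_probs, means, log_scales,
                             log_scale_min=-7.0, rng=None):
    # Sample one scalar from a mixture of logistic distributions.
    # logit_probs, means, log_scales: arrays of shape (num_mixtures,).
    rng = np.random.default_rng() if rng is None else rng
    # Pick a mixture component via the Gumbel-max trick.
    gumbel = -np.log(-np.log(rng.uniform(1e-5, 1 - 1e-5, logit_probs.shape)))
    k = np.argmax(logit_probs + gumbel)
    # Inverse-CDF sampling from the chosen logistic component.
    u = rng.uniform(1e-5, 1 - 1e-5)
    log_s = np.maximum(log_scales[k], log_scale_min)  # variance lower bound
    x = means[k] + np.exp(log_s) * (np.log(u) - np.log(1.0 - u))
    return np.clip(x, -1.0, 1.0)
```

With a single sharp component, samples should concentrate near that component's mean; as log_scale_min rises, they spread out, which matches the noisy-output symptom described above.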
This PR adds a mixture module, which implements discretized_mix_logistic_loss and sample_from_discretized_mix_logistic. The code was adapted from https://github.com/openai/pixel-cnn and https://github.com/pclucas14/pixel-cnn-pp.
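For intuition about what discretized_mix_logistic_loss computes, here is a hedged single-sample numpy sketch: the likelihood of x is the probability mass each logistic CDF assigns to the quantization bin around x, weighted by the mixture probabilities. The edge-case handling at x = ±1 from the full PixelCNN++ implementation is omitted, and the function name is illustrative.

```python
import numpy as np

def discretized_mix_logistic_nll(x, logit_probs, means, log_scales,
                                 num_classes=256, log_scale_min=-7.0):
    # Negative log-likelihood of a scalar x in [-1, 1] under a
    # discretized mixture of logistics (single-sample sketch).
    # logit_probs, means, log_scales: shape (num_mixtures,).
    log_s = np.maximum(log_scales, log_scale_min)
    inv_s = np.exp(-log_s)
    half_bin = 1.0 / (num_classes - 1)
    # Mass each logistic CDF assigns to the bin [x - half_bin, x + half_bin].
    cdf_plus = 1.0 / (1.0 + np.exp(-inv_s * (x - means + half_bin)))
    cdf_minus = 1.0 / (1.0 + np.exp(-inv_s * (x - means - half_bin)))
    bin_probs = np.maximum(cdf_plus - cdf_minus, 1e-12)
    # Softmax over mixture logits gives the component weights.
    w = np.exp(logit_probs - logit_probs.max())
    w /= w.sum()
    return -np.log(np.sum(w * bin_probs))
```

Note this returns the NLL for one sample; a training loss like the 56 ~ 57 value mentioned above would depend on how the implementation reduces over timesteps.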