Why not feed samples from the student to the teacher? #12
I think `target_distribution` should be computed from samples drawn from the student, i.e. the student's output samples should be fed to the teacher.
I don't really get what you mean. Do you think we should sample many inputs for the teacher network from the mu and s outputs of the student network? Then we would have to run the whole teacher network multiple times, which would be very computationally expensive. Also, if we sample the student output, we lose the conditioning on the previous time samples, so I don't think it makes sense. The mu and s outputs of the student network exist only to compare the distributions of the student and the teacher network.
I get your idea, and I have the same worry, but the paper says we need to estimate the distributions of the teacher and the student by sampling. By the way, `target_distribution` and `student_samples` have different shapes; is that a bug? Have you gotten any reasonable results?
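To make "estimate by sampling" concrete, here is a minimal sketch of a Monte Carlo estimate of KL(P_S || P_T) at a single timestep: draw x from the student's logistic output and average ln p_S(x) − ln p_T(x). For illustration only, it assumes the teacher's output at that timestep is also a single logistic (the Parallel WaveNet teacher actually uses a mixture of logistics), and every function name here is hypothetical, not taken from pytorch-wavenet. As we read the paper, the teacher's distribution at timestep t depends only on x_<t, so one teacher pass yields its parameters, and many student samples at that timestep can then be scored against them without rerunning the teacher.

```python
import math
import random


def softplus(a):
    """ln(1 + e^a), computed without overflow for large |a|."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def logistic_logpdf(x, mu, s):
    """Log-density of a logistic distribution with location mu and scale s > 0."""
    z = (x - mu) / s
    return -z - math.log(s) - 2.0 * softplus(-z)


def logistic_sample(mu, s, rng):
    """Draw one sample by inverting the logistic CDF (u clamped away from 0 and 1)."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return mu + s * (math.log(u) - math.log1p(-u))


def kl_monte_carlo(student, teacher, n_samples=20000, seed=0):
    """Estimate KL(P_S || P_T) = E_{x~P_S}[ln p_S(x) - ln p_T(x)] by sampling.

    student and teacher are (mu, s) pairs for ONE timestep; the teacher pair is
    assumed (hypothetically) to come from a single teacher forward pass.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        x = logistic_sample(*student, rng)
        total += logistic_logpdf(x, *student) - logistic_logpdf(x, *teacher)
    return total / n_samples
```

With identical parameters, e.g. `kl_monte_carlo((0.0, 1.0), (0.0, 1.0))`, the estimate is exactly zero; shifting the teacher's location, e.g. `kl_monte_carlo((0.0, 1.0), (1.0, 1.0))`, gives a small positive value.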
The paper says that x = g(z), where z is the input noise. I think the whole point of equations (9)-(13) in the paper is to save us from having to run the teacher network multiple times.
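As a rough illustration of x = g(z), here is a minimal sketch of an IAF-style student as we understand it from the Parallel WaveNet paper: each sample is x_t = z_t * s_t(z_<t) + mu_t(z_<t), so mu and s depend only on the noise z, never on previously generated x, which is what allows all timesteps to be computed in parallel. Because each p_S(x_t | z_<t) is a logistic, the student entropy has the closed form H(P_S) = E_z[sum_t ln s_t] + 2T, which needs no teacher evaluation at all. `toy_student` is a made-up stand-in for the student network, not code from pytorch-wavenet.

```python
import math
import random


def sample_noise(T, rng):
    """Draw z_1..z_T i.i.d. from Logistic(0, 1) by inverting its CDF."""
    zs = []
    for _ in range(T):
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        zs.append(math.log(u) - math.log1p(-u))
    return zs


def toy_student(z_prev):
    """Hypothetical stand-in for the student network: maps z_<t to (mu_t, s_t)."""
    context = sum(z_prev[-4:])  # toy receptive field: the last 4 noise samples
    mu = 0.1 * math.tanh(context)
    s = 0.5 + 0.4 / (1.0 + math.exp(-context))  # always in (0.5, 0.9), so s > 0
    return mu, s


def student_forward(z):
    """x = g(z) with x_t = z_t * s_t(z_<t) + mu_t(z_<t).

    mu_t and s_t depend only on the noise, so every step could run in parallel;
    the Python loop is sequential only for simplicity. Returns the samples and a
    single-sample estimate of H(P_S) = E_z[sum_t ln s_t] + 2T.
    """
    xs, log_scales = [], []
    for t in range(len(z)):
        mu, s = toy_student(z[:t])
        xs.append(z[t] * s + mu)
        log_scales.append(math.log(s))
    entropy_estimate = sum(log_scales) + 2.0 * len(z)
    return xs, entropy_estimate
```

For example, `student_forward(sample_noise(16, random.Random(0)))` returns 16 samples plus the entropy estimate; averaging that estimate over several noise draws approximates the expectation over z.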
Great! I think the power loss and the contrastive loss are very important for good voice quality.
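For reference, our reading of the Parallel WaveNet paper is that the power loss compares the time-averaged power spectrum of the student's output with that of real audio, roughly ||phi(g(z, c)) − phi(y)||^2 with phi(x) = |STFT(x)|^2 averaged over time, and that the contrastive loss is D_KL(P_S(c1) || P_T(c1)) − gamma * D_KL(P_S(c1) || P_T(c2)) for mismatched conditioning c2. The sketch below uses a naive, unwindowed DFT per frame in plain Python; `frame_len` and `hop` are arbitrary illustrative values, all names are hypothetical, and a real implementation would use a windowed STFT in PyTorch. `contrastive_loss` only combines two precomputed KL terms, with gamma left as a parameter.

```python
import math


def frame_power_spectrum(signal, frame_len, hop):
    """Time-averaged |DFT|^2 over frames: a crude, unwindowed STFT power."""
    n_bins = frame_len // 2 + 1
    avg = [0.0] * n_bins
    n_frames = 0
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        for k in range(n_bins):
            re = sum(v * math.cos(2 * math.pi * k * n / frame_len) for n, v in enumerate(frame))
            im = -sum(v * math.sin(2 * math.pi * k * n / frame_len) for n, v in enumerate(frame))
            avg[k] += re * re + im * im
        n_frames += 1
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")
    return [p / n_frames for p in avg]


def power_loss(student_audio, real_audio, frame_len=64, hop=32):
    """Squared Euclidean distance between time-averaged power spectra."""
    phi_s = frame_power_spectrum(student_audio, frame_len, hop)
    phi_r = frame_power_spectrum(real_audio, frame_len, hop)
    return sum((a - b) ** 2 for a, b in zip(phi_s, phi_r))


def contrastive_loss(kl_matched, kl_mismatched, gamma):
    """KL with the correct conditioning minus gamma times KL with the wrong one."""
    return kl_matched - gamma * kl_mismatched
```

Because phi discards phase, the power loss penalizes a student whose output has the wrong spectral energy (for example, whispery or muffled audio) even when the KL term alone does not catch it.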
Can you show me your loss plot? My loss hasn't converged after days of training.
Referenced code: `pytorch-wavenet/wavenet_training.py`, line 259 (commit 0913964)