MRD vs MS-STFTD #10
Comments
You can see here that the discriminators used for training WavTokenizer are:
MultiScaleSTFTDiscriminator from the MultiScaleDiscriminator from
PS: I have no relation to this repo and didn't contribute to it in any way. I'm just reading it at the same time as you.
There are minor bugs present; our intention was to replicate the discriminator from DAC, but we inadvertently copied a different one. While the majority of @CookiePPP's points are accurate, it's important to note that the other MPD operates on multiple scales and returns fmap instead of x. Thus, it differs from the first MPD. Additionally, the ablation experiments mentioned in the paper are correct. Modifying lines 185 and 186 in the experiments file will aid in reconstruction.
WavTokenizer/decoder/experiment.py Lines 87 to 90 in e435b40
By the way, I don't think the DAC discriminator weights are being updated. I'm not sure whether you're using pretrained weights or this is another 'minor bug', but its parameters aren't being given to the optimizer.
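One way to check this kind of concern is to compare a module's parameters against the optimizer's parameter groups. The snippet below is only a minimal sketch: the three `nn.Linear` modules are hypothetical stand-ins for the discriminators, and the names `mpd`, `mrd`, `dac_disc`, the helper `missing_from_optimizer`, and the learning rate are assumptions for illustration, not taken from the repo.

```python
import itertools

import torch
from torch import nn

# Hypothetical stand-ins for the discriminators; the real classes live in the repo.
mpd = nn.Linear(4, 1)
mrd = nn.Linear(4, 1)
dac_disc = nn.Linear(4, 1)


def missing_from_optimizer(module, optimizer):
    """Return the names of parameters in `module` that `optimizer` does not manage."""
    managed = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return [name for name, p in module.named_parameters() if id(p) not in managed]


# Optimizer built without the third discriminator (the situation suspected above).
opt_partial = torch.optim.AdamW(
    itertools.chain(mpd.parameters(), mrd.parameters()), lr=2e-4  # lr is an assumed value
)

# Optimizer that also receives the third discriminator's parameters.
opt_full = torch.optim.AdamW(
    itertools.chain(mpd.parameters(), mrd.parameters(), dac_disc.parameters()), lr=2e-4
)

partial_missing = missing_from_optimizer(dac_disc, opt_partial)
full_missing = missing_from_optimizer(dac_disc, opt_full)
print(partial_missing)
print(full_missing)
```

A module whose parameters never reach the optimizer still receives gradients in the backward pass, but its weights stay at their initial values because no update step touches them.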
It is noteworthy that multiple versions of the wavtokenizer.
Hello! I am in awe of the results you achieved, and I am trying to understand the discriminator setup chosen in your work. I am familiar with the MSD and MPD as presented in HiFiGAN, but I'm having a hard time differentiating between the STFT/spectrogram discriminators.
In the paper, you specify that MRD and MS-STFTD are employed. Indeed, in your code I see Encodec's discriminator and DAC's MRD, but the latter is changed to operate only on magnitude spectrograms instead of complex ones as in the original? Also, I think the original MRD has band splitting, which is absent in the new implementation? Correct me if I'm mistaken, but why would we do the `abs` and omit bands? And besides the discrepancy, could you please elaborate on what exactly the key difference between the two is? It seems both take spectrograms at multiple scales/resolutions.
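To make the magnitude-versus-complex part of the question concrete, here is a small standard-library sketch of what taking the absolute value discards. It is an illustration under stated assumptions, not the repo's code: a single naive DFT frame stands in for one STFT column, and the helper `dft`, the sample values, and the circular shift are all made up for the example. A circular shift changes only the phase of each bin, so the two frames have the same magnitude spectrum while their complex spectra differ.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform; returns a list of complex bins."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


# A short illustrative frame and a circularly shifted copy of it.
frame = [0.0, 1.0, 0.5, -0.3, -1.0, 0.2, 0.7, -0.4]
shifted = frame[3:] + frame[:3]

spec_a, spec_b = dft(frame), dft(shifted)

# Largest per-bin difference between magnitude spectra (what an abs-based input sees).
mag_diff = max(abs(abs(a) - abs(b)) for a, b in zip(spec_a, spec_b))

# Largest per-bin difference between complex spectra (real and imaginary parts kept).
complex_diff = max(abs(a - b) for a, b in zip(spec_a, spec_b))

print(f"magnitude difference: {mag_diff:.2e}")
print(f"complex difference:   {complex_diff:.2e}")
```

The magnitude difference comes out at floating-point noise level while the complex difference is large, so a discriminator fed only magnitudes cannot tell these two frames apart, whereas one fed the real and imaginary parts can.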