Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MRD vs MS-STFTD #10

Open
Yagelmx opened this issue Sep 3, 2024 · 4 comments
Open

MRD vs MS-STFTD #10

Yagelmx opened this issue Sep 3, 2024 · 4 comments
Labels
fix a minor bug New feature or request

Comments

@Yagelmx
Copy link

Yagelmx commented Sep 3, 2024

Hello! I am in awe at the results you achieved, and I am trying to understand the discriminator setup chosen in your work. I am familiar with the MSD and MPD as presented in HiFiGAN, but i'm having a hard time differentiating between the STFT/spectrogram discriminators.

In the paper, you specify that MRD and MS-STFTD are employed. Indeed in your code, I see Encodec's discriminator and DAC's MRD but changed to only be on magnitude spectrograms instead of complex ones as in the original? Also, I think the original MRD has band splitting which is absent in the new implementation? correct me if I'm mistaken, but why would we do the abs and omit bands? And besides the discrepancy, could you please elaborate what exactly is the key difference between the two? it seems both take spectrograms at multiple scales/resolutions.

@CookiePPP
Copy link

CookiePPP commented Sep 4, 2024

You can see in here the discriminators used for training WavTokenizer are;

  1. a normal MultiPeriodDiscriminator from .decoder.discriminators
  2. a normal single-band-amplitude-only MultiResolutionDiscriminator from .decoder.discriminators
  3. a multi-band-complex MRD from .decoder.discriminator_dac
  4. another normal MPD from .decoder.discriminator_dac which appears to be a duplicate of the MultiPeriodDiscriminator at the top of this list.

MultiScaleSTFTDiscriminator from .encoder.msstftd is not imported or used anywhere in this repo.

the MultiScaleDiscriminator from .decoder.discriminator_dac is not used since .decoder.discriminator_dac.DACDiscriminator().rates list is empty in all configs, so none of that type of discriminator are initialized. Also this MSD class has it's resampling layers disabled so it doesn't work correctly.


PS: I have no relation to this repo and didn't contribute to it in any way. I'm just reading it at the same time as you.

@jishengpeng
Copy link
Owner

jishengpeng commented Sep 5, 2024

You can see in here the discriminators used for training WavTokenizer are;

  1. a normal MultiPeriodDiscriminator from .decoder.discriminators
  2. a normal single-band-amplitude-only MultiResolutionDiscriminator from .decoder.discriminators
  3. a multi-band-complex MRD from .decoder.discriminator_dac
  4. another normal MPD from .decoder.discriminator_dac which appears to be a duplicate of the MultiPeriodDiscriminator at the top of this list.

MultiScaleSTFTDiscriminator from .encoder.msstftd is not imported or used anywhere in this repo.

the MultiScaleDiscriminator from .decoder.discriminator_dac is not used since .decoder.discriminator_dac.DACDiscriminator().rates list is empty in all configs, so none of that type of discriminator are initialized. Also this MSD class has it's resampling layers disabled so it doesn't work correctly.

PS: I have no relation to this repo and didn't contribute to it in any way. I'm just reading it at the same time as you.

There are minor bugs present; our intention was to replicate the discriminator from the DAC, but we inadvertently copied a different one. While the majority of @CookiePPP's points are accurate, it's important to note that another MPD operates on multiple scales and returns fmap instead of x. Thus, it differs from the first MPD. Additionally, the ablation experiments mentioned in the paper are correct. Modifying lines 185 and 186 in the experiments file will aid in reconstruction.

@jishengpeng jishengpeng added the fix a minor bug New feature or request label Sep 5, 2024
@CookiePPP
Copy link

CookiePPP commented Sep 5, 2024

Modifying lines 185 and 186 in the experiments file will aid in reconstruction.

disc_params = [
{"params": self.multiperioddisc.parameters()},
{"params": self.multiresddisc.parameters()},
]

I don't think the dac discriminator weights are being updated by the way, not sure if you're using pretrained weights or this is another 'minor bug' but the parameters aren't being given to the optimizer.

(self.dac.parameters() is not being given to the optimizer however I think the param.requires_grad is still set to true for those parameters, so it's not clear to me if this is intentional.)

@jishengpeng
Copy link
Owner

Modifying lines 185 and 186 in the experiments file will aid in reconstruction.

disc_params = [
{"params": self.multiperioddisc.parameters()},
{"params": self.multiresddisc.parameters()},
]

I don't think the dac discriminator weights are being updated by the way, not sure if you're using pretrained weights or this is another 'minor bug' but the parameters aren't being given to the optimizer.

(self.dac.parameters() is not being given to the optimizer however I think the param.requires_grad is still set to true for those parameters, so it's not clear to me if this is intentional.)

It is noteworthy that multiple versions of the wavtokenizer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fix a minor bug New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants