MRD vs MS-STFTD #10

Yagelmx · 2024-09-03T17:00:05Z

Hello! I am in awe of the results you achieved, and I am trying to understand the discriminator setup chosen in your work. I am familiar with the MSD and MPD as presented in HiFiGAN, but I'm having a hard time differentiating between the STFT/spectrogram discriminators.

In the paper, you state that MRD and MS-STFTD are employed. In your code, I do see Encodec's discriminator and DAC's MRD, but the MRD seems to have been changed to operate only on magnitude spectrograms instead of complex ones as in the original. Also, I think the original MRD has band splitting, which is absent in the new implementation. Correct me if I'm mistaken, but why take the abs and omit the bands? And beyond that discrepancy, could you please elaborate on what exactly the key difference between the two is? Both seem to take spectrograms at multiple scales/resolutions.

CookiePPP · 2024-09-04T13:17:29Z

You can see here that the discriminators used for training WavTokenizer are:

- a normal MultiPeriodDiscriminator from .decoder.discriminators
- a normal single-band, amplitude-only MultiResolutionDiscriminator from .decoder.discriminators
- a multi-band, complex MRD from .decoder.discriminator_dac
- another normal MPD from .decoder.discriminator_dac ~~which appears to be a duplicate of the MultiPeriodDiscriminator at the top of this list.~~
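To make the "amplitude-only" versus "complex" and "single-band" versus "multi-band" distinction concrete, here is a minimal standard-library sketch. It is not code from this repo or from DAC: the naive DFT, the helper names and the fractional band edges in `BANDS` are all assumptions chosen for illustration.

```python
import cmath
import math


def dft_frame(frame):
    """Naive one-sided DFT of a real frame; returns n // 2 + 1 complex bins."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n // 2 + 1)
    ]


def band_slices(n_bins, bands):
    """Turn fractional band edges into (start, stop) bin indices."""
    return [(int(lo * n_bins), int(hi * n_bins)) for lo, hi in bands]


# Hypothetical band edges (fractions of the frequency axis), for illustration only.
BANDS = [(0.0, 0.1), (0.1, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]

n = 64
tone = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
# Same tone, phase-shifted by 90 degrees.
shifted = [math.cos(2 * math.pi * 4 * t / n + math.pi / 2) for t in range(n)]

spec_a, spec_b = dft_frame(tone), dft_frame(shifted)

# Amplitude-only input: one value per bin, phase is discarded.
mag_a = [abs(c) for c in spec_a]
mag_b = [abs(c) for c in spec_b]

# Complex input: two values (real, imag) per bin, phase is preserved.
complex_a = [(c.real, c.imag) for c in spec_a]
complex_b = [(c.real, c.imag) for c in spec_b]

# Multi-band: each slice of bins would be fed to its own convolution stack.
slices = band_slices(len(spec_a), BANDS)
```

In this sketch the two signals have the same magnitudes but different real/imaginary parts, so an amplitude-only discriminator cannot tell them apart while a complex one can. A single-band discriminator would process all bins with one stack instead of one stack per slice.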

MultiScaleSTFTDiscriminator from .encoder.msstftd is not imported or used anywhere in this repo.

The MultiScaleDiscriminator from .decoder.discriminator_dac is not used, since the .decoder.discriminator_dac.DACDiscriminator().rates list is empty in all configs, so no discriminators of that type are initialized. This MSD class also has its resampling layers disabled, so it doesn't work correctly.
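The effect of an empty rates list can be sketched as follows. This assumes the container builds its sub-discriminators with one list comprehension per setting, which is an assumption about the code and not verified here. `StubDiscriminator`, `build_discriminators` and the period and FFT-size values are hypothetical.

```python
class StubDiscriminator:
    """Hypothetical stand-in for a real sub-discriminator module."""

    def __init__(self, kind, setting):
        self.kind = kind
        self.setting = setting


def build_discriminators(rates, periods, fft_sizes):
    """Create one sub-discriminator per entry of each settings list."""
    discs = []
    discs += [StubDiscriminator("mpd", p) for p in periods]
    discs += [StubDiscriminator("msd", r) for r in rates]
    discs += [StubDiscriminator("mrd", f) for f in fft_sizes]
    return discs


# With rates=[] the MSD comprehension yields nothing, so no MSD is ever built.
discs = build_discriminators(rates=[], periods=[2, 3, 5, 7, 11], fft_sizes=[2048, 1024, 512])
kinds = [d.kind for d in discs]
```

Under that assumption, the MSD class never runs during training, whatever its internal state.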

PS: I have no relation to this repo and didn't contribute to it in any way. I'm just reading it at the same time as you.

jishengpeng · 2024-09-05T07:39:40Z

There are some minor bugs: we intended to replicate the discriminator from DAC but inadvertently copied a different one. Most of @CookiePPP's points are accurate, but note that the other MPD operates on multiple scales and returns fmap instead of x, so it differs from the first MPD. The ablation experiments reported in the paper are correct. Modifying lines 185 and 186 in the experiments file will aid in reconstruction.

CookiePPP · 2024-09-05T15:27:49Z

> Modifying lines 185 and 186 in the experiments file will aid in reconstruction.

WavTokenizer/decoder/experiment.py, lines 87 to 90 in e435b40:

    disc_params = [
        {"params": self.multiperioddisc.parameters()},
        {"params": self.multiresddisc.parameters()},
    ]

By the way, I don't think the DAC discriminator weights are being updated. I'm not sure whether you're using pretrained weights or this is another 'minor bug', but its parameters aren't being passed to the optimizer.

(self.dac.parameters() is not passed to the optimizer, yet I think param.requires_grad is still True for those parameters, so it's not clear to me whether this is intentional.)
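One way to check for this situation is to compare a module's trainable parameters against the optimizer's param groups. The sketch below uses PyTorch with toy `nn.Linear` layers standing in for the real discriminators. `ToyExperiment` and `params_missing_from_optimizer` are hypothetical names, and the choice of AdamW is an assumption, not taken from the repo.

```python
import torch
from torch import nn


def params_missing_from_optimizer(module, optimizer):
    """Names of trainable parameters in `module` that `optimizer` does not hold."""
    held = {id(p) for group in optimizer.param_groups for p in group["params"]}
    return [
        name
        for name, p in module.named_parameters()
        if p.requires_grad and id(p) not in held
    ]


class ToyExperiment(nn.Module):
    """Toy stand-in: three small layers instead of real discriminators."""

    def __init__(self):
        super().__init__()
        self.multiperioddisc = nn.Linear(4, 1)
        self.multiresddisc = nn.Linear(4, 1)
        self.dac = nn.Linear(4, 1)  # never handed to the optimizer below


exp = ToyExperiment()
disc_params = [
    {"params": exp.multiperioddisc.parameters()},
    {"params": exp.multiresddisc.parameters()},
]
opt = torch.optim.AdamW(disc_params, lr=1e-3)

missing = params_missing_from_optimizer(exp, opt)

# One training step: the omitted module still receives gradients, but is not updated.
before = exp.dac.weight.detach().clone()
x = torch.randn(8, 4)
loss = (exp.multiperioddisc(x) + exp.multiresddisc(x) + exp.dac(x)).mean()
loss.backward()
opt.step()
dac_unchanged = torch.equal(before, exp.dac.weight.detach())
dac_has_grad = exp.dac.weight.grad is not None
```

In this toy setup the omitted module accumulates gradients because requires_grad is True, but its weights stay at their initial values because the optimizer never steps them.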

jishengpeng · 2024-09-06T10:08:19Z

It is worth noting that there are multiple versions of WavTokenizer.

jishengpeng added the "fix a minor bug" label on Sep 5, 2024
