about training vocoder #15

yangdongchao · 2022-04-17T09:02:52Z

Hi, I have a question about training MelGAN.
I noticed that when you train MelGAN, you normalize the audio data before converting it to a mel spectrogram, e.g. in the file vocoder/mel2wav/dataset.py:
def load_wav_to_torch(self, full_path):
    data = np.load(full_path)
    data = 0.95 * normalize(data)

I just want to know why you normalize it and then multiply by 0.95. After the normalization, is the extracted mel spectrogram the same as the original one? In other words, does this operation influence the result when we use the vocoder to convert a predicted spectrogram into a waveform?

Furthermore, when I use your script vocoder/scripts/generate_from_folder.py to generate samples, it fails (that is, the reconstructed audio is far from the original audio). After I modified it as follows, it works:
`def main():
    args = parse_args()
    vocoder = MelVocoder(args.load_path)

    args.save_path.mkdir(exist_ok=True, parents=True)

    for i, fname in tqdm(enumerate(args.folder.glob("*.wav"))):
        wavname = fname.name
        wav, sr = librosa.core.load(fname)
        data = 0.95 * normalize(wav)
        #wav = torch.from_numpy(wav).unsqueeze(0)
        #mel = vocoder(torch.from_numpy(wav)[None])
        mel = wav2mel(wav)
        # print('mel ',mel.shape)
        # assert 1==2
        recons = vocoder.inverse(mel).squeeze().cpu().numpy()

        librosa.output.write_wav(args.save_path / wavname, recons, sr=sr)`


v-iashin · 2022-04-17T10:45:51Z

Hi, thanks for your issue! However, I think these questions should be addressed to the authors of MelGAN.

> Why do you normalize it and then multiply by 0.95?

This is a good question. Since we use the original MelGAN implementation, I think your question should be addressed to the authors of MelGAN. I am not sure why they decided to do it.

https://github.com/descriptinc/melgan-neurips/blob/6488045bfba1975602288de07a58570c7b4d66ea/mel2wav/dataset.py#L64

and it seems you are not the first one who wonders about it: descriptinc/melgan-neurips#36
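To make the effect of that line concrete, here is a minimal sketch. It assumes `normalize` performs peak normalization (dividing by the largest absolute sample, which is what `librosa.util.normalize` does by default); this is an illustration, not the MelGAN authors' stated rationale. The helper names `peak_normalize` and `magnitude_spectrum`, the `headroom` parameter, and the test tone are hypothetical. The sketch shows that the scaling multiplies every frequency bin by the same gain, so a log-compressed spectrum only shifts by a constant (apart from any clipping floor applied before the logarithm).

```python
import cmath
import math


def peak_normalize(samples, headroom=0.95):
    """Scale samples so the largest absolute value equals `headroom`.

    Assumption: this mirrors `0.95 * normalize(data)` with peak normalization.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    return [headroom * s / peak for s in samples]


def magnitude_spectrum(samples):
    """Magnitudes of a naive O(n^2) DFT, sufficient for a short demo signal."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


# Hypothetical quiet test tone: two sinusoids (bins 4 and 9) with a small peak.
n = 64
wav = [
    0.2 * math.sin(2 * math.pi * 4 * t / n) + 0.1 * math.sin(2 * math.pi * 9 * t / n)
    for t in range(n)
]
scaled = peak_normalize(wav)

# The single gain factor applied to every sample.
gain = 0.95 / max(abs(s) for s in wav)
orig_mag = magnitude_spectrum(wav)
scaled_mag = magnitude_spectrum(scaled)

# The DFT is linear, so each bin is multiplied by the same gain and the
# log10 magnitude of each bin shifts by log10(gain).
offset_bin4 = math.log10(scaled_mag[4] / orig_mag[4])
offset_bin9 = math.log10(scaled_mag[9] / orig_mag[9])
print(f"log10(gain)         = {math.log10(gain):.4f}")
print(f"log10 offset, bin 4 = {offset_bin4:.4f}")
print(f"log10 offset, bin 9 = {offset_bin9:.4f}")
```

Under these assumptions the spectrogram of the normalized audio is not identical to the original one, but it differs only by a level offset. A vocoder trained on normalized audio would therefore expect spectrograms computed from audio that was normalized in the same way.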

> I use your script vocoder/scripts/generate_from_folder.py to generate samples

I am not sure where you need this part of the code because I don't see it anywhere. Again, you need to ask the authors of MelGAN. Sorry for the confusion. I will remove the unnecessary code from this repository.

v-iashin · 2022-04-17T10:51:34Z

Also, check this piece of code if you wonder how to reconstruct predictions of the MelGAN generator:

SpecVQGAN/vocoder/scripts/train.py

Lines 194 to 202 in 3894458

    
    pred_audio = netG(voc)
    pred_audio = pred_audio.squeeze().cpu()
    save_sample(root / ("generated_%d.wav" % i), 22050, pred_audio)
    writer.add_audio(
        "generated/sample_%d.wav" % i,
        pred_audio,
        epoch,
        sample_rate=22050,
    )

yangdongchao · 2022-10-11T07:33:21Z

Thank you very much.

v-iashin closed this as completed Apr 17, 2022
