about training vocoder #15

Closed
yangdongchao opened this issue Apr 17, 2022 · 3 comments
Comments

@yangdongchao

yangdongchao commented Apr 17, 2022

Hi, I have a question about training MelGAN.
I noticed that when you train MelGAN, you normalize the audio data before converting it to a mel spectrogram, e.g. in the file vocoder/mel2wav/dataset.py:
def load_wav_to_torch(self, full_path):
    data = np.load(full_path)
    data = 0.95 * normalize(data)

Why do you normalize the audio and then multiply it by 0.95? After this normalization, is the extracted mel spectrogram the same as the original one? In other words, does this operation affect the result when we use the vocoder to convert a predicted spectrogram into a waveform?

Furthermore, when I use your script vocoder/scripts/generate_from_folder.py to generate samples, it fails (that is, the reconstructed audio is far from the original audio). After I modified it as follows, it works:
def main():
    args = parse_args()
    vocoder = MelVocoder(args.load_path)

    args.save_path.mkdir(exist_ok=True, parents=True)

    for i, fname in tqdm(enumerate(args.folder.glob("*.wav"))):
        wavname = fname.name
        wav, sr = librosa.core.load(fname)
        data = 0.95 * normalize(wav) #
        #wav = torch.from_numpy(wav).unsqueeze(0)
        #mel = vocoder(torch.from_numpy(wav)[None])
        mel = wav2mel(wav)
        # print('mel ',mel.shape)
        # assert 1==2
        recons = vocoder.inverse(mel).squeeze().cpu().numpy()

        librosa.output.write_wav(args.save_path / wavname, recons, sr=sr)

@v-iashin
Owner

Hi, thanks for your issue! However, I think these questions should be addressed to the authors of MelGAN.

> why you try to normalize it and then multiply 0.95?

This is a good question. Since we use the original MelGAN implementation, I think your question should be addressed to the authors of MelGAN. I am not sure why they decided to do it.

https://github.com/descriptinc/melgan-neurips/blob/6488045bfba1975602288de07a58570c7b4d66ea/mel2wav/dataset.py#L64

and it seems you are not the first one who wonders about it: descriptinc/melgan-neurips#36
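
For what it's worth, the effect of that line can be sketched in plain Python. This assumes `normalize` is peak normalization (dividing by the largest absolute sample, which is what `librosa.util.normalize` does by default); `peak_normalize` and `log_magnitude` are hypothetical helpers written for illustration and are not code from either repository. Scaling a waveform by a constant multiplies every spectral magnitude by the same constant, so a log-magnitude spectrum only shifts by a constant offset, and the 0.95 factor leaves the peak slightly below full scale.

```python
import cmath
import math


def peak_normalize(samples, scale=0.95):
    """Scale samples so that the largest absolute value equals `scale`."""
    peak = max(abs(s) for s in samples)
    if peak == 0.0:
        return list(samples)  # silent input: nothing to scale
    return [scale * s / peak for s in samples]


def log_magnitude(frame):
    """Log-magnitude of a naive DFT over one frame (non-negative bins only)."""
    n = len(frame)
    out = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        # Clamp tiny magnitudes before the log, as log-mel pipelines usually do.
        out.append(math.log(max(abs(coeff), 1e-5)))
    return out


# A quiet two-tone test frame (made-up data, energy at bins 4 and 9).
n = 64
wav = [
    0.2 * math.sin(2 * math.pi * 4 * t / n) + 0.1 * math.sin(2 * math.pi * 9 * t / n)
    for t in range(n)
]
scaled = peak_normalize(wav)

peak_before = max(abs(s) for s in wav)
gain = 0.95 / peak_before

# At the bins that carry energy, the log spectrum moves by log(gain).
offsets = [b - a for a, b in zip(log_magnitude(wav), log_magnitude(scaled))]
print("peak after scaling:", max(abs(s) for s in scaled))
print("offset at bin 4:", offsets[4], "expected:", math.log(gain))
print("offset at bin 9:", offsets[9], "expected:", math.log(gain))
```

Bins that fall below the clamp do not shift by the same amount, so the offset is only approximately constant across a whole spectrogram. If the vocoder was trained on audio scaled this way, features computed from unscaled audio at inference time will presumably sit at a different level than the ones it saw during training.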

> I use your script vocoder/scripts/generate_from_folder.py to generate sample

I am not sure where you need this part of the code because I don't see it anywhere. Again, you need to ask the authors of MelGAN. Sorry for the confusion. I will remove the unnecessary code from this repository.

@v-iashin
Owner

Also, check this piece of code if you wonder how to reconstruct predictions of the MelGAN generator:

pred_audio = netG(voc)
pred_audio = pred_audio.squeeze().cpu()
save_sample(root / ("generated_%d.wav" % i), 22050, pred_audio)
writer.add_audio(
    "generated/sample_%d.wav" % i,
    pred_audio,
    epoch,
    sample_rate=22050,
)
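
The body of `save_sample` is not shown in this thread. As a rough stand-in, and assuming it simply writes mono floating-point samples in [-1, 1] to a 16-bit PCM file, something like the following would do the same job using only the standard library. The name `save_sample_sketch` and the clipping behaviour are assumptions made for illustration.

```python
import math
import os
import struct
import tempfile
import wave


def save_sample_sketch(path, sampling_rate, audio):
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV (hypothetical helper)."""
    # Clip to the valid range, then scale to the int16 range.
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in audio]
    with wave.open(str(path), "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16 bits
        f.setframerate(sampling_rate)
        f.writeframes(struct.pack("<%dh" % len(ints), *ints))


# Usage: one second of a 440 Hz tone at the 22050 Hz rate used above.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / 22050) for t in range(22050)]
out_path = os.path.join(tempfile.gettempdir(), "generated_0.wav")
save_sample_sketch(out_path, 22050, tone)
```

With a PyTorch tensor such as `pred_audio`, the samples would first need to be converted to a flat sequence of floats, for example with `pred_audio.tolist()`.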

@yangdongchao
Author

yangdongchao commented Oct 11, 2022 via email
