Added melgan samples.

geneing · Jul 8, 2020 · 5257c4e
1 parent e2bf314
commit 5257c4e

Showing 2 changed files with 23 additions and 4 deletions.
diff --git a/docs/index.md b/docs/index.md
@@ -1,12 +1,31 @@
 Inspired by Microsoft's [FastSpeech](https://www.microsoft.com/en-us/research/blog/fastspeech-new-text-to-speech-model-improves-on-speed-accuracy-and-controllability/)
 we modified Tacotron to generate speech in a single forward pass using a duration predictor to align text and generated mel spectrograms.
 
-# 🔈 Samples
 
-The samples are generated with a model trained 100K steps on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) together with the pretrained WaveRNN vocoder provided by the [WaveRNN repo](https://github.com/fatchord/WaveRNN).
+## ForwardTacotron + MelGAN Vocoder
+
+The samples are generated with a model trained 400K steps on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) together with the pretrained MelGAN vocoder provided by the [MelGAN repo](https://github.com/seungwonpark/melgan).
+
+<p class="text">Scientists at the CERN laboratory say they have discovered a new particle.</p> 
+
+| normal speed | faster (1.25) | slower (0.85) |
+|:---:|:---:|:---:|
+|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/1_melgan_400k.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/1_melgan_400k_1.25.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/1_melgan_400k_0.8.wav?raw=true" controls preload></audio>|
+
+<p class="text">There’s a way to measure the acute emotional intelligence that has never gone out of style.</p>
+
+|:---:|:---:|:---:|
+|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/2_melgan_400k.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/2_melgan_400k_1.25.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/2_melgan_400k_0.8.wav?raw=true" controls preload></audio>|
+
+<p class="text">President Trump met with other leaders at the Group of 20 conference.</p>
+
+|:---:|:---:|:---:|
+|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/3_melgan_400k.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/3_melgan_400k_1.25.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/3_melgan_400k_0.8.wav?raw=true" controls preload></audio>|
 
 ## ForwardTacotron + WaveRNN Vocoder
 
+The samples are generated with a model trained 100K steps on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) together with the pretrained WaveRNN vocoder provided by the [WaveRNN repo](https://github.com/fatchord/WaveRNN).
+
 <p class="text">Scientists at the CERN laboratory say they have discovered a new particle.</p> 
 
 | normal speed | faster (1.25) | slower (0.85) |

diff --git a/gen_forward.py b/gen_forward.py
@@ -151,15 +151,15 @@
         if input_text:
             save_path = paths.forward_output/f'{input_text[:10]}_{args.alpha}_{v_type}_{tts_k}k.wav'
         else:
-            save_path = paths.forward_output/f'{i}_{v_type}_{tts_k}ko.wav'
+            save_path = paths.forward_output/f'{i}_{v_type}_{tts_k}_alpha{args.alpha}.wav'
 
         if args.vocoder == 'wavernn':
             m = torch.tensor(m).unsqueeze(0)
             voc_model.generate(m, save_path, batched, hp.voc_target, hp.voc_overlap, hp.mu_law)
         elif args.vocoder == 'griffinlim':
             wav = reconstruct_waveform(m, n_iter=args.iters)
             m_t = torch.tensor(m).unsqueeze(0)
-            torch.save(m_t, paths.forward_output/f'{i}_{tts_k}k.mel')
+            torch.save(m_t, paths.forward_output/f'{i}_{tts_k}_alpha{args.alpha}.mel')
             save_wav(wav, save_path)
 
     print('\n\nDone.\n')
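
Note on the `gen_forward.py` hunk: the old sentence-file patterns (`{i}_{v_type}_{tts_k}ko.wav` and `{i}_{tts_k}k.mel`) did not include `alpha`, so runs at different speeds with the same index, vocoder and step count would resolve to the same path. The sketch below reproduces only the two new f-string patterns from the diff to show the resulting names. The helper `sentence_output_paths`, the `outputs` directory and the sample values are hypothetical, and `tts_k` is assumed to be an integer step count in thousands.

```python
from pathlib import Path


def sentence_output_paths(out_dir: Path, i: int, v_type: str, tts_k: int, alpha: float):
    """Hypothetical helper mirroring the post-commit naming for sentence-file outputs."""
    # Same pattern as the new save_path line in the diff.
    wav_path = out_dir / f'{i}_{v_type}_{tts_k}_alpha{alpha}.wav'
    # Same pattern as the new torch.save target in the griffinlim branch.
    mel_path = out_dir / f'{i}_{tts_k}_alpha{alpha}.mel'
    return wav_path, mel_path


if __name__ == '__main__':
    # Assumed example values; 'outputs' stands in for paths.forward_output.
    for alpha in (1.0, 1.25):
        wav_path, mel_path = sentence_output_paths(Path('outputs'), 1, 'griffinlim', 400, alpha)
        print(wav_path.name, mel_path.name)
```

With these patterns each `alpha` value gets its own file. The new names also drop the `k` suffix after `tts_k`, while the `input_text` branch keeps `{tts_k}k`, so the two branches now label the step count differently.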