Added melgan samples.
cschaefer26 committed Jul 8, 2020
1 parent e2bf314 commit 5257c4e
Showing 2 changed files with 23 additions and 4 deletions.
23 changes: 21 additions & 2 deletions docs/index.md
@@ -1,12 +1,31 @@
Inspired by Microsoft's [FastSpeech](https://www.microsoft.com/en-us/research/blog/fastspeech-new-text-to-speech-model-improves-on-speed-accuracy-and-controllability/)
we modified Tacotron to generate speech in a single forward pass using a duration predictor to align text and generated mel spectrograms.
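The duration-based alignment described above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: the function name `expand_by_duration`, the plain-list representation, and the convention of dividing durations by `alpha` (so that values above 1 give faster speech, matching the "faster (1.25)" labels on the samples) are all assumptions.

```python
from typing import List, Sequence


def expand_by_duration(
    encodings: Sequence[Sequence[float]],
    durations: Sequence[float],
    alpha: float = 1.0,
) -> List[List[float]]:
    """Repeat each text-position vector for its predicted number of mel frames.

    Hypothetical helper: `alpha` rescales the predicted durations, and is
    assumed here to divide them (alpha > 1 means fewer frames, faster speech).
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    frames: List[List[float]] = []
    for vec, dur in zip(encodings, durations):
        # Round half up so the frame count is always a non-negative integer.
        n_frames = max(0, int(dur / alpha + 0.5))
        frames.extend(list(vec) for _ in range(n_frames))
    return frames


# Three text positions with predicted durations of 2, 0 and 3 frames.
encoded = [[1.0], [2.0], [3.0]]
aligned = expand_by_duration(encoded, [2, 0, 3])
```

Because the expanded sequence already has one entry per mel frame, a decoder can process it in a single forward pass instead of attending step by step.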

# 🔈 Samples

The samples are generated with a model trained 100K steps on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) together with the pretrained WaveRNN vocoder provided by the [WaveRNN repo](https://github.com/fatchord/WaveRNN).
## ForwardTacotron + MelGAN Vocoder

The samples are generated with a model trained 400K steps on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) together with the pretrained MelGAN vocoder provided by the [MelGAN repo](https://github.com/seungwonpark/melgan).

<p class="text">Scientists at the CERN laboratory say they have discovered a new particle.</p>

| normal speed | faster (1.25) | slower (0.85) |
|:---:|:---:|:---:|
|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/1_melgan_400k.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/1_melgan_400k_1.25.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/1_melgan_400k_0.8.wav?raw=true" controls preload></audio>|

<p class="text">There’s a way to measure the acute emotional intelligence that has never gone out of style.</p>

| normal speed | faster (1.25) | slower (0.85) |
|:---:|:---:|:---:|
|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/2_melgan_400k.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/2_melgan_400k_1.25.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/2_melgan_400k_0.8.wav?raw=true" controls preload></audio>|

<p class="text">President Trump met with other leaders at the Group of 20 conference.</p>

| normal speed | faster (1.25) | slower (0.85) |
|:---:|:---:|:---:|
|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/3_melgan_400k.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/3_melgan_400k_1.25.wav?raw=true" controls preload></audio>|<audio src="https://github.com/as-ideas/tts_model_outputs/blob/master/ljspeech_forward/3_melgan_400k_0.8.wav?raw=true" controls preload></audio>|

## ForwardTacotron + WaveRNN Vocoder

The samples are generated with a model trained 100K steps on [LJSpeech](https://keithito.com/LJ-Speech-Dataset/) together with the pretrained WaveRNN vocoder provided by the [WaveRNN repo](https://github.com/fatchord/WaveRNN).

<p class="text">Scientists at the CERN laboratory say they have discovered a new particle.</p>

| normal speed | faster (1.25) | slower (0.85) |
4 changes: 2 additions & 2 deletions gen_forward.py
@@ -151,15 +151,15 @@
         if input_text:
             save_path = paths.forward_output/f'{input_text[:10]}_{args.alpha}_{v_type}_{tts_k}k.wav'
         else:
-            save_path = paths.forward_output/f'{i}_{v_type}_{tts_k}ko.wav'
+            save_path = paths.forward_output/f'{i}_{v_type}_{tts_k}_alpha{args.alpha}.wav'

         if args.vocoder == 'wavernn':
             m = torch.tensor(m).unsqueeze(0)
             voc_model.generate(m, save_path, batched, hp.voc_target, hp.voc_overlap, hp.mu_law)
         elif args.vocoder == 'griffinlim':
             wav = reconstruct_waveform(m, n_iter=args.iters)
             m_t = torch.tensor(m).unsqueeze(0)
-            torch.save(m_t, paths.forward_output/f'{i}_{tts_k}k.mel')
+            torch.save(m_t, paths.forward_output/f'{i}_{tts_k}_alpha{args.alpha}.mel')
             save_wav(wav, save_path)

     print('\n\nDone.\n')
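
To make the effect of the renamed outputs concrete, here is a small standalone sketch of the new naming scheme for batch inputs. The helper `build_output_paths` and its parameters are hypothetical and not part of `gen_forward.py`; only the two f-string patterns are taken from the diff above.

```python
from pathlib import Path
from typing import Tuple


def build_output_paths(
    out_dir: Path, index: int, v_type: str, tts_k: int, alpha: float
) -> Tuple[Path, Path]:
    """Return (wav_path, mel_path) following the patterns introduced in this commit.

    Hypothetical helper for illustration only; the script builds these paths inline.
    """
    wav_path = out_dir / f'{index}_{v_type}_{tts_k}_alpha{alpha}.wav'
    mel_path = out_dir / f'{index}_{tts_k}_alpha{alpha}.mel'
    return wav_path, mel_path


wav_path, mel_path = build_output_paths(Path('forward_output'), 1, 'griffinlim', 400, 1.25)
print(wav_path.name)  # 1_griffinlim_400_alpha1.25.wav
print(mel_path.name)  # 1_400_alpha1.25.mel
```

With `alpha` in the file name, runs at different speeds no longer overwrite each other's output. Note that the new patterns drop the `k` suffix after the step count that the old names carried.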
