Combine voices #44

dobrosketchkun · 2022-05-14T12:11:03Z

It's too small of an issue to create a pull request, I guess.

In your .ipynb file you have this cell:

# You can also combine conditioning voices. Combining voices produces a new voice
# with traits from all the parents.
#
# Lets see what it would sound like if Picard and Kirk had a kid with a penchant for philosophy:
voice_samples, conditioning_latents = load_voices(['pat', 'william'])

gen = tts.tts_with_preset("They used to say that if man was meant to fly, he’d have wings. But he did fly. He discovered he had to.", 
                          voice_samples=None, conditioning_latents=None, preset=preset)
torchaudio.save('captain_kirkard.wav', gen.squeeze(0).cpu(), 24000)
IPython.display.Audio('captain_kirkard.wav')

I think the voice_samples=None, conditioning_latents=None bit is supposed to be voice_samples=voice_samples, conditioning_latents=conditioning_latents, because otherwise it won't work.
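
To make the difference concrete, here is a minimal, self-contained sketch. Both functions are hypothetical stand-ins, not Tortoise's API, and the fallback branch is an assumption about what happens when neither argument is supplied. The point is only that the values returned by load_voices have to be passed on, or they are never used.

```python
def fake_load_voices(names):
    """Hypothetical stand-in for load_voices: returns (samples, latents)."""
    samples = [f"{name}.wav" for name in names]
    return samples, None  # this toy example has no precomputed latents


def fake_tts_with_preset(text, voice_samples=None, conditioning_latents=None):
    """Hypothetical stand-in that reports which conditioning source is used."""
    if voice_samples is not None:
        return "samples:" + "+".join(voice_samples)
    if conditioning_latents is not None:
        return "latents"
    # Assumption: with no conditioning at all, the loaded voices are ignored.
    return "fallback (loaded voices ignored)"


voice_samples, conditioning_latents = fake_load_voices(["pat", "william"])

# The cell as written in the notebook: the loaded voices are dropped.
broken = fake_tts_with_preset("hello", voice_samples=None, conditioning_latents=None)

# The proposed fix: pass the loaded values through.
fixed = fake_tts_with_preset("hello", voice_samples=voice_samples,
                             conditioning_latents=conditioning_latents)
```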

neonbjb · 2022-05-14T13:35:29Z

🤦thank you! I broke this in 2ca4ea9

I've fixed the live colab. The repo fix will need to wait until I wrap up some local development.

dobrosketchkun · 2022-05-14T15:47:55Z

Nice!
And while we are on this matter, what do you think about giving the user the ability to save a random voice? Since it is pulled from a latent space there are no wavs, but couldn't you just save a tensor and load it again later, or something like that?

neonbjb · 2022-05-14T15:56:31Z

So I originally intended to do this. However, I discovered that for some reason the random voice latents do not consistently produce the same voice. So if you feed the same random voice latent into the model for the same text, you will get two different voices.

I can't explain this. I need to do some further investigation, but haven't found the time.

dobrosketchkun · 2022-05-14T16:29:02Z

> However, I discovered that for some reason the random voice latents do not consistently produce the same voice.

Same here. I thought it was on me since I'm not really a programmer, but that's how it is, I guess.

davidhhh123 · 2022-06-22T19:51:07Z

How can I clone the voice from one audio clip onto another, without text?

neonbjb · 2022-06-22T20:13:47Z

You can't.

davidhhh123 · 2022-06-22T20:25:48Z

What a pity. Can you explain the timestep_independent function? I didn't quite understand what it does with the data.

davidhhh123 · 2022-06-22T20:26:50Z

I want to understand the architecture better.

neonbjb · 2022-06-22T20:44:34Z

Diffusion models work by iteratively refining an input from pure Gaussian noise to a desired target space. Those iterations are referred to as "timesteps". In the case of Tortoise, there are some components of the network that produce the same output regardless of what timestep you are on. So for those computations, it is more efficient to do them once and re-use their outputs than to re-compute them for every timestep. This is the purpose of the "timestep_independent" function. It performs every computation that does not rely on the timestep signal.
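
As a rough illustration of that idea (a toy sketch, not Tortoise's actual code), the loop below hoists the timestep-independent work out of the refinement loop. Only the name timestep_independent comes from this thread; the function bodies, denoise_step, sample, and the toy arithmetic are all made up for the example.

```python
import math

# Counts how often the expensive, timestep-independent work runs.
calls = {"timestep_independent": 0}


def timestep_independent(conditioning):
    """Toy stand-in: work that gives the same result at every timestep."""
    calls["timestep_independent"] += 1
    return [math.tanh(c) for c in conditioning]


def denoise_step(x, t, cond_features):
    """Toy refinement step that depends on the timestep t and cached features."""
    scale = t / (t + 1)  # made-up timestep signal, 0.0 at the final step
    return [scale * xi + (1 - scale) * ci for xi, ci in zip(x, cond_features)]


def sample(noise, conditioning, num_steps):
    # Computed once, before the loop, then re-used at every timestep.
    cond_features = timestep_independent(conditioning)
    x = list(noise)
    for t in reversed(range(num_steps)):
        x = denoise_step(x, t, cond_features)
    return x


result = sample([5.0, -3.0], [0.0, 1.0], num_steps=10)
```

With ten steps the conditioning work still runs only once. Calling timestep_independent inside the loop would give the same result at ten times the cost for that part.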

davidhhh123 · 2022-06-23T08:28:16Z

Thanks for the reply. Can I ask a few more questions? Does get_conditioning_latents fetch the data to clone? And what does an autoregressive model do?

neonbjb · 2022-06-23T15:51:02Z

get_conditioning_latents transforms the voice sample that you provide the model into a vector representation that the AR and diffusion models can use.
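
A toy sketch of the "samples in, vector out" idea follows. Everything in it is hypothetical: toy_encode is a made-up encoder, and pooling several clips by averaging their vectors is an assumption used for illustration, not something this thread confirms about Tortoise.

```python
from statistics import fmean


def toy_encode(clip):
    """Hypothetical encoder: maps a list of audio samples to a 2-number vector."""
    return [fmean(clip), max(abs(s) for s in clip)]


def combine_latents(clips):
    """Assumption: pool the per-clip vectors by averaging them element-wise."""
    vectors = [toy_encode(clip) for clip in clips]
    return [fmean(column) for column in zip(*vectors)]


# Two made-up "voice samples" standing in for real audio clips.
pat = [0.0, 0.5, -0.5, 1.0]
william = [0.25, 0.25, 0.25, 0.25]

combined = combine_latents([pat, william])
```

Under that assumption, a combined voice is a single vector that sits between the vectors of its parents.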

For what an AR model does: no disrespect meant, but you should Google this. I am not nearly good enough with words to outcompete all the great content out there on this subject. I'd also watch a video on DALL-E (1) or read the paper.

davidhhh123 · 2022-06-23T16:07:23Z

thank you so much

zachwe pushed a commit to zachwe/tortoise-tts that referenced this issue Sep 12, 2023

Merge pull request 'Update tortoise/utils/devices.py vram issue' (neo…

f025470

…nbjb#44) from aJoe/tortoise-tts:main into main Reviewed-on: https://git.ecker.tech/mrq/tortoise-tts/pulls/44
