Combine voices #44
Comments
🤦 Thank you! I broke this in 2ca4ea9. I've fixed the live Colab. The repo fix will need to wait until I wrap up some local development.
Nice!
So I originally intended to do this. However, I discovered that, for some reason, the random voice latents do not consistently produce the same voice: if you feed the same random voice latent into the model twice for the same text, you will get two different voices. I can't explain this. I need to investigate further, but haven't found the time.
Same here. I thought it was on me since I'm not really a programmer, but that's how it is, I guess.
How can I clone the voice of one audio clip onto another, without text?
You can't.
What a pity. Can you explain the timestep_independent function? I didn't quite understand what it does with the data.
I want to understand the architecture better.
Diffusion models work by iteratively refining an input from pure Gaussian noise to a desired target space. Those iterations are referred to as "timesteps". In the case of Tortoise, there are some components of the network that produce the same output regardless of which timestep you are on. For those computations, it is more efficient to do them once and re-use their outputs than to re-compute them for every timestep. This is the purpose of the "timestep_independent" function: it performs every computation that does not rely on the timestep signal.
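To make the caching idea concrete, here is a minimal toy sketch. It is not Tortoise's actual code: the ToyDenoiser class, its timestep_dependent method, the sample function, and the arithmetic inside them are all made up for illustration. Only the pattern matters: the timestep-independent part runs once, and its output is re-used on every iteration.

```python
import math


class ToyDenoiser:
    """Toy stand-in for a diffusion denoiser (hypothetical, not Tortoise's network)."""

    def __init__(self):
        # Counts how often the timestep-independent part is evaluated.
        self.independent_calls = 0

    def timestep_independent(self, conditioning):
        # Depends only on the conditioning input, never on the timestep t.
        self.independent_calls += 1
        return [math.tanh(c) for c in conditioning]

    def timestep_dependent(self, x, t, precomputed):
        # Only this part has to be re-run for every timestep.
        scale = t / (t + 1.0)
        return [scale * xi + (1.0 - scale) * p for xi, p in zip(x, precomputed)]


def sample(model, noise, conditioning, num_timesteps):
    # Compute the timestep-independent features once, before the loop...
    precomputed = model.timestep_independent(conditioning)
    x = list(noise)
    # ...then re-use them on every refinement step.
    for t in reversed(range(num_timesteps)):
        x = model.timestep_dependent(x, t, precomputed)
    return x


model = ToyDenoiser()
result = sample(
    model,
    noise=[0.5, -1.0, 2.0],
    conditioning=[0.1, 0.2, 0.3],
    num_timesteps=10,
)
print(model.independent_calls)  # the independent part ran once, not ten times
```

If the call to timestep_independent were moved inside the loop, the result would be identical, but the same conditioning features would be recomputed on every one of the ten steps.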
Thanks for the reply. Can I ask a few more questions? Does get_conditioning_latents fetch the data to clone? And what does an autoregressive model do?
For what an AR model does: no disrespect meant, but you should Google this. I am not nearly good enough with words to outcompete all the great content out there on this subject. I'd also watch a video on DALL-E (1) or read the paper.
Thank you so much.
…nbjb#44) from aJoe/tortoise-tts:main into main Reviewed-on: https://git.ecker.tech/mrq/tortoise-tts/pulls/44
It's too small of an issue to create a pull request, I guess.
In your .ipynb file you have this cell:
I think
voice_samples=None, conditioning_latents=None
is supposed to be
voice_samples=voice_samples, conditioning_latents=conditioning_latents
because otherwise it won't work.