
Any advice for a voice that just won't replicate? #79

Open
johnGettings opened this issue May 30, 2022 · 6 comments

Comments

@johnGettings

I have used TorToiSe on several voices and it works amazingly. But then there are some that just will not work no matter what I put into it. I have tried more audio clips, a lot more audio clips, fewer clips, old and young voices, monotone delivery, various sentences, longer sentences and shorter sentences.

Has anybody found any tricks to get a stubborn voice working? Or is TorToiSe just incapable of reproducing some voices that are too different from anything in the training set?

Thanks

@neonbjb
Owner

neonbjb commented May 31, 2022

I'll leave this open in case someone else has any suggestions, but in general - yes, there are many voices that Tortoise simply struggles with. The "fix" would be to scale up the model and dataset, but I do not have any plans to do so.

@jnordberg
Contributor

I've found that using different source audio can make all the difference, even if it's seemingly lower quality. I've also had some success improving voice likeness by messing with EQ and applying loudness normalization in Audacity.

It feels like the model picks up on subtle cues in the waveform, like what compressor or microphone was used during recording...
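
If you want to script the loudness-normalization step outside Audacity, here's one way it could look as a rough sketch (using the soundfile and pyloudnorm packages, which have nothing to do with Tortoise itself). The -23 LUFS target and file names are just placeholders:

```python
# Sketch: loudness-normalize a prompt clip before using it as conditioning audio.
# Requires: pip install soundfile pyloudnorm
# The -23 LUFS target and file names are arbitrary choices, not Tortoise requirements.
import soundfile as sf
import pyloudnorm as pyln

data, rate = sf.read("voice_clip.wav")             # float array, (samples,) or (samples, channels)
meter = pyln.Meter(rate)                           # ITU-R BS.1770 loudness meter
loudness = meter.integrated_loudness(data)         # measured integrated loudness in LUFS
normalized = pyln.normalize.loudness(data, loudness, -23.0)  # apply gain to hit -23 LUFS
sf.write("voice_clip_norm.wav", normalized, rate)
```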

@ExponentialML

> It feels like the model picks up on subtle cues in the waveform, like what compressor or microphone was used during recording...

Interesting idea. A repository that comes to mind is Matchering. While I haven't used it on a voice before, I wonder if mastering (using the term loosely here) a voice against a trained one using this tool would improve results.
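
If someone wants to try it, a minimal sketch following Matchering's documented process() API could look like the following. The file names are placeholders, and whether matching a stubborn clip against a clip from a voice that already clones well actually helps is untested:

```python
# Sketch: "master" a stubborn voice clip against a clip from a voice that
# Tortoise already clones well. Requires: pip install matchering
# File names are placeholders.
import matchering as mg

mg.process(
    target="stubborn_voice_clip.wav",     # clip that won't replicate
    reference="working_voice_clip.wav",   # clip from a voice that clones well
    results=[mg.pcm16("stubborn_voice_matched.wav")],  # 16-bit WAV output
)
```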

@wavymulder
Contributor

wavymulder commented Jun 1, 2022

> It feels like the model picks up on subtle cues in the waveform, like what compressor or microphone was used during recording...

In one of my experiments, a voice turned more British simply by reducing the reverb in the sample audio and equalizing it to be more neutral. The model has definitely picked up on subtle patterns in the recordings.
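
Reverb reduction is hard to do in a short script, but the "equalize it to be more neutral" part can be roughed out. Here's a sketch with scipy and soundfile, where the 80 Hz high-pass cutoff is an arbitrary starting point rather than anything Tortoise-specific:

```python
# Sketch: tame a prompt clip's low end with a gentle high-pass filter.
# Requires: pip install scipy soundfile
# The 80 Hz cutoff is an arbitrary starting point; this removes low-frequency
# rumble only and does not reduce reverb.
import soundfile as sf
from scipy.signal import butter, sosfiltfilt

data, rate = sf.read("voice_clip.wav")
sos = butter(2, 80, btype="highpass", fs=rate, output="sos")  # 2nd-order HPF at 80 Hz
filtered = sosfiltfilt(sos, data, axis=0)                     # zero-phase filtering
sf.write("voice_clip_eq.wav", filtered, rate)
```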

@neonbjb
Owner

neonbjb commented Jun 1, 2022

> It feels like the model picks up on subtle cues in the waveform, like what compressor or microphone was used during recording...
>
> In one of my experiments, a voice turned more British simply by reducing the reverb in the sample audio and equalizing it to be more neutral. The model has definitely picked up on subtle patterns in the recordings.

That's a cool finding. Makes total sense, too.

@I-Have-No-Idea-What-IAmDoing

I noticed that the content of the text affects the voice: if you make it talk about more clinical/research-y material (like the abstract of a paper on arxiv), it comes out more British.
