Any advice for a voice that just won't replicate? #79

johnGettings · 2022-05-30T06:57:54Z

I have used TorToiSe on several voices and it works amazingly. But then there are some that just will not work no matter what I put into it. I have tried more audio clips, a lot more audio clips, fewer clips, old and young, monotone, various sentences, longer sentences and shorter sentences.

Has anybody found any tricks to get a stubborn voice working? Or is TorToiSe just incapable of reproducing some voices that are too different from anything in the training set?

Thanks

neonbjb · 2022-05-31T18:01:09Z

I'll leave this open in case someone else has any suggestions, but in general - yes, there are many voices that Tortoise simply struggles with. The "fix" would be to scale up the model and dataset, but I do not have any plans to do so.

jnordberg · 2022-06-01T10:36:55Z

I've found that using different source audio can make all the difference, even if it's seemingly lower quality. I've also had some success improving voice likeness by messing with EQ and applying loudness normalization in Audacity.

It feels like the model picks up on subtle cues in the waveform like what compressor or microphone was used during recording...
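
For anyone who would rather script the loudness step than do it by hand, here is a minimal sketch of RMS-based normalization in plain Python. It is an assumption-laden illustration, not what Audacity does internally (its loudness effect can also target perceived loudness in LUFS), and the function names, the -20 dBFS target and the 0.99 peak ceiling are all hypothetical choices. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def rms_dbfs(samples):
    """Return the RMS level of float samples (range -1.0..1.0) in dBFS."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def normalize_rms(samples, target_dbfs=-20.0, peak_ceiling=0.99):
    """Scale samples so their RMS level reaches target_dbfs without clipping."""
    current = rms_dbfs(samples)
    if current == float("-inf"):
        return list(samples)  # silence or empty input: nothing to scale
    gain = 10.0 ** ((target_dbfs - current) / 20.0)
    peak = max(abs(s) for s in samples)
    # Back the gain off if it would push the loudest sample past the ceiling.
    if peak * gain > peak_ceiling:
        gain = peak_ceiling / peak
    return [s * gain for s in samples]


# Usage: bring a quiet test tone up to the (assumed) -20 dBFS target.
tone = [0.1 * math.sin(2 * math.pi * i / 100) for i in range(1000)]
louder = normalize_rms(tone)
```

Running every reference clip for a voice through the same target level at least removes loudness differences as a variable when comparing results. Reading and writing the WAV files themselves is left out here.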

ExponentialML · 2022-06-01T13:07:35Z

> It feels like the model picks up on subtle cues in the waveform like what compressor or microphone was used during recording...

Interesting idea. A repository that comes to mind is Matchering. While I haven't used it on a voice before, I wonder if mastering (using the term loosely here) a voice against a trained one using this application would improve results.

wavymulder · 2022-06-01T22:32:27Z

> It feels like the model picks up on subtle cues in the waveform like what compressor or microphone was used during recording...

In one of my experiments, a voice turned more British simply by reducing the reverb in the sample audio and equalizing it to be more neutral. The model has definitely picked up subtle patterns it found.

neonbjb · 2022-06-01T22:36:55Z

> > It feels like the model picks up on subtle cues in the waveform like what compressor or microphone was used during recording...
>
> In one of my experiments, a voice turned more British simply by reducing the reverb in the sample audio and equalizing it to be more neutral. The model has definitely picked up subtle patterns it found.

That's a cool finding. Makes total sense, too.

I-Have-No-Idea-What-IAmDoing · 2022-07-24T06:18:17Z

I noticed that the content of the text affects the voice: if you make it talk about more clinical or research-y material (like the abstract of a paper on arXiv), it comes out more British.
