Since this is a diffusion model, I'm wondering whether it would be possible to implement something like what ControlNet did for Stable Diffusion/Flux.

Basically, is it possible to build models that steer the audio generation by, for example, extracting the emotion from a voice reference (different from the reference used to "inpaint"), or the roughness, etc.?

I'm not asking for this to be implemented now; I just wanted to understand what possibilities a diffusion model for voices opens up.
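For context on what I mean, here is a minimal, purely illustrative sketch (PyTorch) of the ControlNet idea applied to an audio diffusion denoiser: a trainable side branch takes an embedding extracted from a second reference clip (e.g. an emotion or roughness embedding) and injects zero-initialized residuals into the frozen backbone. All class and variable names here are hypothetical, not this project's API.

```python
import torch
import torch.nn as nn

class ZeroConv1d(nn.Module):
    """1x1 conv initialized to zero, so the control branch starts as a no-op
    and the pretrained denoiser's behaviour is preserved early in fine-tuning
    (the key trick from ControlNet)."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv1d(channels, channels, kernel_size=1)
        nn.init.zeros_(self.conv.weight)
        nn.init.zeros_(self.conv.bias)

    def forward(self, x):
        return self.conv(x)

class ControlBranch(nn.Module):
    """Trainable branch that ingests the control embedding and emits
    residuals to be added into the frozen denoiser."""
    def __init__(self, channels: int, cond_dim: int):
        super().__init__()
        self.cond_proj = nn.Linear(cond_dim, channels)  # e.g. emotion embedding -> channels
        self.block = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        self.zero_out = ZeroConv1d(channels)

    def forward(self, latents, cond_embedding):
        # latents: (B, C, T) noisy audio latents; cond_embedding: (B, cond_dim)
        h = latents + self.cond_proj(cond_embedding).unsqueeze(-1)
        h = torch.relu(self.block(h))
        return self.zero_out(h)  # residual injected into the frozen denoiser

# Usage sketch: the pretrained denoiser stays frozen; only the branch trains.
B, C, T, cond_dim = 2, 64, 256, 128
branch = ControlBranch(C, cond_dim)
noisy_latents = torch.randn(B, C, T)
emotion_emb = torch.randn(B, cond_dim)   # from a separate reference encoder (hypothetical)
residual = branch(noisy_latents, emotion_emb)
conditioned = noisy_latents + residual   # what would be fed to the frozen denoiser block
```

Whether this maps cleanly onto this model's architecture is exactly what I'm asking about.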