[Feature]: AutoPitch - automatic pitch detection #786

Open
Bebra777228 opened this issue Oct 5, 2024 · 16 comments
Labels
enhancement (New feature or request), feature

Comments

@Bebra777228
Contributor

Bebra777228 commented Oct 5, 2024

Description

A similar question was asked recently, but in my opinion it was not formulated quite accurately.

I would like to know if you plan to implement a function for automatic pitch detection, similar to how it was implemented in SVC.

I am not sure if this feature is still available in SVC, but about a year and a half ago, during inference, you could check the 'autopitch' box. This allowed the model to automatically adjust to the pitch of the voice in the source recording, resulting in a more realistic voice than with manual pitch adjustment (even though this feature worked poorly, the results were good). At least, that's how it seemed to me.

Problem

Proposed Solution

Alternatives Considered

Bebra777228 added the enhancement and feature labels Oct 5, 2024
@rmdom-dev
Contributor

SVC is no longer used, it is too old, instead we use RVC which is much better and more up to date.

@blaisewf
Member

blaisewf commented Oct 5, 2024

> SVC is no longer used, it is too old, instead we use RVC which is much better and more up to date.

But did you read it?

@rmdom-dev
Contributor

Sorry, I don't know much English.

@kro-ai

kro-ai commented Oct 7, 2024

> SVC is no longer used, it is too old, instead we use RVC which is much better and more up to date.

[Translated from Spanish] I think they are asking for the same feature in RVC. I agree with them, it would be very useful! When running inference you would not need to adjust the semitones; it would do that on its own by detecting the pitch of the input audio.

Sorry for the bad Spanish, it was auto-translated. I just wanted to make sure you understand.

For those who don't speak Spanish: this feature would be very useful because you wouldn't have to adjust the pitch manually, i.e. anywhere from -12 to +12 semitones. It would automatically detect the pitch, so if your model is a male voice and your input is a female voice, it would lower the output pitch to compensate and make it sound more natural. I used this feature all the time in SVC and found it very useful! It didn't always work that well, but it was still useful to have. I'm not sure it would work well with singing audio, but with speaking audio it's really great.

@AznamirWoW
Contributor

For such functionality to work, there should be some kind of record of what the model was trained on. Detecting a max F0 value from inferred audio can be done, but adjusting the pitch down without knowing what the model is capable of cannot.
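
One way to keep such a record, sketched under assumptions: at training time, summarize the F0 track of the training data and save it next to the model. The function names, the JSON keys, and the idea of a sidecar file are hypothetical and are not part of RVC or Applio. The F0 track is assumed to be a list of per-frame values in Hz, with 0 marking unvoiced frames.

```python
import json
import statistics


def summarize_f0(f0_hz):
    """Summarize the voiced frames (f0 > 0) of an F0 track given in Hz."""
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in F0 track")
    return {
        "f0_min_hz": min(voiced),
        "f0_max_hz": max(voiced),
        "f0_mean_hz": statistics.fmean(voiced),
        "voiced_frames": len(voiced),
    }


def save_f0_record(f0_hz, path):
    """Write the summary to a JSON file (hypothetical sidecar next to the model)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(summarize_f0(f0_hz), fh, indent=2)


def load_f0_record(path):
    """Read the summary back at inference time."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

With a record like this available at inference time, the model's usable range would no longer have to be guessed.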

@kro-ai

kro-ai commented Oct 7, 2024

> For such functionality to work, there should be some kind of record of what the model was trained on. Detecting a max F0 value from inferred audio can be done, but adjusting the pitch down without knowing what the model is capable of cannot.

Having looked into it a bit more, the way it works in SVC is that an F0 predictor is trained alongside the main model, which explains why it wouldn't be possible with RVC. It is a shame, as this would be very useful.

@tomakorea

I also had a really excellent experience with SVC's automatic pitch detection. It made the spoken voice really realistic, actually better than RVC or Applio, where a lot of manual editing is often necessary to get a realistic result.

@kro-ai

kro-ai commented Oct 8, 2024

> I also had a really excellent experience with SVC's automatic pitch detection. It made the spoken voice really realistic, actually better than RVC or Applio, where a lot of manual editing is often necessary to get a realistic result.

Me too, it was really useful for speaking audio. Maybe some brave soul can add this to Applio.

@Chilluminati91
Contributor

Shouldn't this be pretty simple in general? Calculate a mean F0 from the training data. Then calculate the mean F0 of your clip before inference and shift by the difference.
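
The idea above can be sketched as follows. This is a minimal illustration, not code from RVC, Applio, or SVC: `mean_voiced_f0` and `auto_transpose` are hypothetical names, the F0 tracks are assumed to be per-frame values in Hz with 0 for unvoiced frames, and the model's mean F0 is assumed to have been computed from the training data beforehand. The difference is expressed in semitones as 12 times the base-2 logarithm of the frequency ratio.

```python
import math
import statistics


def mean_voiced_f0(f0_hz):
    """Mean F0 in Hz over voiced frames only (unvoiced frames are 0)."""
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in F0 track")
    return statistics.fmean(voiced)


def auto_transpose(clip_f0_hz, model_mean_f0_hz):
    """Whole-semitone shift that moves the clip's mean F0 onto the model's."""
    clip_mean = mean_voiced_f0(clip_f0_hz)
    return round(12.0 * math.log2(model_mean_f0_hz / clip_mean))
```

For example, a clip averaging 220 Hz fed to a model whose training data averages 110 Hz gives a transpose of -12 semitones, i.e. one octave down.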

@kro-ai

kro-ai commented Oct 10, 2024

> Shouldn't this be pretty simple in general? Calculate a mean F0 from the training data. Then calculate the mean F0 of your clip before inference and shift by the difference.

Interesting.

@blaisewf
Member

@Bebra777228 could you share the code that SVC uses for that "AutoPitch" function?

@Bebra777228
Contributor Author

I can't precisely determine which file this is implemented in, but I assume it might be the models.py file. This file contains the parameter use_automatic_f0_prediction, which might be what you need.

Overall, it's easiest to search through the code. You might find something useful if you use this search link.

@BornSaint
Contributor

> Shouldn't this be pretty simple in general? Calculate a mean F0 from the training data. Then calculate the mean F0 of your clip before inference and shift by the difference.

Yes, it's code that detects the input pitch, compares it to the model pitch, and then calculates the best transpose.

@tomakorea

I implemented it for fun, making the algorithm a bit more robust at handling edge cases. I also added RMS extraction at the same time. It's not a game changer, but it's convenient.
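
The implementation mentioned above is not shown in this thread, so the following is only a guess at what "more robust" could look like, not the actual code. It assumes per-frame F0 values in Hz with 0 for unvoiced frames; the names `robust_transpose` and `rms`, the ±12 semitone limit, and the minimum of 10 voiced frames are all hypothetical choices. It uses the median in the semitone (MIDI note) domain instead of the mean in Hz, so a few octave errors from the pitch tracker do not drag the estimate.

```python
import math
import statistics


def hz_to_midi(f_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f_hz / 440.0)


def robust_transpose(clip_f0_hz, model_f0_hz, max_shift=12, min_voiced=10):
    """Median-based semitone shift from clip to model, clamped to +/- max_shift."""
    clip = [hz_to_midi(f) for f in clip_f0_hz if f > 0]
    model = [hz_to_midi(f) for f in model_f0_hz if f > 0]
    # Edge case: too little voiced audio to trust an estimate, so do not shift.
    if len(clip) < min_voiced or len(model) < min_voiced:
        return 0
    shift = round(statistics.median(model) - statistics.median(clip))
    return max(-max_shift, min(max_shift, shift))


def rms(samples):
    """Root-mean-square level of a sequence of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))
```

Falling back to a shift of 0 when there is too little voiced audio keeps near-silent clips from being transposed at random, and the clamp limits the damage when the estimate is badly off.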

@blaisewf
Member

> I implemented it for fun, making the algorithm a bit more robust at handling edge cases. I also added RMS extraction at the same time. It's not a game changer, but it's convenient.

Feel free to open a PR so we can see your code.

@fallbringer3

> I implemented it for fun, making the algorithm a bit more robust at handling edge cases. I also added RMS extraction at the same time. It's not a game changer, but it's convenient.

@tomakorea could you share it by any chance?

9 participants