-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add speed and emotion options for Cartesia. #435
Conversation
src/pipecat/services/cartesia.py
Outdated
@@ -73,6 +73,8 @@ def __init__( | |||
encoding: str = "pcm_s16le", | |||
sample_rate: int = 16000, | |||
language: str = "en", | |||
speed: str = None, | |||
emotion: list = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking that instead of adding more arguments we could do like FalImageGenService
or GladiaSTTService
:
class CartesiaTTSService(TTSService):
class InputParams(BaseModel):
speed: Optional[str] = None,
emotion: Optional[List[str]] = None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, wouldn’t it be better to include the other options like encoding, sample_rate, and language in the InputParams as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just didn't want to ask you to do more work 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, I’ve made a patch based on your advice.
The 'voice_id' parameter is required, so I’ve left it as the default parameter.
Please review and let me know if everything looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm… maybe we need something like TTSEmotionUpdateFrame and TTSSpeedUpdateFrame? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed for now.
src/pipecat/services/cartesia.py
Outdated
@@ -61,6 +62,14 @@ def language_to_cartesia_language(language: Language) -> str | None: | |||
|
|||
|
|||
class CartesiaTTSService(TTSService): | |||
class InputParams(BaseModel): | |||
model_id: Optional[str] = "sonic-english" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd keep the model_id
as a main argument as with all the other AI services. The rest looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a conflict.
@aconchillo I fixed conflict and revert "model_id" as a main argument. Please check that. |
Looks great! I believe there's one more conflict but I'll merge right away once that's solved. Thank you for your patience! |
I'm sorry for the delay. The conflict has been resolved. |
Apply and fix for recent changes. It might be better to make CartesiaTTS class for inheritance in CartesiaTTSService/CartesiaHttpService later. |
Can we merge this asap? the speed is too fast atm |
@golbin we've made a switch to Ruff as a formatter. This was to resolve formatting issues with Autopep-8. The README has been updated with instructions for setting up Ruff as a linter in your IDE. Can you set up Ruff, fix the linting issues, and push your latest changes? With that either @aconchillo or I can merge it down right away. |
I will fix it in 24 hours. |
Thanks @golbin! 🙌 |
I've just added speed and emotion options to Cartesia TTS.
While these are experimental features, they are very useful and could soon be ready for production.