Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add speed and emotion options for Cartesia. #435

Merged
merged 8 commits into from
Sep 26, 2024
Merged

Conversation

golbin
Copy link
Contributor

@golbin golbin commented Aug 31, 2024

I've just added speed and emotion options to Cartesia TTS.

While these are experimental features, they are very useful and could soon be ready for production.

@@ -73,6 +73,8 @@ def __init__(
encoding: str = "pcm_s16le",
sample_rate: int = 16000,
language: str = "en",
speed: str = None,
emotion: list = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking that instead of adding more arguments we could do like FalImageGenService or GladiaSTTService:

class CartesiaTTSService(TTSService):
    class InputParams(BaseModel):
        speed: Optional[str] = None,
        emotion: Optional[List[str]] = None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, wouldn’t it be better to include the other options like encoding, sample_rate, and language in the InputParams as well?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I just didn't want to ask you to do more work 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem, I’ve made a patch based on your advice.

The 'voice_id' parameter is required, so I’ve left it as the default parameter.

Please review and let me know if everything looks good.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… maybe we need something like TTSEmotionUpdateFrame and TTSSpeedUpdateFrame? 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not needed for now.

@@ -61,6 +62,14 @@ def language_to_cartesia_language(language: Language) -> str | None:


class CartesiaTTSService(TTSService):
class InputParams(BaseModel):
model_id: Optional[str] = "sonic-english"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep the model_id as a main argument as with all the other AI services. The rest looks great!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's also a conflict.

@golbin
Copy link
Contributor Author

golbin commented Sep 17, 2024

@aconchillo I fixed conflict and revert "model_id" as a main argument. Please check that.

@aconchillo
Copy link
Contributor

Looks great! I believe there's one more conflict but I'll merge right away once that's solved. Thank you for your patience!

@golbin
Copy link
Contributor Author

golbin commented Sep 23, 2024

I'm sorry for the delay. The conflict has been resolved.

@golbin
Copy link
Contributor Author

golbin commented Sep 23, 2024

Apply and fix for recent changes. It might be better to make CartesiaTTS class for inheritance in CartesiaTTSService/CartesiaHttpService later.

@williamtran29
Copy link

Can we merge this asap? the speed is too fast atm

@markbackman
Copy link
Contributor

@golbin we've made a switch to Ruff as a formatter. This was to resolve formatting issues with Autopep-8. The README has been updated with instructions for setting up Ruff as a linter in your IDE. Can you set up Ruff, fix the linting issues, and push your latest changes? With that either @aconchillo or I can merge it down right away.

@markbackman markbackman self-requested a review September 25, 2024 18:48
@golbin
Copy link
Contributor Author

golbin commented Sep 26, 2024

I will fix it in 24 hours.

@markbackman
Copy link
Contributor

Thanks @golbin! 🙌

@markbackman markbackman merged commit b1818cc into pipecat-ai:main Sep 26, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants