Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Further consolidate service update settings into a single ServiceUpdateSettingsFrame class #527

Merged
merged 5 commits into from
Oct 2, 2024

Conversation

markbackman
Copy link
Contributor

@markbackman markbackman commented Sep 30, 2024

Hijacked this PR to make the change to simplify the service update settings. There's not a base class for service updates called ServiceUpdateSettingsFrame. There are subclasses for different service types: LLMUpdateSettingsFrame, TTSUpdateSettingsFrame, STTUpdateSettingsFrame.

When these frames are processed, there's an _update_settings handler in ai_services.py that is responsible for facilitating the update. For most values, this is a key lookup. There is special case handling for model and voice needed to ensure keys are mapped correctly.

For each service that has InputParams, those params are part of a _settings object which is updated by the _update_settings handlers.

This change simplifies maintaining settings updates for services. Going forward, the keys are to:

  • Unify naming for similar settings across services. In this PR, voice_id and model are used uniformly for services. Previously, the same change was made for language.
  • Settings no longer need setter functions. The single _update_settings method will do the trick unless special handling is required. You can see an example of that for Cartesia's speed and emotion params.

Lastly, in this PR, I'm also applying the Language type across all services. Includes must be from the Language enum. Language strings are translated to the correct format for each service before applying them.

@@ -558,6 +558,8 @@ class TTSUpdateSettingsFrame(ControlFrame):
style: Optional[str] = None
style_degree: Optional[str] = None
role: Optional[str] = None
gender: Optional[str] = None
google_style: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this frame should just be:

@dataclass
class TTSUpdateSettingsFrame(ControlFrame):
    settings: Mapping[str, Any]

and then use "gender", etc. where needed. Because we might have tons of other settings and by just looking at this frame we have no idea which setting corresponds to each service.

If we do that, maybe we can just have:

@dataclass
class ServiceUpdateSettingsFrame(ControlFrame):
    settings: Mapping[str, Any]

which is generic for all services. No need for LLMUpdateSettingsFrame or TTSUpdateSettingsFrame.

I know we lose completions and types but another benefit is that these won't have to change at all and people writing their own services (and not sharing them) they can keep using the frame. Actually, the same idea with extra, but we make all extra.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we did that, it could be more like the metrics and we could have defined types for each of the various known services. that said, this would be a pretty major breaking change, yeah?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i take that back. the metrics aren't really service specific. i mis-remembered. but there could still be some typing there eventually. maybe start with Any for now?

Copy link
Contributor

@mattieruth mattieruth Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i take it back again, we could still have different types defined for llm vs tts vs stt. I'm not sure where the google specific goes, though. Is there a way google.py could expand? or maybe instead of having all settings by [str, Any], there's a custom_settings that maps to any?

// in frames.py
@dataclass
class UpdateSettingsFrame(ControlFrame):
    settings: SettingsUpdate

// new settingsUpdates.py ? (name tbd)
class SettingsUpdate(BaseModel):
    model: Optional[str] = None
    custom: Optional[Mapping[str, Any]]

class LLMSettingsUpdate(SettingsUpdate):
   ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw. working it through with mark, i realized that since these frames are created in client code, the typing described above didn't really make sense.

Comment on lines 561 to 562
gender: Optional[str] = None
google_style: Optional[str] = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not a fan of having such custom service settings in the base class, but given the current way things are setup, it seems like this is the best option. we should consider if there's a better way to handle custom setting updates in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's the same comment I posted. Would my proposal work?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i think we were reviewing at the same time. I like your proposal and commented on how it might be fleshed out a bit more to maintain some typing.

@markbackman markbackman force-pushed the mb/google-tts-inputs branch from 226ad77 to f98e3ca Compare October 1, 2024 14:50
@markbackman markbackman changed the title Add support for gender and google_style inputs to Google TTS Further consolidate service update settings into a single ServiceUpdateSettingsFrame class Oct 1, 2024
async def _update_settings(self, settings: Dict[str, Any]):
for key, value in settings.items():
setter = getattr(self, f"set_{key}", None)
if setter and callable(setter):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it's worth it to have a self._settings: Dict[str, Any] in the class instead if one field per setting.

Then you could just do:

for key, value in settings.items():
  if key in settings:
      self._settings[key] = value
  else:
      logger.warning(f"Unknown setting for Anthropic LLM service: {key}")

and in the constructor we initialize self._settings = { "key": value, .... }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just that

            setter = getattr(self, f"set_{key}", None)
            if setter and callable(setter):

feels a bit wrong.

@markbackman markbackman force-pushed the mb/google-tts-inputs branch 3 times, most recently from 08d0738 to 91dfc0b Compare October 1, 2024 18:41
@markbackman markbackman requested a review from aconchillo October 1, 2024 18:45
@markbackman markbackman force-pushed the mb/google-tts-inputs branch from 91dfc0b to 9713d75 Compare October 1, 2024 19:07
await self._disconnect()
await self._connect()

async def set_voice_settings(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would this function have been used before? i don't see any usage of it in the codebase and i'm trying to think through if removing this custom-looking handling of the settings breaks things

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood how this could be used. It wasn't accessible before! So, it should be safe to remove.

Comment on lines -194 to -196
if self._websocket:
msg = {"voice_settings": self._voice_settings}
await self._websocket.send(json.dumps(msg))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to expand on above... maybe this needs to override _update_settings() to be something like:

async def _update_settings(self, settings: Dict[str, Any]):
  super()._update_settings(settings)
  self._voice_settings = self._set_voice_settings()
  if self._websocket:
    msg = {"voice_settings": self._voice_settings}	
    await self._websocket.send(json.dumps(msg))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently this function was not needed to begin with and should be removed and functionality not preserved, so this can be ignored... but not the one above about set_voice

@@ -167,34 +180,6 @@ async def set_model(self, model: str):
await self._disconnect()
await self._connect()

async def set_voice(self, voice: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here. this has custom handling that's more than just setting the voice.... updating my theoretical code below to include a potential way to handle it.

Copy link
Contributor

@mattieruth mattieruth Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nevermind. that one can be ignored... i think this needs to be added to custom handle voice changes though:

async def _update_settings(self, settings: Dict[str, Any]):
  prev_voice = self._settings["voice"]
  super()._update_settings(settings)
  if not prev_voice == self._settings["voice"]:
    await self._disconnect()
    await self._connect()

TBH. The fact that this behavior also happens when the model changes, I'm wondering if elevenlabs requires a disconnect/connect to apply ANY changes 🤔

async def _update_settings(self, settings: Dict[str, Any]):
  prev_settings = self._settings
  super()._update_settings(settings)
  if not prev_settings == self._settings:
    await self._disconnect()
    await self._connect()

Copy link
Contributor

@mattieruth mattieruth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'll leave further review to aleix. but to me, other than the couple of places that need custom handling I pointed out, this looks like good cleanup!

@aconchillo
Copy link
Contributor

LGTM!

@markbackman markbackman force-pushed the mb/google-tts-inputs branch 5 times, most recently from e815da8 to c8b17c5 Compare October 1, 2024 23:39
@markbackman markbackman force-pushed the mb/google-tts-inputs branch from c8b17c5 to 406318e Compare October 1, 2024 23:55
@markbackman markbackman force-pushed the mb/google-tts-inputs branch from 406318e to 28643b4 Compare October 2, 2024 00:30
@aconchillo
Copy link
Contributor

LGTM!

@markbackman markbackman merged commit 096a15e into main Oct 2, 2024
3 checks passed
@aconchillo aconchillo deleted the mb/google-tts-inputs branch October 23, 2024 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants