I'm working on a Google Gemini LLM service for Pipecat and am interested in any feedback people have about the LLMMessagesFrame class.

All the other chat (multi-turn) fine-tuned LLMs I've worked with have adopted OpenAI's messages array format. Google's format is a bit different:
- The `role` can only be `user` or `model`. Contrast with `user`, `assistant`, or `system` for OpenAI.
- The message content shape is `parts: [<string>, ...]` instead of just `content: <string>`.

References:
- https://ai.google.dev/api/python/google/ai/generativelanguage/Content
- https://ai.google.dev/gemini-api/docs/get-started/tutorial?lang=python#encode_messages
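For concreteness, here is a sketch of the same exchange in both shapes (plain dicts; folding the system prompt into the first user turn is just one possible workaround, not a settled choice):

```python
# OpenAI-style messages array (what LLMMessagesFrame carries today):
openai_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi there. How can I help?"},
]

# Gemini-style contents: only "user"/"model" roles, and a "parts" list
# instead of a single "content" string. There is no system role, so the
# system prompt has to go somewhere else -- here, folded into the first
# user turn (an assumption, not the only option).
gemini_contents = [
    {"role": "user", "parts": ["You are a helpful assistant.\n\nHello!"]},
    {"role": "model", "parts": ["Hi there. How can I help?"]},
]
```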
We could do at least three different things:

1. Implement the Gemini service so that it translates internally from the OpenAI data shape used by LLMMessage into the google.ai.generativelanguage data structures (see the sketch after this list).
2. Implement a new LLMMessage class/subclass for use with Gemini models.
3. Design an abstraction that can represent higher-level concepts and that all of our LLM services will use.
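For (1), a minimal sketch of what the internal translation could look like, assuming plain OpenAI-shaped dicts on the way in; the function name and the system-message handling are mine, not anything settled:

```python
from google.ai import generativelanguage as glm


def to_gemini_contents(messages: list[dict]) -> list[glm.Content]:
    """Translate an OpenAI-style messages array into glm.Content objects.

    Sketch only. Since Gemini's roles are limited to "user" and "model",
    system messages are folded into the next user turn here; that is an
    assumption about how we'd handle them, not a settled design.
    """
    contents: list[glm.Content] = []
    pending_system = ""
    for msg in messages:
        role, text = msg["role"], msg["content"]
        if role == "system":
            # Buffer system text until the next user turn.
            pending_system += text + "\n\n"
            continue
        gemini_role = "model" if role == "assistant" else "user"
        if gemini_role == "user" and pending_system:
            text = pending_system + text
            pending_system = ""
        contents.append(glm.Content(role=gemini_role, parts=[glm.Part(text=text)]))
    return contents
```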
I lean towards (1). I think it will be fairly straightforward and we can always do (2) later if we need to.
But I haven't yet gotten to the context management code here, and that may complicate things. Note: we can't use the Google library's convenience functions for multi-turn chat context management, because pipelines need to be interruptible. One important part of interruptibility is making sure that the LLM context includes only sentences that the LLM has "said" out loud to the user.
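To make that constraint concrete, here's a rough sketch (all names hypothetical, not Pipecat's actual context API). The point is that on an interruption the assistant turn recorded in the context must be only the sentences TTS actually finished speaking, whereas a library-managed chat session would record the full generated response:

```python
def append_assistant_turn(
    context: list[dict],
    generated_text: str,
    spoken_sentences: list[str],
    interrupted: bool,
) -> None:
    """Hypothetical sketch: append the assistant's turn to an
    OpenAI-shaped context after a response finishes or is interrupted.

    If the user barged in, only the sentences that were actually played
    out loud go into the context. A ChatSession-style helper in the
    Google library would instead append the full generated response,
    which is why we can't use it here.
    """
    text = " ".join(spoken_sentences) if interrupted else generated_text
    if text:
        context.append({"role": "assistant", "content": text})
```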
Any other thoughts here are welcome!
Also, Discord thread is here if people want to hash things out ephemerally before etching pixels into the stone tablet of an issue comment: https://discord.com/channels/1239284677165056021/1239284677823565826/1240682255584854027