
Implement Google Gemini LLM service #145

Closed
kwindla opened this issue May 16, 2024 · 3 comments

@kwindla (Contributor) commented May 16, 2024

I'm working on a Google Gemini LLM service for Pipecat and interested in any feedback people have about the LLMMessagesFrame class.

All the other chat-tuned (multi-turn) LLMs I've worked with have adopted OpenAI's messages array format. Google's format is a bit different:

  • The role can only be user or model. Contrast with user, assistant, or system for OpenAI.
  • The message content shape is parts: [<string>, ...] instead of just content: <string>.
  • Inline image data is also typed differently.

https://ai.google.dev/api/python/google/ai/generativelanguage/Content

https://ai.google.dev/gemini-api/docs/get-started/tutorial?lang=python#encode_messages
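Concretely, here is the same two-turn exchange in each shape (the message text is illustrative, not from Pipecat):

```python
# Illustrative only: the same exchange expressed in each provider's shape.

# OpenAI-style messages array: role + single content string.
openai_messages = [
    {"role": "user", "content": "What is Pipecat?"},
    {"role": "assistant", "content": "A framework for real-time voice agents."},
]

# Gemini-style contents: "assistant" becomes "model", and each message
# carries a parts list rather than a single content string.
gemini_contents = [
    {"role": "user", "parts": ["What is Pipecat?"]},
    {"role": "model", "parts": ["A framework for real-time voice agents."]},
]
```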

We could do at least three different things.

  1. Implement the Gemini service so that it translates internally from the OpenAI data shape used by LLMMessage into the google.ai.generativelanguage data structures.
  2. Implement a new LLMMessage class/subclass for use with Gemini models.
  3. Design an abstraction that can represent higher-level concepts and that all of our LLM services will use.

I lean towards (1). I think it will be fairly straightforward and we can always do (2) later if we need to.
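A minimal sketch of what (1) could look like; the function name and role mapping here are hypothetical, not the actual Pipecat implementation (which would presumably build the google.ai.generativelanguage types rather than plain dicts):

```python
# Hypothetical sketch of option (1): translate OpenAI-shape messages into
# Gemini's role/parts shape. All names here are illustrative.

ROLE_MAP = {
    "user": "user",
    "assistant": "model",
    # Gemini has no "system" role; one common workaround is to fold
    # system prompts into a user turn.
    "system": "user",
}

def openai_to_gemini(messages: list[dict]) -> list[dict]:
    """Convert an OpenAI-style messages array into Gemini-style contents."""
    return [
        {"role": ROLE_MAP[m["role"]], "parts": [m["content"]]}
        for m in messages
    ]
```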

But I haven't yet gotten to the context management code here, and that may complicate things. Note: we can't use the Google library's convenience functions for multi-turn chat context management, because pipelines need to be interruptible. One important part of interruptibility is making sure that the LLM context includes only sentences that the LLM has "said" out loud to the user.

Any other thoughts here are welcome!

Also, Discord thread is here if people want to hash things out ephemerally before etching pixels into the stone tablet of an issue comment: https://discord.com/channels/1239284677165056021/1239284677823565826/1240682255584854027

@chadbailey59 (Contributor) commented:

I think (1) as well, as long as Google's alternate structure doesn't actually represent something different (which it doesn't).

@kwindla (Contributor, Author) commented May 17, 2024

Initial (draft) implementation: #150

https://x.com/kwindla/status/1791319660442611731

@aconchillo (Contributor) commented:

This is now available in 0.0.17. Closing.
