-
Notifications
You must be signed in to change notification settings - Fork 439
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial commit of Google Gemini LLM service. #150
Conversation
macos-py3.10-requirements.txt
Outdated
# | ||
# This file is autogenerated by pip-compile with Python 3.10 | ||
# This file is autogenerated by pip-compile with Python 3.11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I try to generate this with python 3.10 just to make sure it works there. on macos i do:
brew install [email protected]
python3.10 -m venv venv
...
...
pip-compile --all-extras pyproject.toml
mv requirements.txt macos-py3.10-requirements.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
src/pipecat/services/openai.py
Outdated
del message["mime_type"] | ||
|
||
# messages_for_log = json.dumps(messages) | ||
# logger.debug(f"Generating chat: {messages_for_log}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: remove or re-add?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍
Gemini text input works. We translate from OpenAILLMContext format on the fly in the GoogleLLMService implementation. This commit also implements image input (vision) in both the GoogleLLMService and in the OpenAILLMService. Image input is a hack and needs to be revisited. OpenAI expects images to be uploaded as base64-encoded JPEGs. Google does not require the base64 encoding. Other than for images, we use the OpenAI format as our standard, but base64-encoding the images and then unencoding them in the GoogleLLMService feels wasteful.
Gemini text input works. We translate from OpenAILLMContext format on the fly in the GoogleLLMService implementation.
This commit also implements image input (vision) in both the GoogleLLMService and in the OpenAILLMService. Image input is a hack and needs to be revisited. OpenAI expects images to be uploaded as base64-encoded JPEGs. Google does not require the base64 encoding. Other than for images, we use the OpenAI format as our standard, but base64-encoding the images and then unencoding them in the GoogleLLMService feels wasteful.