LLM: Add image recognition and generation support #89

Conversation
Actionable comments posted: 2
When starting I get this warning:
pyris-app | /usr/local/lib/python3.12/site-packages/pydantic/_internal/_config.py:334: UserWarning: Valid config keys have changed in V2:
pyris-app | * 'schema_extra' has been renamed to 'json_schema_extra'
pyris-app | warnings.warn(message, UserWarning)
I also tested this on pyris-test and TS5 and the tutor chat still works.
Please fix the warning and merge main, then this can get merged.
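The fix is the rename that Pydantic's warning suggests. A minimal sketch, assuming Pydantic v2 and illustrative field/example values (the actual DTO lives in the PR's codebase):

```python
from typing import Optional

from pydantic import BaseModel, ConfigDict, Field


class ImageMessageContentDTO(BaseModel):
    # Pydantic v2 renamed the v1 config key `schema_extra` to
    # `json_schema_extra`; using the old name triggers the UserWarning above.
    model_config = ConfigDict(
        populate_by_name=True,
        json_schema_extra={"example": {"imageData": "<base64-string>"}},
    )

    image_data: Optional[str] = Field(alias="imageData", default=None)


dto = ImageMessageContentDTO(imageData="abc")
```

With the v2 key, the extra keys are merged into the generated JSON schema and the startup warning disappears.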
You need to solve the merge conflicts. We renamed IrisMessage to PyrisMessage and also adjusted the LLMs accordingly. If you have questions, ask me anytime.
Walkthrough

The recent updates across the application focus on enhancing type safety, expanding functionality, and improving message handling components. These changes include stricter type hints, additions of new fields to data transfer objects, and advancements in image data support. Abstract classes and methods for image generation have been introduced to cater to diverse content handling requirements within the system.

The changes reflect a comprehensive effort to improve type safety, extend functionality, and enhance content handling capabilities throughout the application, especially focusing on image data support and message conversions.

Recent Review Details

Configuration used: CodeRabbit UI
Files selected for processing (1)
Files skipped from review as they are similar to previous changes (1)
Went through the code and there were a couple of issues (Ollama did not work, only the first content of a message was used, ...). I already fixed most of them in 001b99d.
I do have some more, smaller questions, though.
class ImageMessageContentDTO(BaseModel):
    image_data: Optional[str] = Field(alias="imageData", default=None)
    base64: List[str] = Field(..., alias="base64")  # List of base64-encoded strings
- The alias is redundant.
- What is the `...` for?
- What is the reason to have this as a list? If you want more than one image in a message you could simply add more than one ImageMessageContent.
You can have one message with multiple images. That's how it works in the OpenAI API call, at least, so I think it makes more sense to have the list there.
Other than that, I will remove the alias, thanks.
You actually do not use this for the OpenAI API in any way. The only place where this would make sense is the DALL·E model, as that returns multiple images, but even there you simply create separate ImageMessageContentDTOs.
As it currently stands, you could simply make this a single string; it would not really change anything behavior-wise and would make things easier to handle.
Nope, this would be used for image interpretation. (This can be useful, for example, for future interpretation of lectures: if you have multiple pages with diagrams, you may want them to be interpreted together.)
Yes, but you could simply use multiple message contents for that. That way (at least with GPT-4V) you can also have text in between the images, instead of just a bunch of images after one another.
And from a conceptual standpoint, each message content should only include a single datapoint (e.g. a single text block, a single JSON object, or a single image).
Also, from an Artemis database standpoint, it would make more sense to have a separate content for each image.
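The one-image-per-content layout suggested here can be sketched with plain dataclasses. These are hypothetical, simplified stand-ins for the real Pyris DTOs, which carry more fields:

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the Pyris DTOs discussed above.
@dataclass
class TextMessageContentDTO:
    text_content: str


@dataclass
class ImageMessageContentDTO:
    base64: str  # one image per content, as suggested


@dataclass
class PyrisMessage:
    sender: str
    contents: list


# Text can be interleaved with images, one datapoint per content.
msg = PyrisMessage(
    sender="USER",
    contents=[
        TextMessageContentDTO("Here is the diagram on slide 12:"),
        ImageMessageContentDTO(base64="<slide-12-base64>"),
        TextMessageContentDTO("And the follow-up diagram on slide 13:"),
        ImageMessageContentDTO(base64="<slide-13-base64>"),
    ],
)
```

Each content is a single datapoint, so prompts can alternate text and images freely.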
I think both versions make sense, but let's keep it simple for now, as it does not hinder the work.
6cf2f4e
iris_images.append(
    ImageMessageContentDTO(
        prompt=revised_prompt,
        base64=base64_data,
    )
)
This does not work, because ImageMessageContentDTO.base64 is an array.
The black linter is failing
6fd7d26
Co-authored-by: Timor Morrien <[email protected]>
Motivation
In the future we want to support images in our LLM subsystem. This PR aims to introduce support for this.
Description
To add support for images, we added a new domain model, PyrisImage, to represent images in a unified way.
We also added a basic wrapper for OpenAI's DALL·E to showcase how images can be created.
We also added support for passing images to Ollama models via normal and chat completion.
We also added support for OpenAI GPT-4 Vision models.
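For context, Ollama's chat API accepts base64-encoded images attached to a message. A minimal sketch of the request body (the helper name and model are illustrative, not the PR's actual wrapper code):

```python
def build_ollama_chat_payload(model: str, prompt: str, images_b64: list[str]) -> dict:
    """Build an Ollama /api/chat request body with images on the user message."""
    return {
        "model": model,
        "messages": [
            # Ollama expects raw base64 strings (no data-URL prefix) in "images".
            {"role": "user", "content": prompt, "images": images_b64},
        ],
        "stream": False,
    }


payload = build_ollama_chat_payload("llava", "Describe this diagram.", ["<b64>"])
```

The wrapper in this PR would then POST such a payload to the Ollama server.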
Test Instructions

1. Copy the code provided in this description into the Tutor chat pipeline class.
2. Add a vision model to your config if you don't have one, and add it to the llm_vision initialisation.
3. Choose an image and call interpret_image(file_path) with the path to your image.
4. Run Artemis and Pyris v2, ask Iris anything, and wait for the interpretation of the image to be printed on the console of Pyris.
import base64

# Assumes the Pyris imports (PyrisMessage, ImageMessageContentDTO,
# IrisMessageRole, BasicRequestHandler, CompletionArguments) are
# already available in the pipeline module.

def image_to_base64(file_path):
    with open(file_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def interpret_image(file_path: str):
    """Interpret the image at the given path."""
    img_base64 = image_to_base64(file_path)
    image_interpretation_prompt = "Interpret the image passed"
    image = ImageMessageContentDTO(
        base64=[img_base64], prompt=image_interpretation_prompt
    )
    iris_message = PyrisMessage(
        sender=IrisMessageRole.SYSTEM,
        contents=[image],
    )
    llm_vision = BasicRequestHandler("")
    response = llm_vision.chat(
        [iris_message], CompletionArguments(temperature=0.2, max_tokens=1000)
    )
    print(response.contents[0].text_content)
    return response.contents[0].text_content
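For reference, the GPT-4 Vision path ultimately has to express each image as an image_url content part in the chat-completions request. A hedged sketch of that mapping (the helper name is illustrative, not the PR's actual code):

```python
def build_vision_message(prompt: str, images_b64: list[str]) -> dict:
    """Build an OpenAI chat message mixing text and base64 images."""
    content = [{"type": "text", "text": prompt}]
    for b64 in images_b64:
        content.append({
            # Vision models take images as data URLs under "image_url".
            "type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"},
        })
    return {"role": "user", "content": content}


msg = build_vision_message("Interpret the image passed", ["<b64>"])
```

A message built this way can be sent as one element of the `messages` array in a chat-completions call against a vision-capable model.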