
Add Ollama multimodal llm support for image with prompt #14811

Closed


DohOnGit

  • Description: Adds an `OllamaMultiModal` class to the Ollama LLMs module, `langchain/llms/ollama.py`. For models that accept images, this class sends an image to the Ollama endpoint together with a prompt message.
  • Twitter handle: Daniel_OHeron

Passes `make format`, `make lint`, and `make test`.

If this change is useful, I will add unit tests, a notebook, and documentation, and build out additional features. Please let me know.

Example usage:

```python
import base64

from OllamaMultiModalMain import OllamaMultiModal


def encode_image_to_base64(image_path):
    """Read an image file and return its contents as a base64-encoded string."""
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")


llm = OllamaMultiModal(model="bakllava")

# Simple chat loop: a text prompt, optionally accompanied by one image.
while True:
    user_input = input("User (text): ")

    image_input = input("User (image path, press enter to skip): ")
    image_data = encode_image_to_base64(image_input) if image_input else None

    if image_data:
        # Send the image as a list of base64 strings alongside the prompt.
        response = llm.invoke(input=user_input, images=[image_data])
    else:
        response = llm.invoke(input=user_input)

    print("Chat:", response)
```
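Whatever wrapper class is used, the base64 strings ultimately travel in the `images` field of the JSON body sent to Ollama's `/api/generate` endpoint. The sketch below shows that request with only the standard library. It assumes a local Ollama server on the default `http://localhost:11434`. The helper names `build_generate_payload` and `generate` are hypothetical and not part of this PR.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumption: a local Ollama server listening on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def encode_image_to_base64(image_path):
    """Read an image file and return its contents as a base64 string."""
    return base64.b64encode(Path(image_path).read_bytes()).decode("utf-8")


def build_generate_payload(model, prompt, image_paths=()):
    """Build the JSON body for /api/generate (hypothetical helper).

    Images go in the "images" field as base64 strings; the field is
    left out entirely for text-only prompts.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    images = [encode_image_to_base64(p) for p in image_paths]
    if images:
        payload["images"] = images
    return payload


def generate(payload, url=OLLAMA_URL):
    """POST the payload and return the generated text (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        # With "stream": False, Ollama returns one JSON object whose
        # "response" field holds the full completion.
        return json.loads(response.read())["response"]
```

With a server running and a multimodal model such as `bakllava` pulled, a call would look like `generate(build_generate_payload("bakllava", "Describe this image.", ["photo.jpg"]))`. Here `photo.jpg` is a placeholder path.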

dosubot (bot) added the size:M label (This PR changes 30-99 lines, ignoring generated files) on Dec 17, 2023

vercel (bot) commented Dec 17, 2023

The latest updates on your projects:

1 Ignored Deployment

| Name | Status | Updated (UTC) |
|---|---|---|
| langchain | ⬜️ Ignored | Dec 18, 2023 6:45am |

dosubot (bot) added the Ɑ: models (Related to LLMs or chat model modules) and 🤖:enhancement (A large net-new component, integration, or chain. Use sparingly. The largest features) labels on Dec 17, 2023
DohOnGit force-pushed the feature/ollama-multi-modal branch from 28c2b8f to 3cbc34c on December 18, 2023 06:45

DohOnGit (Author) commented

The last push was my feature branch rebased onto the LangChain master branch, so it contains every change merged into LangChain since this pull request was opened, not just mine. Going forward, I should avoid rebasing a feature branch onto master while its pull request is open. Apologies.

jacoblee93 (Contributor) commented Dec 19, 2023

Hey @DanielOHeron, thanks for the PR!

I've added support for this via a bound image param:

https://python.langchain.com/docs/integrations/llms/ollama#multi-modal

I think that is nicer for now since we don't need to create a second class. Going to close this for now.
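The linked documentation's example roughly looks like `Ollama(model="bakllava").bind(images=[image_b64])` followed by `.invoke(prompt)` on the result. It requires `langchain-community` and a running Ollama server. The standard-library sketch below illustrates only the design point: pre-binding a keyword argument turns one generic class into an image-aware one without a second class. `functools.partial` is used here as an analogy for LangChain's `Runnable.bind`. `build_payload` is a hypothetical stand-in for a model call, not LangChain's implementation.

```python
from functools import partial


def build_payload(prompt, *, model, images=None):
    """Hypothetical stand-in for an LLM call: returns the request it would send."""
    payload = {"model": model, "prompt": prompt}
    if images:
        payload["images"] = list(images)
    return payload


# One generic callable for text-only prompts...
text_llm = partial(build_payload, model="bakllava")

# ...and an image-aware variant made by fixing the images argument up front,
# analogous to .bind(images=[...]) in LangChain (no second class required).
llm_with_image = partial(text_llm, images=["<base64 image>"])

print(text_llm("Hello"))
print(llm_with_image("What is in this image?"))
```

Keyword arguments supplied at call time override the bound ones. For example, `llm_with_image("...", images=[...])` swaps in a different image list for a single call.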

jacoblee93 closed this on Dec 19, 2023

DohOnGit (Author) commented

Hey @jacoblee93, this works great!

Thanks for solving this. I just had to install `langchain-community` to get the latest Ollama integration.
