Add support for tgi multimodal models #531
@julien-blanchon I reused your dropzone component for this feature 🤗 Thanks a lot for making it!
Nice! I'm pretty hyped about this PR btw 👀
Hey @nsarrazin, I'm thinking of dropping the Mathpix dependency in my implementation of Convert PDF to Markdown inside Chat UI (#441).
Are you interested in this functionality on the HuggingChat side? If so, how can we work together? Is this functionality included in your multimodal TGI implementation?
Hey @julien-blanchon! As a rule of thumb, I think it's good to decouple dependencies (especially remote APIs) from the feature itself where possible, so that people can configure the PDF parsing they want. Web search is an example: we support three different providers now. Maybe have some kind of standardized interface that takes a PDF file and returns the extracted text, so that people can copy the method and implement their own versions in future PRs? I think that could be a cool feature. Once this PR (#531) is merged, we can look at how to hook it up. This PR already adds support for passing files to the backend, so we could add a logic check that handles files differently based on MIME type, like you mentioned.
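To make the suggestion above concrete, here is a minimal sketch of what a pluggable PDF parser interface with a MIME type check might look like. Every name in it (`UploadedFile`, `PdfParser`, `routeByMime`, `fileToPromptText`, `placeholderParser`) is hypothetical and chosen for illustration; none of it is taken from the chat-ui codebase or from this PR.

```typescript
// All names below are hypothetical; none are taken from the chat-ui codebase.

/** An uploaded file reduced to what this sketch needs. */
interface UploadedFile {
  mime: string; // e.g. "image/png" or "application/pdf"
  data: Uint8Array;
}

/** The "standardized interface": takes a PDF, returns its extracted text. */
interface PdfParser {
  name: string;
  parse(pdf: Uint8Array): Promise<string>;
}

type Route = "image" | "pdf" | "unsupported";

/** Decide how a file should be handled, based only on its MIME type. */
function routeByMime(mime: string): Route {
  if (mime.startsWith("image/")) return "image";
  if (mime === "application/pdf") return "pdf";
  return "unsupported";
}

/** Placeholder provider; a real one would call a local library or a remote API. */
const placeholderParser: PdfParser = {
  name: "placeholder",
  parse: async (pdf) => `[${pdf.length} bytes of PDF, text extraction not implemented]`,
};

/** Turn a file into text for the prompt, delegating PDFs to the chosen parser. */
async function fileToPromptText(file: UploadedFile, parser: PdfParser): Promise<string> {
  switch (routeByMime(file.mime)) {
    case "pdf":
      return parser.parse(file.data);
    case "image":
      // Stand-in only: a real implementation would pass the image to the model.
      return "[image passed to the multimodal model]";
    default:
      throw new Error(`Unsupported file type: ${file.mime}`);
  }
}
```

Under this design, swapping providers would mean passing a different `PdfParser` object to `fileToPromptText`, which mirrors the decoupling described for web search.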
This is pretty much done and ready for review, I think I covered every edge case of the feature! 😄
.env.template
Outdated
]
},
{
"name": "HuggingFaceM4/idefics-80b-instruct",
We can remove this change to .env.template if we don't want IDEFICS in production for HuggingChat
cc @julien-c to confirm this
Except for the nits I left, this looks very close to being merged 👍
Co-authored-by: Mishig <[email protected]>
LGTM. We could add paste support and full-screen dropzone in another PR.
Going to remove IDEFICS from the prod template, add a README note about using it, and then merge this.
* wip: add support for tgi multimodal models
* wip work on passing images to prompt
* working idefics config!
* rm allowed conv feature
* lint
* Add image resizing
* fix ssr
* add upload button
* add delete button
* misc formatting
* lint
* server file size check
* optimistic update of images
* retry with images
* fix websearch button
* lint
* better error handling & max one image at a time
* replace test image by blank one
* disable loading on page change
* Fix sharing of images
* fix comments
* Update filedropzone (huggingface#544)
* Update src/lib/buildPrompt.ts
  Co-authored-by: Mishig <[email protected]>
* small tweaks
* Fix merge conflicts
* lint
* wildcard image mime type
* fix lint and comment
* added comments
* added comment about file size
* Readme update

---------

Co-authored-by: Mishig <[email protected]>
Co-authored-by: Victor Mustar <[email protected]>
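Several of the commits above concern upload checks (a server file size check, a wildcard image MIME type, and a limit of one image at a time). The sketch below shows one way such checks could be combined. It is an illustration under assumptions, not the code from this PR: the `Upload` type, the `validateUploads` function, and the 2 MiB value of `MAX_IMAGE_BYTES` are all hypothetical.

```typescript
// Hypothetical sketch; names and the size limit are assumptions, not values from the PR.

/** Assumed limit for illustration only; the actual limit is not stated here. */
const MAX_IMAGE_BYTES = 2 * 1024 * 1024;

/** Minimal description of an uploaded file. */
interface Upload {
  mime: string; // e.g. "image/png"
  size: number; // size in bytes
}

/** Returns an error message, or null when the uploads pass every check. */
function validateUploads(files: Upload[]): string | null {
  // At most one image per message.
  if (files.length > 1) return "Only one image can be attached at a time.";

  for (const file of files) {
    // Wildcard match: accept any "image/*" MIME type.
    if (!file.mime.startsWith("image/")) return `Unsupported file type: ${file.mime}`;
    // Size check, to be repeated on the server since client checks can be bypassed.
    if (file.size > MAX_IMAGE_BYTES) return "Image is too large.";
  }
  return null;
}
```

Returning a message instead of throwing keeps the function usable both for inline UI feedback and for building a server error response.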
Working for now but things that still need fixing:
multimodal:true is not set in the config for the model
How to test
npm run updateLocalEnv
npm run dev
Start a conversation with IDEFICS to see the new UI
Screenshots