Add support for tgi multimodal models #531

nsarrazin · 2023-10-24T19:13:06Z

Working for now but things that still need fixing:

How to test

pull this pr
npm run updateLocalEnv
npm run dev

Start a conversation with IDEFICS to see the new ui

Screenshots

nsarrazin · 2023-10-25T10:54:59Z

@julien-blanchon I reused your dropzone component for this feature 🤗 thanks a lot for making it!

julien-blanchon · 2023-10-25T11:07:18Z

Nice! I'm pretty hyped about this PR btw 👀

julien-blanchon · 2023-10-26T15:26:30Z

Hey @nsarrazin, I'm thinking of dropping the Mathpix dependency in my implementation of Convert PDF to Markdown inside Chat UI (#441)
and including two text extractors instead:

A basic text extractor that extracts plain text from the PDF
A more advanced text extractor that uses an OCR model such as https://huggingface.co/facebook/nougat-base through the hosted Inference API, with a user-provided HF token

Are you interested in this functionality on the huggingchat side?

If so, how can we work together? Is this functionality included in your multimodal tgi implementation?
We could refactor the code a bit to support multiple file types and multiple "agents". What are your plans in this regard?

nsarrazin · 2023-10-27T09:52:10Z

Hey @julien-blanchon!

I think as a rule of thumb it's good to decouple any dependencies (especially remote APIs) from the feature itself if possible (see web search for example where we support three different providers now), so that people can configure the pdf parsing that they want. Maybe have some kind of standardized interface that takes a pdf file and returns the extracted text, so that people can copy the method and implement their own versions in future PRs?

And I think that could be a cool feature. Maybe once this PR (#531) is merged we can look at how to hook it up? This PR already adds support for passing files to the backend, so we could add a check that handles files differently based on their MIME type, like you mentioned.
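A minimal sketch of what such a standardized interface and MIME-based dispatch might look like. Everything here is hypothetical: `UploadedFile`, `TextExtractor`, `mimeMatches`, `pickExtractor` and `sizeStubExtractor` are illustrative names, not existing chat-ui code, and the stub only reports the file size rather than parsing anything.

```typescript
// Hypothetical names throughout; none of these exist in chat-ui.

/** A file as it might arrive at the backend. */
interface UploadedFile {
  name: string;
  mime: string; // e.g. "application/pdf" or "image/png"
  data: Uint8Array;
}

/** Anything that can turn a file into text for the prompt. */
interface TextExtractor {
  /** MIME patterns this extractor accepts; "image/*" style wildcards allowed. */
  accepts: string[];
  extract(file: UploadedFile): Promise<string>;
}

/** Returns true when `mime` matches `pattern`; the pattern subtype may be "*". */
function mimeMatches(pattern: string, mime: string): boolean {
  const [pType, pSub] = pattern.toLowerCase().split("/");
  const [mType, mSub] = mime.toLowerCase().split("/");
  return (pType === "*" || pType === mType) && (pSub === "*" || pSub === mSub);
}

/** Picks the first registered extractor that accepts the file's MIME type. */
function pickExtractor(
  extractors: TextExtractor[],
  file: UploadedFile
): TextExtractor | undefined {
  return extractors.find((e) => e.accepts.some((p) => mimeMatches(p, file.mime)));
}

// Stand-in extractor: a real one would parse the PDF bytes or call an OCR API.
const sizeStubExtractor: TextExtractor = {
  accepts: ["application/pdf"],
  extract: async (file) => `${file.name} (${file.data.length} bytes)`,
};
```

A basic PDF extractor and an OCR-backed one would each implement `TextExtractor`, and the backend would call `pickExtractor` on every uploaded file. Note that this sketch does not handle MIME parameters such as `; charset=utf-8`.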

nsarrazin · 2023-11-02T09:04:47Z

This is pretty much done and ready for review, I think I covered every edge case of the feature! 😄

nsarrazin · 2023-11-02T09:17:47Z

.env.template

+      ]
+    },
+        {
+      "name": "HuggingFaceM4/idefics-80b-instruct",


We can remove this change to .env.template if we don't want IDEFICS in production for HuggingChat

cc @julien-c to confirm this

src/lib/components/chat/ChatMessage.svelte

src/lib/components/chat/ChatWindow.svelte

src/lib/components/chat/FileDropzone.svelte

src/lib/types/Message.ts

src/lib/buildPrompt.ts

mishig25 · 2023-11-03T11:50:53Z

Except for the nits I left, this looks very close to being merged 👍

Co-authored-by: Mishig <[email protected]>

gary149

LGTM. We could add paste support and full-screen dropzone in another PR.

src/routes/conversation/[id]/+page.svelte

nsarrazin · 2023-11-16T10:45:12Z

Going to remove IDEFICS from the prod template, add a readme note about using it and then merge this

* wip: add support for tgi multimodal models
* wip work on passing images to prompt
* working idefics config!
* rm allowed conv feature
* lint
* Add image resizing
* fix ssr
* add upload button
* add delete button
* misc formatting
* lint
* server file size check
* optimistic update of images
* retry with images
* fix websearch button
* lint
* better error handling & max one image at a time
* replace test image by blank one
* disable loading on page change
* Fix sharing of images
* fix comments
* Update filedropzone (huggingface#544)
* Update src/lib/buildPrompt.ts

  Co-authored-by: Mishig <[email protected]>
* small tweaks
* Fix merge conflicts
* lint
* wildcard image mime type
* fix lint and comment
* added comments
* added comment about file size
* Readme update

---------

Co-authored-by: Mishig <[email protected]>
Co-authored-by: Victor Mustar <[email protected]>

wip: add support for tgi multimodal models

b19adc6

nsarrazin marked this pull request as draft October 24, 2023 19:13

nsarrazin added 3 commits October 25, 2023 10:41

wip work on passing images to prompt

248dfee

working idefics config!

d11182a

rm allowed conv feature

06c38d2

nsarrazin added the enhancement, front, back, and models labels Oct 25, 2023

nsarrazin added 13 commits October 25, 2023 13:51

Merge branch 'main' into feature/idefics

246ebbc

lint

b2b5743

Add image resizing

1386943

fix ssr

a274ab9

add upload button

7112d2b

add delete button

5e2018e

misc formatting

8ab99a8

lint

ff1a74e

server file size check

37876b9

optimistic update of images

9eb800f

retry with images

fe0cc90

fix websearch button

b7b17bc

lint

c3bfd1c

Merge branch 'main' into feature/idefics

e7958bb

nsarrazin added 3 commits October 30, 2023 10:48

better error handling & max one image at a time

04f241e

replace test image by blank one

354808b

disable loading on page change

6f1c9c2

fix comments

db51668

nsarrazin commented Nov 2, 2023

nsarrazin mentioned this pull request Nov 2, 2023

/generate for image generation #464

Closed

nsarrazin requested a review from mishig25 November 3, 2023 09:27