The model currently returns a tokenizer separately from an image_processor, as seen e.g. here. The Hugging Face "preferred way" appears to be a single multimodal processor that handles text and images together (see here, or the sample code for Idefics3 here). This is causing me trouble because I am trying to use llava-more for structured text generation with the outlines package, which assumes a single multimodal processor rather than separate tokenizer and image_processor objects (see e.g. this line of code). (Idefics3 currently doesn't work for me either, because of incompatible inputs to its processor.)
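For what it's worth, here is a rough sketch of the kind of adapter I have in mind: a single object that wraps the separate tokenizer and image_processor behind the combined `processor(text=..., images=...)` interface that outlines expects. The `DummyTokenizer` and `DummyImageProcessor` classes below are stand-ins I made up so the sketch runs without downloading a model; the real objects would come from llava-more, and the real return values would be tensors rather than plain lists.

```python
class DummyTokenizer:
    """Stand-in for the tokenizer returned by llava-more (hypothetical)."""
    def __call__(self, text, **kwargs):
        # A real tokenizer returns input_ids/attention_mask tensors.
        tokens = text.split()
        return {"input_ids": list(range(len(tokens))),
                "attention_mask": [1] * len(tokens)}

class DummyImageProcessor:
    """Stand-in for the image_processor returned by llava-more (hypothetical)."""
    def __call__(self, images, **kwargs):
        # A real image processor returns pixel_values tensors.
        return {"pixel_values": [[0.0] * 4 for _ in images]}

class CombinedProcessor:
    """Mimics the single multimodal-processor interface: one callable
    that accepts text and images together and merges both encodings."""
    def __init__(self, tokenizer, image_processor):
        self.tokenizer = tokenizer
        self.image_processor = image_processor

    def __call__(self, text=None, images=None, **kwargs):
        encoding = {}
        if text is not None:
            encoding.update(self.tokenizer(text, **kwargs))
        if images is not None:
            encoding.update(self.image_processor(images, **kwargs))
        return encoding

processor = CombinedProcessor(DummyTokenizer(), DummyImageProcessor())
batch = processor(text="a photo of a cat", images=[object()])
print(sorted(batch.keys()))  # → ['attention_mask', 'input_ids', 'pixel_values']
```

This mirrors what `transformers` processor classes do internally (they are constructed from a tokenizer plus an image processor), so exposing something with this shape from the model-loading code might be enough for outlines to work, though I haven't verified that outlines doesn't rely on other processor attributes.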