Enable use of embeddings from Vision Language models #3452
Labels
area: backend
Related to backend functionality or under the /backend directory
enhancement
New feature or request
rag: ingestion
rag: retrieval
Currently, we use text embeddings. This works well for textual documents, but it has obvious drawbacks for documents containing non-textual content (images, graphs, diagrams, …).
An alternative is to use vision language models such as ColPali (see also https://huggingface.co/blog/manu/colpali, https://danielvanstrien.xyz/posts/post-with-code/colpali-qdrant/2024-10-02_using_colpali_with_qdrant.html, https://blog.vespa.ai/retrieval-with-vision-language-models-colpali/, https://blog.vespa.ai/scaling-colpali-to-billions/).
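For context on what this would change in retrieval: ColPali-style models produce multiple vectors per page image (roughly one per image patch) and multiple vectors per query (one per token), and score a page with late interaction (MaxSim) rather than a single cosine similarity. Below is a minimal sketch of that scoring in plain Python. The function names (`maxsim_score`, `rank_pages`) and the toy 2-dimensional vectors are made up for illustration and are not part of this project's backend or of any ColPali library; real embeddings would come from the model.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], page_vecs: Sequence[Vector]) -> float:
    """Late-interaction (MaxSim) score.

    For each query token vector, take its best dot product over all
    page patch vectors, then sum those maxima over the query tokens.
    """
    if not page_vecs:
        return 0.0
    return sum(max(dot(q, p) for p in page_vecs) for q in query_vecs)


def rank_pages(
    query_vecs: Sequence[Vector], pages: Dict[str, List[Vector]]
) -> List[Tuple[str, float]]:
    """Return (page_id, score) pairs sorted from best to worst match."""
    scored = [(page_id, maxsim_score(query_vecs, vecs)) for page_id, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy data: hypothetical embeddings, two query tokens and two patches per page.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_with_chart": [[0.9, 0.1], [0.1, 0.8]],
    "page_text_only": [[0.5, 0.5], [0.4, 0.4]],
}
ranking = rank_pages(query, pages)
print(ranking[0][0])  # page_with_chart
```

One practical consequence for ingestion and storage is that each page is represented by many vectors instead of one, so the vector store would need to support multi-vector documents or an equivalent workaround.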