Ollama document embedder #400
…st, test was incorrect
Seems like there are some linting issues. You can fix these via hatch lint with --fix parameter. See readme for more details. LMK if you need some help and thanks for this contribution @jmdevita |
@vblagoje Just pushed a recent version where everything seems to be working now. For some reason one of my commits wasn't attributed to my account, so it's labeled as CLA unsigned (even though it's still me). Amending that branch's commit didn't change anything, so let me know if there's something else I need to do. Thanks!
@jmdevita It's unfortunate that this commit from one of your other accounts got in somehow but you can edit that easily. In your local git repo do an interactive git rebase and edit the commit with something like |
157af9d to 912fca5
@vblagoje thanks for your help there. Everything should be good now |
@jmdevita seems ok to me, have you played with this document embedder? Does it work ok in your particular use case? |
@vblagoje Yup, I use it in my pipeline that runs daily. I use the TikaDocumentConverter to process hundreds of files, then run the DocumentCleaner & DocumentSplitter on the docs, and use the OllamaDocumentEmbedder and a writer to put them into my Qdrant vector DB.
Let's 🚢 thanks for your contribution @jmdevita Keep them coming 👍
@dfokina can we please add a note in docs about adding Ollama Document Embedder, then we can tag and release a new ollama package |
@dfokina Not sure where to send this, but attached below is a write up:

OllamaDocumentEmbedder
OllamaDocumentEmbedder computes the embeddings of a list of Documents and stores the obtained vectors in the embedding field of each Document. It uses embedding models compatible with the Ollama Library, although it should be noted that most of the pre-built models are not great at producing embeddings. The vectors computed by this component are necessary to perform embedding retrieval on a collection of Documents. At retrieval time, the vector that represents the query is compared with those of the Documents to find the most similar or relevant Documents.

Overview
OllamaDocumentEmbedder should be used to embed a list of Documents. To embed a single string, use the OllamaTextEmbedder instead. The component uses http://localhost:11434/api/embeddings as the default URL, as most available setups (Mac/Linux/Docker) default to port 11434.

Compatible Models
Unless specified otherwise while initializing this component, the default embedding model is "orca-mini". Other models are listed among the other pre-built models. To load your own custom model, follow these instructions from Ollama.

Instructions
To start using this integration with Haystack, install the package with:

Embedding Metadata
Most embedded metadata contains information about the model name and type. You can also pass optional arguments to the Ollama generation endpoint, such as temperature, top_p, etc. The model used will automatically be appended as part of the document metadata. An example payload using the orca-mini model will look like:

Usage
On its own:
In a Pipeline
|
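For readers without the original snippets, here is a minimal, standard-library-only sketch of the flow the write-up describes: one request per Document to the embeddings endpoint, the vector stored in the embedding field, the model name added to the metadata, and a similarity comparison at retrieval time. It is not the component's actual implementation. `Doc`, `fake_post`, `embed_documents`, `cosine`, and the `"model"` metadata key are hypothetical names introduced for illustration, and the request/response shape (`{"model", "prompt"}` in, `{"embedding"}` out) is an assumption about the Ollama embeddings endpoint.

```python
import json
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Default endpoint and model as stated in the write-up.
DEFAULT_URL = "http://localhost:11434/api/embeddings"
DEFAULT_MODEL = "orca-mini"


@dataclass
class Doc:
    """Hypothetical stand-in for a Haystack Document (not the real class)."""
    content: str
    meta: Dict[str, str] = field(default_factory=dict)
    embedding: Optional[List[float]] = None


def fake_post(url: str, body: str) -> str:
    """Hypothetical transport so the sketch runs without an Ollama server.

    Returns a toy 2-d vector built from character counts. A real transport
    would POST `body` to `url` and return the response text.
    """
    prompt = json.loads(body)["prompt"].lower()
    vector = [float(prompt.count("a")), float(prompt.count("e"))]
    return json.dumps({"embedding": vector})


def embed_documents(
    docs: List[Doc],
    post: Callable[[str, str], str] = fake_post,
    url: str = DEFAULT_URL,
    model: str = DEFAULT_MODEL,
) -> List[Doc]:
    """Embed each document and record the model name in its metadata."""
    for doc in docs:
        # Assumed request shape: one prompt per call.
        payload = json.dumps({"model": model, "prompt": doc.content})
        doc.embedding = json.loads(post(url, payload))["embedding"]
        doc.meta["model"] = model  # assumed metadata key
    return docs


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


docs = embed_documents([Doc("banana"), Doc("eleven"), Doc("tea")])
query = embed_documents([Doc("salad")])[0]

# Retrieval: rank documents by similarity to the query vector.
ranked = sorted(
    docs, key=lambda d: cosine(query.embedding, d.embedding), reverse=True
)
print([d.content for d in ranked])
```

Passing the transport in as a function keeps the sketch runnable offline. Swapping `fake_post` for a function that actually posts to a local Ollama server would leave the rest unchanged, assuming the endpoint behaves as described.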
Wow, thank you @jmdevita we'll take it from here. Much much appreciated and looking forward to your next contribution. Keep them coming! |
Added Ollama Document Embedder and correlated pytests. Referenced existing Ollama Text Embedder and pre-existing Document Embedders to maintain parity.
Came from this issue