ChromaQueryTextRetriever InvalidDimensionException #698

Closed
1 task done
jongirard opened this issue Apr 29, 2024 · 1 comment

@jongirard
Contributor

jongirard commented Apr 29, 2024

Describe the bug
I'm attempting to use the ChromaQueryTextRetriever following this example, and I'm encountering the following error: chromadb.errors.InvalidDimensionException: Embedding dimension 384 does not match collection dimensionality 768.

Error message
chromadb.errors.InvalidDimensionException: Embedding dimension 384 does not match collection dimensionality 768.

Expected behavior
I expected to receive the retriever's documented output, “documents”: a list of Documents.

Additional context
My documents were embedded using OllamaDocumentEmbedder with the nomic-embed-text model, which produces 768-dimensional embeddings. I'm new to this, but I believe the error occurs because the default chromadb embedding_function uses all-MiniLM-L6-v2 (see https://docs.trychroma.com/embeddings#default-all-minilm-l6-v2), which produces 384-dimensional embeddings. The query text is therefore embedded with a different model than the one used to generate the stored document embeddings.
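To make the mismatch concrete, here is a minimal sketch that compares the output dimensions of the two models involved. The `MODEL_DIMS` table and the `dimension_error` helper are hypothetical illustrations, not chromadb code; the message format only mirrors the error reported above.

```python
from typing import Optional

# Output dimensions of the two models involved in this issue.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,  # Chroma's default embedding_function
    "nomic-embed-text": 768,  # model used with OllamaDocumentEmbedder
}


def dimension_error(query_model: str, document_model: str) -> Optional[str]:
    """Return a mismatch message if the two models' dimensions differ, else None.

    Hypothetical helper: the wording mirrors the chromadb error, not its code.
    """
    query_dim = MODEL_DIMS[query_model]
    collection_dim = MODEL_DIMS[document_model]
    if query_dim != collection_dim:
        return (
            f"Embedding dimension {query_dim} does not match "
            f"collection dimensionality {collection_dim}"
        )
    return None


# Query embedded with the default model, documents embedded with nomic-embed-text.
print(dimension_error("all-MiniLM-L6-v2", "nomic-embed-text"))
```

Embedding the query with the same model as the documents (here, `dimension_error("nomic-embed-text", "nomic-embed-text")`) returns `None`, which is the condition an Ollama-backed embedding function would satisfy.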

However, chroma-haystack (0.15.0) pins chromadb to <0.4.20, so OllamaEmbeddingFunction is not yet available: it was only added in chromadb 0.5.0 (https://github.com/chroma-core/chroma/blob/0.5.0/chromadb/utils/embedding_functions.py#L966).

Describe the solution you'd like

I believe removing the chromadb <0.4.20 pin and allowing the latest version, 0.5.0 (which introduced OllamaEmbeddingFunction), will resolve the issue.

To Reproduce

  1. Embed documents using OllamaDocumentEmbedder(model="nomic-embed-text") and persist them to a Chroma document store.
  2. Attempt to use ChromaQueryTextRetriever(document_store).
  3. Receive the error chromadb.errors.InvalidDimensionException.
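The reproduction steps can be simulated without Ollama or Chroma. This is a hedged sketch under stated assumptions: `ToyCollection`, `fake_embed`, and `InvalidDimensionError` are hypothetical stand-ins (not chromadb or Haystack APIs), written to mimic a collection whose dimensionality is fixed by the first embeddings added and which rejects queries of a different length.

```python
import hashlib
from typing import List, Optional


class InvalidDimensionError(ValueError):
    """Hypothetical stand-in for chromadb.errors.InvalidDimensionException."""


def fake_embed(text: str, dim: int) -> List[float]:
    """Deterministic stub embedder (not a real model): hash-derived vector of length dim."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return [digest[i % len(digest)] / 255.0 for i in range(dim)]


class ToyCollection:
    """Toy collection: the first added vector fixes the collection's dimensionality."""

    def __init__(self) -> None:
        self.dim: Optional[int] = None
        self.vectors: List[List[float]] = []

    def _check(self, vector: List[float]) -> None:
        if len(vector) != self.dim:
            raise InvalidDimensionError(
                f"Embedding dimension {len(vector)} does not match "
                f"collection dimensionality {self.dim}"
            )

    def add(self, vectors: List[List[float]]) -> None:
        for vector in vectors:
            if self.dim is None:
                self.dim = len(vector)
            self._check(vector)
            self.vectors.append(vector)

    def query(self, vector: List[float], k: int = 1) -> List[int]:
        if self.dim is None:
            return []  # empty collection: nothing to compare against
        self._check(vector)
        # Rank stored vectors by squared L2 distance; return indices of the k closest.
        dists = [sum((a - b) ** 2 for a, b in zip(vector, v)) for v in self.vectors]
        return sorted(range(len(dists)), key=dists.__getitem__)[:k]


collection = ToyCollection()
# Step 1: documents embedded with a 768-dim model (stand-in for nomic-embed-text).
collection.add([fake_embed(t, 768) for t in ["first document", "second document"]])

# Steps 2-3: query embedded with a 384-dim function (stand-in for all-MiniLM-L6-v2).
try:
    collection.query(fake_embed("first document", 384))
except InvalidDimensionError as err:
    print(err)  # Embedding dimension 384 does not match collection dimensionality 768

# Embedding the query with the same 768-dim model succeeds.
print(collection.query(fake_embed("first document", 768)))  # [0]
```

The last call is the situation a matching query-side embedding function (such as OllamaEmbeddingFunction with the same model) would produce.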

FAQ Check

System:

  • OS: 13.5.2
  • GPU/CPU: M1 Max
  • Haystack version (commit or version number): 2.0
  • DocumentStore: chromadb
  • Reader: N/A
  • Retriever: ChromaQueryTextRetriever
anakin87 transferred this issue from deepset-ai/haystack on Apr 29, 2024.
@jongirard
Contributor Author

I created a PR here to bump the chromadb version, which I believe will fix the issue by allowing the use of OllamaEmbeddingFunction.
