KeyError: 'blob_data' when running Pipeline.run() using custom collection settings in WeaviateDocumentStore #565

Closed
Poaceae1999 opened this issue Mar 7, 2024 · 1 comment
Labels
bug Something isn't working integration:weaviate

Comments

Poaceae1999 commented Mar 7, 2024

Describe the bug
Calling Pipeline.run() raises KeyError: 'blob_data'. The error originates in the WeaviateDocumentStore class of the haystack_integrations package.

Error message

#   File "C:\RAG\issue_1.py", line 68, in <module>
#     # router for different pipeline
#              ^^^^^^^^^^^^^^^^^^^^^^^
#   File "c:\Users\ttim3\AppData\Local\Programs\Python\Python312\Lib\site-packages\haystack\core\pipeline\pipeline.py", line 771, in run
#     res = comp.run(**last_inputs[name])
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "c:\Users\ttim3\AppData\Local\Programs\Python\Python312\Lib\site-packages\haystack_integrations\components\retrievers\weaviate\embedding_retriever.py", line 74, in run
#     documents = self._document_store._embedding_retrieval(
#                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "c:\Users\ttim3\AppData\Local\Programs\Python\Python312\Lib\site-packages\haystack_integrations\document_stores\weaviate\document_store.py", line 470, in _embedding_retrieval
#     return [self._to_document(doc) for doc in result["data"]["Get"][collection_name]]
#             ^^^^^^^^^^^^^^^^^^^^^^
#   File "c:\Users\ttim3\AppData\Local\Programs\Python\Python312\Lib\site-packages\haystack_integrations\document_stores\weaviate\document_store.py", line 232, in _to_document
#     data.pop("blob_data")
# KeyError: 'blob_data'

Expected behavior
The query returns the retrieved documents without raising an error.

Additional context
The script initializes a WeaviateDocumentStore, sets up documents with embeddings, and adds them to the document store. It then sets up a query pipeline with a text embedder and a WeaviateEmbeddingRetriever. When running a query through this pipeline, the error occurs.

It seems that WeaviateDocumentStore tries to remove the "blob_data" key from a dictionary that does not contain it, because the custom collection settings do not define a "blob_data" property.
I did not find any requirement to add "blob_data" in the Weaviate or Haystack documentation.
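
To illustrate the failure mode described above, here is a minimal sketch that does not use Haystack or Weaviate at all. The two helper functions are hypothetical stand-ins for the `data.pop("blob_data")` call shown in the traceback; the tolerant variant is only one way such a call could be made safe and is not necessarily what the integration does.

```python
def to_document_strict(data: dict) -> dict:
    """Hypothetical stand-in for the failing call: pop() without a default."""
    data = dict(data)  # work on a copy so the caller's dict is untouched
    data.pop("blob_data")  # raises KeyError when the key is absent
    return data


def to_document_tolerant(data: dict) -> dict:
    """Same idea, but a missing key is ignored thanks to the default value."""
    data = dict(data)
    data.pop("blob_data", None)  # returns None instead of raising
    return data


# An object as it might come back from a collection that only defines
# the custom properties "title" and "abstract" (illustrative data).
custom_object = {"title": "hello", "abstract": "hello"}

try:
    to_document_strict(custom_object)
except KeyError as err:
    print(f"KeyError: {err}")

print(to_document_tolerant(custom_object))
```

The strict variant fails exactly because the key was never part of the returned object, which matches the last frame of the traceback.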

To Reproduce

from haystack import Document, Pipeline
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.components.retrievers.weaviate.embedding_retriever import (
    WeaviateEmbeddingRetriever,
)
from haystack_integrations.document_stores.weaviate.document_store import (
    WeaviateDocumentStore,
)

# initialize weaviate----
document_store = WeaviateDocumentStore(
    url="http://localhost:8080",
    collection_settings={
        "class": "Article",
        "properties": [
            {
                "name": "title",
                "dataType": ["text"],
            },
            {
                "name": "abstract",
                "dataType": ["text"],
            },
        ],
        "vectorizer": "none",
    },
)
# set up documents----
documents = [
    Document(content="This is first", meta={"title": "hello", "abstract": "hello"}),
    Document(content="This is second", meta={"name": "second"}),
]

# embed the documents and write them to the store----
document_embedder = SentenceTransformersDocumentEmbedder()
document_embedder.warm_up()
documents_with_embeddings = document_embedder.run(documents)["documents"]
document_store.write_documents(documents_with_embeddings)

# set up query pipeline----
query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
query_pipeline.add_component(
    "retriever", WeaviateEmbeddingRetriever(document_store=document_store)
)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

# query, retrieve and print result----
query = "Who lives in Berlin?"
result = query_pipeline.run({"text_embedder": {"text": query}})
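
A possible, untested workaround on affected versions might be to add the missing property to the custom settings before creating the store. The sketch below is plain Python and only manipulates the settings dictionary. The helper name `with_required_properties` and the `ASSUMED_REQUIRED_PROPERTIES` list are hypothetical, and the assumption that the store expects "blob_data" as a Weaviate `blob` property comes from the traceback, not from the documentation. The store may also expect other properties that are not listed here.

```python
import copy

# Assumption: the store expects a "blob_data" property of Weaviate's "blob" type.
# This list is a guess based on the traceback and may be incomplete.
ASSUMED_REQUIRED_PROPERTIES = [{"name": "blob_data", "dataType": ["blob"]}]


def with_required_properties(settings: dict) -> dict:
    """Return a copy of the settings with any assumed-required properties appended."""
    merged = copy.deepcopy(settings)
    properties = merged.setdefault("properties", [])
    existing = {prop["name"] for prop in properties}
    for prop in ASSUMED_REQUIRED_PROPERTIES:
        if prop["name"] not in existing:
            properties.append(copy.deepcopy(prop))
    return merged


custom_settings = {
    "class": "Article",
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "abstract", "dataType": ["text"]},
    ],
    "vectorizer": "none",
}

collection_settings = with_required_properties(custom_settings)
print([prop["name"] for prop in collection_settings["properties"]])
```

The resulting dictionary would then be passed as `collection_settings` to `WeaviateDocumentStore` in place of the original one.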

System:

  • OS: Windows 11 23H2
  • GPU/CPU: RTX 4080 SUPER/Intel(R) Core(TM) i7-14700K
  • Haystack version (commit or version number): v2.0.0-beta.8
  • DocumentStore: WeaviateDocumentStore
  • Reader: None
  • Retriever: WeaviateEmbeddingRetriever
@anakin87 anakin87 transferred this issue from deepset-ai/haystack-integrations Mar 11, 2024
@anakin87 anakin87 added bug Something isn't working integration:weaviate labels Mar 11, 2024
silvanocerza (Contributor) commented

This has been solved with #463.
