Pinecone: dummy vector is not compatible with the new API #6931

Closed

anakin87 opened this issue Feb 7, 2024 · Discussed in #6929 · 2 comments
Labels: 1.x, type:bug (Something isn't working)

Comments

anakin87 (Member) commented Feb 7, 2024

Discussed in #6929

Originally posted by Boltzmann08 February 6, 2024
Hello everyone,

I am trying to upsert data to Pinecone. First I convert and preprocess the files, but when I write the preprocessed documents, I get an API error.

I am running this on Colab.

```python
!pip install farm-haystack[all]
!pip install datasets

# import all the necessary libraries
from haystack.utils import fetch_archive_from_http, convert_files_to_docs
from haystack.nodes import PreProcessor

doc_dir = "data/tutorial8"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/preprocessing_tutorial8.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

all_docs = convert_files_to_docs(dir_path=doc_dir)

preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    split_by="word",
    split_length=100,
    split_respect_sentence_boundary=True,
)

docs_default = preprocessor.process(all_docs)  # returns Documents with the text in the 'content' field

# document_store is a PineconeDocumentStore initialized earlier (not shown)
document_store.write_documents(docs_default)
```

The error message is:

```
ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'Content-Length': '155', 'x-pinecone-request-latency-ms': '136', 'date': 'Wed, 31 Jan 2024 13:33:43 GMT', 'x-envoy-upstream-service-time': '32', 'server': 'envoy', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
HTTP response body: {"code":3,"message":"Dense vectors must contain at least one non-zero value. Vector ID 1f6ca8a2bd6c9903813607120d8d48bc contains only zeros.","details":}
```

When I run:

```python
from pprint import pprint

pprint(docs_default[0])
```

it returns:

```
<Document: {'content': 'BERT: Pre-training of Deep Bidirectional Transformers for\nLanguage Understanding\nJacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova\nGoogle AI Language\n{jacobdevlin,mingweichang,kentonl,kristout}@google.com\nAbstract\nWe introduce a new language representa-\ntion model called BERT, which stands for\nBidirectional Encoder Representations from\nTransformers. Unlike recent language repre-\nsentation models (Peters et al., 2018a; Rad-\nford et al., 2018), BERT is designed to pre-\ntrain deep bidirectional representations from\nunlabeled text by jointly conditioning on both\nleft and right context in all layers. ', 'content_type': 'text', 'score': None, 'meta': {'name': 'bert.pdf', '_split_id': 0}, 'id_hash_keys': ['content'], 'embedding': None, 'id': '1f6ca8a2bd6c9903813607120d8d48bc'}>
```
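The `'embedding': None` above is the key detail: as the issue title indicates, the 1.x `PineconeDocumentStore` upserts a dummy vector for documents that have no embedding yet, and an all-zero dummy is exactly what the new Pinecone API rejects. A rough sketch of the rejected payload (the dimension is illustrative):

```python
# For a Document written with embedding=None, the store upserts a placeholder
# vector instead of the missing embedding; an all-zero placeholder like this
# now triggers: code 3, "Dense vectors must contain at least one non-zero value."
embedding_dim = 768  # illustrative; whatever dimension the Pinecone index was created with
dummy_vector = [0.0] * embedding_dim
```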

I found a workaround that consists in creating the embeddings and upserting them directly to the Pinecone index, without going through Haystack.
But it's a pity not to use everything Haystack provides, and in that case the retriever is also unable to update the embeddings once connected to the document store, because the index still looks empty to it.
After some reflection and searching, it seems that Pinecone cannot handle `'embedding': None` in the document, but that is what PreProcessor returns.
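For reference, a workaround that stays within Haystack would be to compute the embeddings before calling `write_documents`, so no document reaches Pinecone with a zero placeholder vector. A minimal sketch, assuming the 1.x `EmbeddingRetriever.embed_documents` API and an illustrative sentence-transformers model (the embedding dimension must match the Pinecone index):

```python
from haystack.nodes import EmbeddingRetriever

# illustrative model; its output dimension must match the Pinecone index dimension
retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
)

# compute one embedding per preprocessed Document and attach it,
# so write_documents never falls back to the all-zero dummy vector
embeddings = retriever.embed_documents(docs_default)
for doc, emb in zip(docs_default, embeddings):
    doc.embedding = emb

document_store.write_documents(docs_default)
```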

Has anyone else encountered this issue?

anakin87 added the 1.x and type:bug labels on Feb 7, 2024
anakin87 (Member, Author) commented Feb 7, 2024

It seems the same problem emerged in deepset-ai/haystack-core-integrations#300.

anakin87 (Member, Author) commented Feb 7, 2024

Fixed in #6932.

anakin87 closed this as completed on Feb 7, 2024