[Document(id=841f2916f4d4fe3612dac9490fc3d4ceb78ba76a2f78627413e0f5bcded1a206, content: 'Sample Docx File
The US has "passed the peak" on new coronavirus cases, President Donald Trump said...', meta: {'file_path': 'sample_docx.docx', 'docx': DOCXMetadata(author='Saha, Anirban', category='', comments='', content_status='', created='2020-07-14T08:14:00+00:00', identifier='', keywords='', language='', last_modified_by='Saha, Anirban', last_printed=None, modified='2020-07-14T08:16:00+00:00', revision=1, subject='', title='', version='')})]
Traceback (most recent call last):
  File "/home/anakin87/apps/experiments/milvusdocx/try.py", line 18, in <module>
    document_store.write_documents(docs)
  File "/home/anakin87/apps/experiments/milvusdocx/.venv/lib/python3.10/site-packages/milvus_haystack/document_store.py", line 336, in write_documents
    documents_cp = [MilvusDocumentStore._discard_invalid_meta(doc) for doc in deepcopy(documents)]
  File "/home/anakin87/apps/experiments/milvusdocx/.venv/lib/python3.10/site-packages/milvus_haystack/document_store.py", line 336, in <listcomp>
    documents_cp = [MilvusDocumentStore._discard_invalid_meta(doc) for doc in deepcopy(documents)]
  File "/home/anakin87/apps/experiments/milvusdocx/.venv/lib/python3.10/site-packages/milvus_haystack/document_store.py", line 952, in _discard_invalid_meta
    dtype = infer_dtype_bydata(value)
  File "/home/anakin87/apps/experiments/milvusdocx/.venv/lib/python3.10/site-packages/pymilvus/orm/types.py", line 130, in infer_dtype_bydata
    elem = data[0]
TypeError: 'DOCXMetadata' object is not subscriptable
As @saikanov was suggesting, the issue is related to the DOCXMetadata dataclass being included in meta.
I want to investigate the impact of this aspect for other document stores.
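To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of TypeError without Haystack or Milvus installed. `DocxMetaStandIn` and `first_element_error` are hypothetical names made up for illustration; the stand-in only carries a few of the fields visible in the repr above. It also shows that `dataclasses.asdict` turns the dataclass into a plain dict. Whether a given document store accepts a nested dict in meta is an assumption this sketch does not check.

```python
from dataclasses import asdict, dataclass


@dataclass
class DocxMetaStandIn:
    # Hypothetical stand-in with a few of the fields shown in the repr above.
    author: str
    created: str
    revision: int


def first_element_error(value):
    """Return the message raised by value[0], or None if indexing works."""
    try:
        value[0]
    except TypeError as exc:
        return str(exc)
    return None


meta = {
    "file_path": "sample_docx.docx",
    "docx": DocxMetaStandIn("Saha, Anirban", "2020-07-14T08:14:00+00:00", 1),
}

# Same operation as `elem = data[0]` in the traceback: a dataclass
# instance does not support indexing, so this yields a TypeError message.
error = first_element_error(meta["docx"])

# Converting the dataclass to a plain dict leaves only built-in types in meta.
plain_meta = {**meta, "docx": asdict(meta["docx"])}
```

After running this, `error` holds the "not subscriptable" message and `plain_meta["docx"]` is an ordinary dict keyed by field name.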
Describe the bug
Cannot upload docx file to Milvus database because of DOCXMetadata
Error message
TypeError: 'DOCXMetadata' object is not subscriptable
To Reproduce
Use a DOCX pipeline with Milvus as the vector DB.
I had already fixed this by the time I posted: the problem is that the DOCXMetadata object cannot be indexed. Once I figured that out, I tried popping (deleting) that metadata entry before writing, and it works fine.
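The pop workaround can be sketched roughly like this, assuming the metadata sits under the 'docx' key of each document's meta, as in the Document repr in this thread. `DocStandIn` and `drop_meta_key` are hypothetical names used only so the snippet runs without Haystack installed; with real documents the same loop over `doc.meta` should apply, but that is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DocStandIn:
    # Hypothetical stand-in for a document: content plus a meta dict.
    content: str
    meta: Dict[str, Any] = field(default_factory=dict)


def drop_meta_key(docs: List[DocStandIn], key: str = "docx") -> List[DocStandIn]:
    """Remove `key` from each document's meta in place; missing keys are ignored."""
    for doc in docs:
        doc.meta.pop(key, None)
    return docs


docs = [
    DocStandIn(
        "Sample Docx File",
        # object() stands in for the DOCXMetadata instance.
        {"file_path": "sample_docx.docx", "docx": object()},
    )
]
drop_meta_key(docs)
```

The idea is to run this between the converter and the call to `write_documents`, so the other meta fields such as file_path are kept.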
After that, I went to [haystack/components/converters/docx.py](https://github.com/deepset-ai/haystack/blob/main/haystack/components/converters/docx.py)
and edited the merged_metadata variable so that it no longer includes the DOCXMetadata:
merged_metadata = {**bytestream.meta, **metadata}
Now it works with the Pipeline.
What I want to ask is: what does DOCXMetadata do? Does it only cause an error with Milvus? And is it fine to leave it out to resolve my issue?
Thanks!