Describe the bug
`_create_docs_from_splits` of the `DocumentSplitter` initializes a new document and then changes its metadata afterward. This means that the document's ID is created without taking the additional metadata into account. Documents that have the same content and differ only in page number will receive the same document ID and might therefore be unintentionally treated as duplicates in a later stage of the pipeline.

Instead of the current
we should change the code to
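As a rough illustration of the problem described above (not the actual Haystack code), here is a minimal sketch. It assumes a hypothetical `Doc` class whose ID is a hash of content and metadata computed once at construction time; the names `Doc`, `split_meta_after`, and `split_meta_at_init` are invented for this example, and the real `Document` may derive its ID differently.

```python
import hashlib
import json


class Doc:
    """Hypothetical stand-in for a document class; NOT the real Haystack Document."""

    def __init__(self, content, meta=None):
        self.content = content
        self.meta = dict(meta or {})
        # Assumption: the ID is derived once, from content and meta, at init time.
        payload = json.dumps(
            {"content": self.content, "meta": self.meta}, sort_keys=True
        )
        self.id = hashlib.sha256(payload.encode("utf-8")).hexdigest()


def split_meta_after(texts):
    """Problematic pattern: create the document, then add the page number."""
    docs = []
    for page, text in enumerate(texts, start=1):
        doc = Doc(content=text)
        doc.meta["page_number"] = page  # too late: the ID is already fixed
        docs.append(doc)
    return docs


def split_meta_at_init(texts):
    """Suggested pattern: pass the complete meta when creating the document."""
    return [
        Doc(content=text, meta={"page_number": page})
        for page, text in enumerate(texts, start=1)
    ]


pages = ["same text", "same text"]  # identical content on two pages
after = split_meta_after(pages)
at_init = split_meta_at_init(pages)

ids_collide_after = after[0].id == after[1].id
ids_collide_at_init = at_init[0].id == at_init[1].id
print(ids_collide_after, ids_collide_at_init)
```

Under these assumptions, the two documents built with `split_meta_after` share one ID even though their page numbers differ, while the ones built with `split_meta_at_init` get distinct IDs.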