
langchain: fix EmbeddingsFilter compress_documents return type to Sequence[Document] instead of Sequence[_DocumentWithState] #17946

Conversation

maximeperrindev (Contributor) commented Feb 22, 2024

  • Description: This PR fixes a typing problem in the `EmbeddingsFilter.compress_documents` method. The returned sequence of `_DocumentWithState` did not match the declared return type of `Sequence[Document]`, and the extra embeddings field on `_DocumentWithState` could break JSON serialization downstream.
  • Issue: TypeError: Type is not JSON serializable: numpy.float64 #17875
  • Twitter handle: @maximeperrin_
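The linked TypeError arises when serializing documents whose fields hold numpy scalars, which strict JSON serializers reject. A minimal stdlib sketch of the same failure mode, using a hypothetical `Float64Like` class in place of `numpy.float64`:

```python
import json

class Float64Like:
    """Hypothetical stand-in for a non-JSON-native scalar such as numpy.float64."""
    def __init__(self, value: float) -> None:
        self.value = value

# json.dumps raises TypeError for types it does not know how to encode
try:
    json.dumps({"embedded_doc": Float64Like(0.5)})
except TypeError as exc:
    print(exc)  # Object of type Float64Like is not JSON serializable
```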


dosubot added labels on Feb 22, 2024: size:XS (This PR changes 0-9 lines, ignoring generated files), Ɑ: embeddings (Related to text embedding models module), 🤖:bug (Related to a bug, vulnerability, unexpected error with an existing feature)
maximeperrindev (Contributor, Author) commented

@eyurtsev

baskaryan (Collaborator) commented

This is actually intentional: the method needs to return a stateful document so that embeddings aren't recomputed multiple times in a compression pipeline (where multiple compressors use embeddings). The correct solution is to convert any stateful docs to non-stateful docs outside of this class.

Closing, but let me know if I'm missing something.
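The suggested fix can be sketched as follows. The `Document` and `DocumentWithState` classes below are minimal stand-ins for illustration only (the real classes live in `langchain_core.documents` and in langchain's `embeddings_filter` module), and `strip_state` is a hypothetical helper mirroring the conversion described above:

```python
from dataclasses import dataclass, field
from typing import Any

# Minimal stand-ins for illustration; the real Document lives in
# langchain_core.documents and _DocumentWithState in
# langchain.retrievers.document_compressors.embeddings_filter.
@dataclass
class Document:
    page_content: str
    metadata: dict = field(default_factory=dict)

@dataclass
class DocumentWithState(Document):
    # cached per-document state, e.g. the document's embedding vector
    state: dict[str, Any] = field(default_factory=dict)

    def to_document(self) -> Document:
        # Drop the cached state so the result is a plain, JSON-friendly Document
        return Document(page_content=self.page_content, metadata=self.metadata)

def strip_state(docs: list[Document]) -> list[Document]:
    """Convert any stateful docs back to plain Documents."""
    return [
        d.to_document() if isinstance(d, DocumentWithState) else d
        for d in docs
    ]
```

Calling a helper like this on the compressor's output, outside the compressor class, keeps the embedding cache intact inside the pipeline while returning serializable documents to callers.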

baskaryan closed this on Mar 29, 2024
hpx502766238 commented
> This is actually intentional: the method needs to return a stateful document so that embeddings aren't recomputed multiple times in a compression pipeline (where multiple compressors use embeddings). The correct solution is to convert any stateful docs to non-stateful docs outside of this class.
>
> Closing, but let me know if I'm missing something.

Excuse me, can you tell me how to convert stateful docs to non-stateful docs?

hpx502766238 commented Aug 15, 2024

I have found a solution: convert each `_DocumentWithState` back to a `Document` before `ContextualCompressionRetriever` returns. In `langchain/retrievers/contextual_compression.py`, I changed `_get_relevant_documents` and `_aget_relevant_documents` as follows (async version shown):

```python
from langchain.retrievers.document_compressors.embeddings_filter import (
    _DocumentWithState,
)

docs = await self.base_retriever.ainvoke(
    query, config={"callbacks": run_manager.get_child()}, **kwargs
)
if docs:
    compressed_docs = await self.base_compressor.acompress_documents(
        docs, query, callbacks=run_manager.get_child()
    )
    # Convert any _DocumentWithState back to a plain Document
    compressed_docs_converted = [
        doc.to_document() if isinstance(doc, _DocumentWithState) else doc
        for doc in compressed_docs
    ]
    return compressed_docs_converted
else:
    return []
```
