Add a ListRerank document compressor #13311
Conversation
type="array[dict]", | ||
) | ||
] | ||
output_parser = StructuredOutputParser.from_response_schemas(response_schemas) |
I experimented with a Pydantic parser that defines the full nested structure explicitly and saw notably more output parsing errors. Expressing the `array[dict]` type as an implicit nested type within a single `ResponseSchema` `type` argument was much more successful.
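For readers skimming this thread, here is a minimal, self-contained sketch of the single-schema approach described above. The field name `reranked_documents` and the description text are illustrative placeholders, not the exact values from the diff:

```python
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# One schema whose `type` hints at the nested structure implicitly,
# instead of modeling every nested field with its own Pydantic class.
response_schemas = [
    ResponseSchema(
        name="reranked_documents",  # hypothetical field name
        description=(
            "Reranked documents. Each entry is a dict containing a document "
            "ID and a relevance score."
        ),
        type="array[dict]",
    )
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# The parser supplies format instructions for the reranking prompt and later
# parses the model's JSON output into a list of dicts.
format_instructions = output_parser.get_format_instructions()
```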
This seems really cool! It would be extra good to add a notebook to the documentation as an example for this.
Thanks @hwchase17. I fixed the lint issues and added to the documentation; it LGTM in the Vercel preview. This is ready to go from my perspective.
…7/langchain into feature/list-rerank-compressor
I fixed the order of arguments (the positional arg needs to come before the kwarg) and fixed the bad type in the `Callable` arg, as noted by the failed check.
needs a re-review and clean up, but i generally like it
- **Description:** This PR adds a new document compressor called `ListRerank`, derived from `BaseDocumentCompressor`. It is a near-exact implementation of the listwise reranking approach introduced in [Zero-Shot Listwise Document Reranking with a Large Language Model](https://arxiv.org/pdf/2305.02156.pdf), which the paper finds to outperform pointwise reranking, an approach partially implemented in LangChain as [LLMChainFilter](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/retrievers/document_compressors/chain_filter.py).
- **Issue:** None
- **Dependencies:** None
- **Tag maintainer:** @hwchase17 @izzymsft
- **Twitter handle:** @HarrisEMitchell

Notes:
1. I didn't add anything to `docs`. I wasn't exactly sure which patterns to follow, as the [cohere reranker is under Retrievers](https://python.langchain.com/docs/integrations/retrievers/cohere-reranker) with other external document retrieval integrations, but other contextual compression is [here](https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression/). Happy to contribute to either with some direction.
2. I followed syntax, docstrings, implementation patterns, etc. as well as I could by looking at nearby modules. One thing I didn't do was put the default prompt in a separate `.py` file like [Chain Filter](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/retrievers/document_compressors/chain_filter_prompt.py) and [Chain Extract](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/retrievers/document_compressors/chain_extract_prompt.py). Happy to follow that pattern if it would be preferred.

---------

Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Bagatur <[email protected]>
Co-authored-by: Chester Curme <[email protected]>
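As a rough illustration of how a compressor like this is typically wired into retrieval, here is a minimal usage sketch with `ContextualCompressionRetriever`. The `ListRerank` import path, its `from_llm(...)` constructor, and the `top_n` parameter are assumptions modeled on sibling compressors such as `LLMChainFilter`, not confirmed by this PR; the retriever and FAISS pieces are standard LangChain APIs.

```python
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import ListRerank  # assumed import path
from langchain.vectorstores import FAISS

# Base retriever over a small toy corpus.
texts = [
    "LangChain provides document compressors for contextual compression.",
    "Listwise reranking asks the LLM to order all candidate documents at once.",
    "Pointwise filtering judges each document independently.",
]
retriever = FAISS.from_texts(texts, OpenAIEmbeddings()).as_retriever(
    search_kwargs={"k": 3}
)

# The compressor reranks the retrieved documents with a single listwise LLM call.
llm = ChatOpenAI(temperature=0)
compressor = ListRerank.from_llm(llm, top_n=2)  # assumed constructor and parameter

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
reranked_docs = compression_retriever.invoke("How does listwise reranking work?")
```

The appeal of the listwise approach is that the LLM sees all candidates in one prompt and produces a single ordering, rather than scoring each document in isolation as the pointwise `LLMChainFilter` does.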