
Add a ListRerank document compressor #13311

Merged
merged 26 commits into langchain-ai:master
Jul 18, 2024

Conversation

Contributor

@denver1117 commented Nov 13, 2023

Notes:

  1. I didn't add anything to docs. I wasn't exactly sure which patterns to follow as cohere reranker is under Retrievers with other external document retrieval integrations, but other contextual compression is here. Happy to contribute to either with some direction.
  2. I followed syntax, docstrings, implementation patterns, etc. as well as I could looking at nearby modules. One thing I didn't do was put the default prompt in a separate .py file like Chain Filter and Chain Extract. Happy to follow that pattern if it would be preferred.

vercel bot commented Nov 13, 2023

The latest updates on your projects:

Name: langchain | Status: ✅ Ready | Updated (UTC): Jul 18, 2024 8:31pm

@dosubot dosubot bot added the 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features label Nov 13, 2023
type="array[dict]",
)
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
Contributor Author

I experimented with a Pydantic parser that defines the full nested structure explicitly and saw notably more output parsing errors. Expressing the array[dict] type as an implicit nested type within a single ResponseSchema type argument was much more successful.
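
For context, a minimal sketch of the two parsing approaches being compared here. The schema name, description, and Pydantic field names below are illustrative placeholders, not the actual values used in this PR:

```python
from langchain.output_parsers import (
    PydanticOutputParser,
    ResponseSchema,
    StructuredOutputParser,
)
from pydantic import BaseModel, Field

# Approach used in this PR: a single ResponseSchema whose nested structure is
# expressed implicitly through the "array[dict]" type string.
response_schemas = [
    ResponseSchema(
        name="reranked_documents",  # placeholder name
        description="list of objects, each with a document index and a relevance score",
        type="array[dict]",
    )
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# Alternative referenced in the comment above: a Pydantic parser that defines
# the full nested structure explicitly (placeholder field names).
class RankedDocument(BaseModel):
    document_id: int = Field(description="index of the document in the input list")
    relevance_score: float = Field(description="relevance of the document to the query")

class RerankOutput(BaseModel):
    reranked_documents: list[RankedDocument]

pydantic_parser = PydanticOutputParser(pydantic_object=RerankOutput)
```

Per the comment above, the first form produced noticeably fewer output parsing errors in practice.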

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 24, 2023
Contributor

@hwchase17 left a comment

This seems really cool! Would be extra good to add a notebook in the documentation as an example for this.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Nov 29, 2023
@dosubot dosubot bot removed the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Nov 30, 2023
Contributor Author

@denver1117 commented Nov 30, 2023

This seems really cool! Would be extra good to add a notebook in the documentation as an example for this.

Thanks @hwchase17. I fixed the lint issues and added to the documentation; it LGTM in the Vercel preview:

[Screenshot: Vercel documentation preview, 2023-11-30 8:48 AM]

This is ready to go from my perspective.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Dec 5, 2023
@denver1117
Contributor Author

I fixed the order of arguments (the required positional arg must come first, not the kwarg) and fixed the incorrect type on the Callable arg, as noted by the failed check.
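
A hedged, generic illustration of the kind of fix described above (a hypothetical helper, not the actual code in this PR):

```python
from typing import Callable

from langchain_core.documents import Document

# Illustrative only: the required positional parameter `documents` comes before
# the keyword parameter that has a default, and the Callable annotation names
# its argument and return types instead of using a bare or incorrect `Callable`.
def format_documents(
    documents: list[Document],
    formatter: Callable[[Document], str] = lambda doc: doc.page_content,
) -> str:
    return "\n".join(formatter(doc) for doc in documents)
```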

@baskaryan baskaryan added the needs documentation PR needs to be updated with documentation label Apr 1, 2024
@ccurme ccurme added the langchain Related to the langchain package label Jun 21, 2024
@hwchase17
Contributor

Needs a re-review and cleanup, but I generally like it.

@ccurme ccurme enabled auto-merge (squash) July 18, 2024 20:23
@ccurme ccurme merged commit 61ea7bf into langchain-ai:master Jul 18, 2024
55 checks passed
olgamurraft pushed a commit to olgamurraft/langchain that referenced this pull request Aug 16, 2024
- **Description:** This PR adds a new document compressor called
`ListRerank`, derived from `BaseDocumentCompressor`. It is a near-exact
implementation of the approach introduced in the paper [Zero-Shot Listwise
Document Reranking with a Large Language
Model](https://arxiv.org/pdf/2305.02156.pdf), which that paper finds to
outperform pointwise reranking, an approach partially implemented in
LangChain as
[LLMChainFilter](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/retrievers/document_compressors/chain_filter.py).
- **Issue:** None
- **Dependencies:** None
- **Tag maintainer:** @hwchase17 @izzymsft
- **Twitter handle:** @HarrisEMitchell

Notes:
1. I didn't add anything to `docs`. I wasn't exactly sure which patterns
to follow as [cohere reranker is under
Retrievers](https://python.langchain.com/docs/integrations/retrievers/cohere-reranker)
with other external document retrieval integrations, but other
contextual compression is
[here](https://python.langchain.com/docs/modules/data_connection/retrievers/contextual_compression/).
Happy to contribute to either with some direction.
2. I followed syntax, docstrings, implementation patterns, etc. as well
as I could looking at nearby modules. One thing I didn't do was put the
default prompt in a separate `.py` file like [Chain
Filter](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/retrievers/document_compressors/chain_filter_prompt.py)
and [Chain
Extract](https://github.com/langchain-ai/langchain/blob/master/libs/langchain/langchain/retrievers/document_compressors/chain_extract_prompt.py).
Happy to follow that pattern if it would be preferred.

---------

Co-authored-by: Harrison Chase <[email protected]>
Co-authored-by: Bagatur <[email protected]>
Co-authored-by: Chester Curme <[email protected]>
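
A hedged usage sketch of the compressor described above, not taken from this PR's docs. The `ListRerank` class name and the `compress_documents` interface come from the description; the import path and the `from_llm`/`top_n` constructor are assumptions for illustration (the class may also have been renamed during review):

```python
from langchain_core.documents import Document
from langchain_openai import ChatOpenAI  # requires OPENAI_API_KEY

# Import path is an assumption; adjust to wherever the compressor actually lives.
from langchain.retrievers.document_compressors import ListRerank

docs = [
    Document(page_content="Listwise reranking scores all candidates in a single LLM call."),
    Document(page_content="The weather in Denver is sunny today."),
    Document(page_content="Pointwise reranking scores each document independently."),
]

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
compressor = ListRerank.from_llm(llm=llm, top_n=2)  # from_llm/top_n are assumed
reranked = compressor.compress_documents(documents=docs, query="How does listwise reranking work?")
for doc in reranked:
    print(doc.page_content)
```

In a retrieval pipeline, this compressor would typically be wrapped in a `ContextualCompressionRetriever`, the same pattern already used for `LLMChainFilter`.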
Labels

🤖:enhancement: A large net-new component, integration, or chain. Use sparingly. The largest features
langchain: Related to the langchain package
lgtm: PR looks good. Use to confirm that a PR is ready for merging.
needs documentation: PR needs to be updated with documentation
size:L: This PR changes 100-499 lines, ignoring generated files.
4 participants