feat: MetaField Ranker #6189

Merged 19 commits on Nov 9, 2023

domenicocinque
Contributor

@domenicocinque domenicocinque commented Oct 29, 2023

Related Issues

Proposed Changes:

Why:

To allow users to rank documents by a relevant metadata field after having used a retriever.

How can it be used:

from haystack.preview.components.rankers.meta_field import MetaFieldRanker
from haystack.preview.dataclasses import Document

# Documents coming from a retriever, each with a retriever score
# and a "rating" value in its metadata
documents = [
    Document(content="Product 1", meta={"rating": 1.3}, score=0.3),
    Document(content="Product 2", meta={"rating": 0.7}, score=0.4),
    Document(content="Product 3", meta={"rating": 2.1}, score=0.6),
]

# Combine the "rating" meta field with the retriever score
ranker = MetaFieldRanker(
    metadata_field="rating",
    ranking_mode="reciprocal_rank_fusion",
    weight=0.5,
)

sorted_documents = ranker.run(query="", documents=documents)
print(sorted_documents)

The example shows how the component can be used to rank documents by combining a meta field of choice ("rating" in this case) and the score of the retriever.
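
To make the idea of combining the two signals more concrete, here is a minimal, self-contained sketch of a weighted reciprocal rank fusion over the three products above. It does not use Haystack and is not taken from the PR: the names `rrf_term` and `fuse_rankings`, the constant `k=60`, and the exact weighting formula are assumptions chosen for illustration, and the component may compute its scores differently.

```python
from typing import Dict, List


def rrf_term(rank: int, k: int = 60) -> float:
    """Reciprocal rank term for a zero-based rank; k (assumed 60) dampens top ranks."""
    return 1.0 / (k + rank + 1)


def fuse_rankings(
    retriever_scores: Dict[str, float],
    meta_values: Dict[str, float],
    weight: float = 0.5,
) -> List[str]:
    """Order ids by a weighted sum of RRF terms from the two rankings.

    weight=0.0 uses only the retriever ranking, weight=1.0 only the metadata ranking.
    Both dictionaries are assumed to contain the same ids.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")

    # Rank the ids twice: once by retriever score, once by metadata value.
    by_score = sorted(retriever_scores, key=retriever_scores.get, reverse=True)
    by_meta = sorted(meta_values, key=meta_values.get, reverse=True)
    score_rank = {doc_id: rank for rank, doc_id in enumerate(by_score)}
    meta_rank = {doc_id: rank for rank, doc_id in enumerate(by_meta)}

    # Blend the two reciprocal rank terms for every id.
    fused = {
        doc_id: (1.0 - weight) * rrf_term(score_rank[doc_id])
        + weight * rrf_term(meta_rank[doc_id])
        for doc_id in retriever_scores
    }
    return sorted(fused, key=fused.get, reverse=True)


retriever_scores = {"Product 1": 0.3, "Product 2": 0.4, "Product 3": 0.6}
ratings = {"Product 1": 1.3, "Product 2": 0.7, "Product 3": 2.1}
print(fuse_rankings(retriever_scores, ratings, weight=0.5))
```

In this sketch "Product 3" comes first for any weight, because it leads both rankings. The weight only decides how the other two products are ordered relative to each other.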

How did you test it?

Tested locally and with unit tests.

Notes for the reviewer

The implementation is based on the Recentness Ranker. However, I made some changes, such as renaming the "score" ranking method to "linear_score" to make it more specific. I also moved the logic that reranks the results into a separate function, so that a Recentness Ranker for Haystack 2.0 can be implemented by inheriting from this class.
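
The design described in these notes can be sketched roughly as follows. This is a hypothetical illustration, not the code in the PR: the classes `Doc`, `FieldRanker`, and `RecentnessRanker`, the method names, and the linear rank score `(n - rank) / n` are all assumptions. The point it shows is that once the merging logic lives in its own method, a subclass only has to change how the field is read.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a document; not Haystack's Document class."""
    content: str
    score: float
    meta: Dict[str, Any] = field(default_factory=dict)


class FieldRanker:
    """Hypothetical base ranker: the sort key and the score merging are separate methods."""

    def __init__(self, metadata_field: str, weight: float = 0.5):
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight must be in [0, 1], got {weight}")
        self.metadata_field = metadata_field
        self.weight = weight

    def sort_key(self, doc: Doc) -> Any:
        """Value to sort by; subclasses override this to parse the field differently."""
        return doc.meta[self.metadata_field]

    def merge_scores(self, docs_by_field: List[Doc]) -> List[Doc]:
        """Blend each retriever score with an assumed linear rank score, (n - rank) / n."""
        n = len(docs_by_field)
        merged = [
            Doc(
                content=doc.content,
                score=(1.0 - self.weight) * doc.score + self.weight * (n - rank) / n,
                meta=doc.meta,
            )
            for rank, doc in enumerate(docs_by_field)
        ]
        return sorted(merged, key=lambda d: d.score, reverse=True)

    def run(self, documents: List[Doc]) -> List[Doc]:
        docs_by_field = sorted(documents, key=self.sort_key, reverse=True)
        return self.merge_scores(docs_by_field)


class RecentnessRanker(FieldRanker):
    """Hypothetical subclass: only the sort key changes, the merging is inherited."""

    def sort_key(self, doc: Doc) -> datetime:
        return datetime.fromisoformat(doc.meta[self.metadata_field])


docs = [
    Doc("Product 1", score=0.3, meta={"rating": 1.3, "date": "2023-10-01"}),
    Doc("Product 2", score=0.4, meta={"rating": 0.7, "date": "2023-11-01"}),
    Doc("Product 3", score=0.6, meta={"rating": 2.1, "date": "2023-09-01"}),
]
by_rating = FieldRanker("rating", weight=0.5).run(docs)
by_date = RecentnessRanker("date", weight=0.5).run(docs)
print([d.content for d in by_rating])
print([d.content for d in by_date])
```

The `date` values are made up for the example. With the same weight, the two rankers disagree on the top result because the newest product is not the best rated one.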

Checklist

@domenicocinque domenicocinque requested review from a team as code owners October 29, 2023 14:46
@domenicocinque domenicocinque requested review from dfokina and vblagoje and removed request for a team October 29, 2023 14:46
@github-actions github-actions bot added topic:tests proposal 2.x Related to Haystack v2.0 type:documentation Improvements on the docs labels Oct 29, 2023
@github-actions github-actions bot removed the proposal label Oct 29, 2023
@vblagoje
Member

Hey @domenicocinque, thanks for opening this PR. Could you please provide a bit more context about how this component is used? See, for example, the "How it can be used" section of #6199. It will greatly help not only me but also @dfokina, who will help us with the docs.

To fix the release note CI failure, you need to add a release note with the reno tool. See https://github.com/deepset-ai/haystack/blob/main/CONTRIBUTING.md#release-notes for more details.

@domenicocinque
Contributor Author

domenicocinque commented Nov 2, 2023

Hi @vblagoje, thanks for the fast response. I added the "How it can be used" section to the PR description and the release notes to the code. Please let me know if anything needs to be improved.

@vblagoje
Member

vblagoje commented Nov 3, 2023

@domenicocinque Your explanation helped me a lot to understand the use case of this component. However, what I'm not 100% sure about is whether this component should be integrated into our core packages or included as a valuable community integration. Let me consult internally and we'll get back to you soon.

Member

@vblagoje vblagoje left a comment

@domenicocinque this looks quite solid already. I left one comment that I think should improve both code readability and performance.

Review comment on haystack/preview/components/rankers/meta_field.py (outdated, resolved)
@vblagoje
Member

vblagoje commented Nov 6, 2023

@domenicocinque great work, thank you. One last request: let's add a unit test for the non-happy path in linear_score mode, when the score is invalid. It should test that the resulting logs are captured in caplog. Also, in __init__, let's raise ValueError instead of ComponentError. ComponentErrors are mostly reserved for the run method, to signal an invalid component state that prevents execution.
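
A rough sketch of the two requested checks is shown below. It is written against a hypothetical `LinearScoreRanker` class, not the component in this PR, and it uses the standard library's `unittest` so that it runs without extra dependencies; under pytest, the `caplog` fixture would play the role that `assertLogs` plays here. The logger name, the warning text, and the choice to replace an invalid score with 0 are all assumptions.

```python
import logging
import unittest
from typing import List, Optional

logger = logging.getLogger("meta_field_sketch")


class LinearScoreRanker:
    """Hypothetical ranker used only to illustrate the two tests; not Haystack's class."""

    def __init__(self, weight: float = 0.5):
        # Invalid configuration is rejected at construction time with ValueError.
        if not 0.0 <= weight <= 1.0:
            raise ValueError(f"weight must be in [0, 1], got {weight}")
        self.weight = weight

    def blend(self, scores: List[Optional[float]]) -> List[float]:
        """Blend scores with a linear rank score; invalid scores are logged and set to 0."""
        n = len(scores)
        blended = []
        for rank, score in enumerate(scores):
            if score is None or not 0.0 <= score <= 1.0:
                logger.warning("Score %s is outside [0, 1]; using 0 instead.", score)
                score = 0.0
            blended.append((1.0 - self.weight) * score + self.weight * (n - rank) / n)
        return blended


class LinearScoreRankerTests(unittest.TestCase):
    def test_invalid_score_is_logged(self):
        # Non-happy path: an out-of-range score must produce a warning.
        ranker = LinearScoreRanker(weight=0.5)
        with self.assertLogs("meta_field_sketch", level="WARNING") as captured:
            blended = ranker.blend([1.7, 0.4])
        self.assertIn("outside [0, 1]", captured.output[0])
        self.assertEqual(len(blended), 2)

    def test_init_rejects_invalid_weight(self):
        # Bad constructor arguments raise ValueError, not a component-level error.
        with self.assertRaises(ValueError):
            LinearScoreRanker(weight=1.5)


# Run the two tests programmatically instead of calling unittest.main().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(LinearScoreRankerTests)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```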

@vblagoje
Member

vblagoje commented Nov 6, 2023

@dfokina I don't expect any more changes for this PR after @domenicocinque's next commit. Please have a look at it after he commits his last change and make any pydoc corrections 🙏

@vblagoje
Member

vblagoje commented Nov 9, 2023

Thanks for the update @domenicocinque and for this overall valuable contribution! @dfokina have a pass now, make any needed changes, and we are ready to 🚢

@vblagoje vblagoje self-requested a review November 9, 2023 09:00
@dfokina
Contributor

dfokina commented Nov 9, 2023

All done from my side too! 🚀

@dfokina dfokina merged commit 676da68 into deepset-ai:main Nov 9, 2023
21 checks passed
@domenicocinque domenicocinque deleted the code/metafieldranker branch November 9, 2023 11:31
Labels
2.x Related to Haystack v2.0 topic:tests type:documentation Improvements on the docs
Development

Successfully merging this pull request may close these issues.

Ranker based on custom meta field
3 participants