
feat: Add CohereRanker for 2.0 #7446


AnushreeBannadabhavi
Contributor

Related Issues

Proposed Changes:

Add CohereRanker for Haystack 2.0.
It reranks retrieved documents by their semantic relevance to a query, using Cohere's reranking models.
For more information, refer to the Cohere reranker documentation.
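To illustrate what reranking does, here is a minimal, dependency-free sketch. It is not the component's implementation: a hypothetical `relevance_score` function (plain word overlap, far cruder than Cohere's semantic models) stands in for the Cohere API call, and `Document` is a stand-in dataclass rather than Haystack's. It only shows the score, sort, and truncate step.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Document:
    """Stand-in for a Haystack Document: text plus an optional relevance score."""
    content: str
    score: Optional[float] = None


def relevance_score(query: str, text: str) -> float:
    """Hypothetical scorer: fraction of query words that also appear in the text.

    A real reranker would ask a Cohere rerank model for this score instead.
    """
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & text_words) / len(query_words)


def rerank(query: str, documents: List[Document], top_k: int) -> List[Document]:
    """Score every document, sort by descending score, and keep the best top_k."""
    # Copy each document with its score so the caller's objects are left untouched
    scored = [replace(doc, score=relevance_score(query, doc.content)) for doc in documents]
    # list.sort() is stable (also with reverse=True), so ties keep their input order
    scored.sort(key=lambda doc: doc.score, reverse=True)
    return scored[:top_k]


docs = [
    Document(content="Paris is in France"),
    Document(content="Berlin is in Germany"),
    Document(content="Lyon is in France"),
]
for doc in rerank("Cities in France", docs, top_k=2):
    print(f"{doc.score:.2f}  {doc.content}")
```

With `top_k=2`, the two documents mentioning France are kept and the Berlin document is dropped, which mirrors the retriever-then-ranker setup in the pipeline test below.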

How did you test it?

  • Tests have been added in test_cohere.py
  • Tested the component usage in a pipeline using the following code:
```python
from haystack import Document, Pipeline
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
from haystack.components.rankers import CohereRanker

# Note: set your API key by running the following command in your terminal:
# export CO_API_KEY="<your key>"

docs = [
    Document(content="Paris is in France"),
    Document(content="Berlin is in Germany"),
    Document(content="Lyon is in France"),
]
document_store = InMemoryDocumentStore()
document_store.write_documents(docs)

retriever = InMemoryBM25Retriever(document_store=document_store)
ranker = CohereRanker(model="rerank-english-v2.0", top_k=3)

document_ranker_pipeline = Pipeline()
document_ranker_pipeline.add_component(instance=retriever, name="retriever")
document_ranker_pipeline.add_component(instance=ranker, name="ranker")

# Feed the retriever's output documents into the ranker
document_ranker_pipeline.connect("retriever.documents", "ranker.documents")

query = "Cities in France"
res = document_ranker_pipeline.run(
    data={
        "retriever": {"query": query, "top_k": 3},
        "ranker": {"query": query, "top_k": 2},
    }
)
print(res)
```
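The script above relies on `CO_API_KEY` being exported beforehand. A small fail-fast check can make a missing key obvious before any pipeline is built. This is a minimal standard-library sketch; `require_cohere_api_key` is a hypothetical helper name, and the only detail taken from the script is the `CO_API_KEY` variable name.

```python
import os


def require_cohere_api_key(env_var: str = "CO_API_KEY") -> str:
    """Hypothetical helper: return the Cohere API key, or fail with a clear message."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f'{env_var} is not set; run: export {env_var}="<your key>"')
    return key
```

Calling `require_cohere_api_key()` at the top of the test script turns a missing key into an immediate, self-explanatory error.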

Notes for the reviewer

  • When the list of documents to rerank is empty, Cohere raises an ApiError with the message "invalid request: list of documents must not be empty":
```
raise ApiError(status_code=_response.status_code, body=_response_json)
cohere.core.api_error.ApiError: status_code: 400, body: {'message': 'invalid request: list of documents must not be empty'}
```
  • The current implementation does not special-case an empty document list in CohereRanker, because the exception raised by Cohere is already descriptive. However, I can add a check to CohereRanker itself to handle this case.
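One way to add such a check is to return early before Cohere is ever called. The following is a minimal sketch of that idea, not the component's actual code: `GuardedRanker` and its injected `rerank_fn` are hypothetical names, documents are plain strings for brevity, and `fake_cohere_rerank` is a stand-in backend that rejects empty input the same way the API does.

```python
from typing import Callable, Dict, List, Optional

# A reranking backend takes (query, documents) and returns the documents best-first.
RerankFn = Callable[[str, List[str]], List[str]]


class GuardedRanker:
    """Hypothetical ranker that skips the backend call when there is nothing to rank."""

    def __init__(self, rerank_fn: RerankFn, top_k: int = 10):
        self.rerank_fn = rerank_fn
        self.top_k = top_k

    def run(self, query: str, documents: List[str], top_k: Optional[int] = None) -> Dict[str, List[str]]:
        top_k = self.top_k if top_k is None else top_k
        if not documents:
            # Cohere answers an empty list with HTTP 400, so return early instead
            return {"documents": []}
        return {"documents": self.rerank_fn(query, documents)[:top_k]}


def fake_cohere_rerank(query: str, documents: List[str]) -> List[str]:
    """Stand-in backend that mimics Cohere's error on empty input."""
    if not documents:
        raise ValueError("invalid request: list of documents must not be empty")
    # Toy ordering: documents containing the last query word come first (stable sort)
    keyword = query.lower().split()[-1]
    return sorted(documents, key=lambda d: keyword not in d.lower())


ranker = GuardedRanker(fake_cohere_rerank, top_k=2)
print(ranker.run("Cities in France", []))
```

The design choice here is to treat "nothing retrieved" as a normal outcome (an empty result) rather than an error, so a pipeline whose retriever finds no matches keeps running.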

Checklist

AnushreeBannadabhavi requested review from a team as code owners on March 31, 2024, 20:54.
AnushreeBannadabhavi requested review from dfokina and anakin87 and removed the request for a team on March 31, 2024, 20:54.
github-actions bot added the labels topic:tests, topic:build/distribution, 2.x (Related to Haystack v2.0), and type:documentation (Improvements on the docs) on Mar 31, 2024.
@coveralls
Collaborator

Pull Request Test Coverage Report for Build 8501345442

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.05%) to 89.584%

Totals Coverage Status
Change from base Build 8466615213: 0.05%
Covered Lines: 5616
Relevant Lines: 6269

💛 - Coveralls

@AnushreeBannadabhavi
Contributor Author

Hi @anakin87 and @dfokina!
I added "cohere==5.1.7" under dependencies in pyproject.toml (7fbe9bf), which causes CI to fail on the license compliance check. The Cohere repository states that the package is MIT-licensed, so I'm not sure why the check fails.

Previously, I included "cohere==5.1.7" under extra-dependencies in [tool.hatch.envs.test], and the license compliance check passed. However, that caused a different failure: the test_for_missing_dependencies() test in test_imports.py failed.

I suspect I'm not adding cohere in the right place in pyproject.toml. It'd be great if you could provide some feedback here. Thank you!

@anakin87
Member

anakin87 commented Apr 2, 2024

Hey, @AnushreeBannadabhavi!

Thank you for your contribution.

Starting from Haystack 2.0, the integrations maintained by us live here: https://github.com/deepset-ai/haystack-core-integrations

So, I would ask you to close this PR and open a similar one in the haystack-core-integrations repository, following these guidelines.

Feel free to ask for clarification (if needed).

@AnushreeBannadabhavi
Contributor Author

Thanks @anakin87! Will open a PR in https://github.com/deepset-ai/haystack-core-integrations shortly

@anakin87
Member

anakin87 commented Apr 4, 2024

I will close this one and wait for the other 💙

Development

Successfully merging this pull request may close these issues.

CohereRanker