ElasticSearch Retriever is not performing well #598

Closed
Asma-droid opened this issue Mar 15, 2024 · 5 comments

Comments

@Asma-droid

Hello,

I am using Elasticsearch as the DocumentStore, with the Elasticsearch retriever configured as follows:

 embedding_retriever:
    init_parameters:
      document_store:
        embedding_similarity_function: l2_norm
        init_parameters:
          hosts: http://elasticsearch:9200
        type: haystack_integrations.document_stores.elasticsearch.document_store.ElasticsearchDocumentStore
      num_candidates: 10
      top_k: 10
    type: haystack_integrations.components.retrievers.elasticsearch.embedding_retriever.ElasticsearchEmbeddingRetriever
    

Even when the question is outside the context of the indexed documents, the retriever still returns documents with a high score. Below is an example:

{
"AnswerBuilder": {
"answers": [
{
"data": " The context provided does not contain information about Langchain.",
"query": "WHat is langchain ?",
"documents": [
{
"id": "b0b39b5c34c63991019b566e34b1ccfb784cf96a461cebc3711611fd5d9b8b38",
"content": "general-purpose speech toolkit. arXiv preprint\narXiv:2106.04624 .\nRebai, I., Benhamiche, S., Thompson, K., Sellami, Z.,\nLaine, D., and Lorr ´e, J.-P. (2020). Linto platform: A\nsmart open voice assistant for business environments.\nInProceedings of the 1st International Workshop on\nLanguage Technology Platforms , pages 89–95.\nRNNoise (2023). Github RNNoise. https://github.com/\nxiph/rnnoise.\nSpiller, T. R., Ben-Zion, Z., Korem, N., Harpaz-Rotem, I.,\nand Duek, O. (2023). Efficient and accurate transcrip-\ntion in mental health research-a tutorial on using whis-\nper ai for sound file transcription.Suznjevic, M. and Saldana, J. (2016). Delay limits for real-\ntime services. IETF draft .\nTrabelsi, A., Warichet, S., Aajaoun, Y ., and Soussilane, S.\n(2022). Evaluation of the efficiency of state-of-the-\nart speech recognition engines. Procedia Computer\nScience , 207:2242–2252.\nUnion, I. T. (2016). Mean opinion score interpretation and\nreporting. Standard, International Telecommunication\nUnion, Geneva, CH.\nValin, J.-M. (2018). A hybrid dsp/deep learning approach\nto real-time full-band speech enhancement. In 2018\nIEEE 20th international workshop on multimedia sig-\nnal processing (MMSP) , pages 1–5. IEEE.\nVaseghi, S. V . (2008). Advanced digital ",
"dataframe": null,
"blob": null,
"meta": {
"source": "default/ICAART24.pdf",
"page": 7,
"source_id": "74d29100e8daffb446d9d6e1c7185e096e3a51cf9332fc6c421cd9ca467648d6"
},
"score": 0.67131597,

Best regards

@DemirTonchev

Elasticsearch uses the BM25 algorithm. Why do you think a score of 0.67 is high?

@Asma-droid
Author

@DemirTonchev I am using the ES embedding retriever. For queries that do match the retrieved documents, I also get scores between 0.60 and 0.82. So for me, if the query does not match the retrieved documents, scores should be very small.

@DemirTonchev

DemirTonchev commented Mar 16, 2024

So for me, if the query does not match the retrieved documents, scores should be very small.

A score of 0.6 to 0.82 is usually (in my experience) negligibly small. What is the size of your corpus and the average IDF?
Looking at the query "WHat is langchain ?" and the output document, I would expect the score to be small: there is no "langchain" in the returned text. How many documents in the corpus contain at least one occurrence of "langchain"?
Also, I suspect that " " (whitespace) is in your ES document store, which is not ideal.

@Asma-droid
Author

Asma-droid commented Mar 16, 2024

@DemirTonchev In my document store I have just one document, which talks about Vosk and Kaldi! There is no occurrence of langchain. I did this on purpose to see how the model behaves.

When I ask a question about Vosk, I get the correct answer with a score of 0.67. Below is a screenshot:

[image: screenshot of the answer to the Vosk question]

Note that the score is between 0 and 1.
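The bounded range is what one would expect from the configured `l2_norm` similarity. As a rough sketch, assuming the retriever passes Elasticsearch's kNN `_score` through unchanged (an assumption, not confirmed in this thread), Elasticsearch documents that score as `1 / (1 + l2_norm(query, vector)^2)`. The helper names below are hypothetical and only illustrate the arithmetic:

```python
import math


def l2_norm_score(query, vector):
    """Score as documented by Elasticsearch for l2_norm: 1 / (1 + d^2)."""
    squared_distance = sum((q - v) ** 2 for q, v in zip(query, vector))
    return 1.0 / (1.0 + squared_distance)


def distance_from_score(score):
    """Invert the formula above to recover the Euclidean distance."""
    return math.sqrt(1.0 / score - 1.0)


# Toy unit-length vectors (illustrative values, not real embeddings).
identical = l2_norm_score([1.0, 0.0], [1.0, 0.0])    # distance 0
orthogonal = l2_norm_score([1.0, 0.0], [0.0, 1.0])   # squared distance 2
opposite = l2_norm_score([1.0, 0.0], [-1.0, 0.0])    # squared distance 4

# Distance implied by the score reported in this issue.
reported_distance = distance_from_score(0.67131597)

print(identical, orthogonal, opposite, reported_distance)
```

If the embeddings are unit-normalized (again an assumption), the squared distance cannot exceed 4, so the score never drops below 0.2. Under this formula an unrelated document still lands well above zero.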

So my conclusion is that when we ask an out-of-context question, the retriever still returns results with a more or less high score.
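This matches how nearest-neighbour search generally behaves: it returns the `top_k` closest vectors, however far away they are. One possible workaround is to drop low-scoring documents before they reach the prompt. The sketch below is a hypothetical post-filter, not part of the integration; `ScoredDocument` and `filter_by_score` are illustrative names, and the cutoff value is arbitrary:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScoredDocument:
    """Hypothetical stand-in for a retrieved document (only the fields used here)."""
    content: str
    score: Optional[float]


def filter_by_score(documents: List[ScoredDocument], min_score: float) -> List[ScoredDocument]:
    """Keep only documents whose score is present and at least min_score."""
    return [d for d in documents if d.score is not None and d.score >= min_score]


# Illustrative scores only; 0.82 and 0.67 echo values mentioned in this thread.
retrieved = [
    ScoredDocument("chunk about Vosk", 0.82),
    ScoredDocument("bibliography chunk", 0.67),
    ScoredDocument("chunk without a score", None),
]

kept = filter_by_score(retrieved, min_score=0.7)
print([d.content for d in kept])
```

In this issue the matching queries scored between 0.60 and 0.82 and the out-of-context query scored 0.67, so a fixed cutoff alone would probably not separate the two cases here. Any threshold would need to be calibrated per embedding model and similarity function.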

Can you please explain the whitespace problem in more detail? I did not understand it.

@anakin87
Member

anakin87 commented Mar 19, 2024

Should be investigated.

  • which embedding model are you using?
  • have you tried with other embedding_similarity_functions?
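For the second question, a small sketch of how the other similarity options map raw similarity to a score, assuming the formulas Elasticsearch documents for float vectors: `(1 + cosine) / 2` for `cosine` and `(1 + dot_product) / 2` for `dot_product`. The function names are hypothetical, and the vectors are toy values rather than real embeddings:

```python
import math


def cosine_score(query, vector):
    """Score as documented by Elasticsearch for cosine: (1 + cos) / 2."""
    dot = sum(q * v for q, v in zip(query, vector))
    norms = math.sqrt(sum(q * q for q in query)) * math.sqrt(sum(v * v for v in vector))
    return (1.0 + dot / norms) / 2.0


def dot_product_score(query, vector):
    """Score as documented for dot_product on float vectors: (1 + dot) / 2.

    Elasticsearch expects unit-length vectors for this similarity.
    """
    dot = sum(q * v for q, v in zip(query, vector))
    return (1.0 + dot) / 2.0


same = cosine_score([1.0, 0.0], [1.0, 0.0])
unrelated = cosine_score([1.0, 0.0], [0.0, 1.0])
opposed = cosine_score([1.0, 0.0], [-1.0, 0.0])
unrelated_dot = dot_product_score([1.0, 0.0], [0.0, 1.0])

print(same, unrelated, opposed, unrelated_dot)
```

With these mappings, orthogonal vectors score 0.5, so switching `embedding_similarity_function` changes the scale of the scores but would not by itself push unrelated documents towards zero.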

@anakin87 anakin87 transferred this issue from deepset-ai/haystack Mar 19, 2024