
text_similarity_reranker returns negative scores for some models #120201

Open
kderusso opened this issue Jan 15, 2025 · 5 comments
Labels
>bug :ml Machine learning :Search Relevance/Ranking Scoring, rescoring, rank evaluation. :SearchOrg/Relevance Label for the Search (solution/org) Relevance team

Comments

@kderusso
Member

Elasticsearch Version

8.16.1

Installed Plugins

No response

Java Version

bundled

OS Version

Reproducible on Cloud

Problem Description

Certain supported rerank models, including cross-encoder__ms-marco-minilm-l-6-v2, return negative scores when used with the text_similarity_reranker retriever. Negative scores are not allowed in the query phase, so the reranker needs to handle them.

One potential solution is to linearly shift or otherwise normalize the returned scores so that they always fall within a valid, non-negative range.
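As a rough illustration of the normalization idea, here is a minimal Python sketch of two candidate transforms. It is not Elasticsearch's implementation: the function names sigmoid_normalize and shift_normalize and the epsilon floor are hypothetical, and it assumes the model's raw scores are unbounded logits. The logistic (sigmoid) function maps any real-valued score into the range 0 to 1 and is monotonic, so it preserves the reranked order; the linear shift moves a whole batch of scores up so that the lowest one becomes a small positive value.

```python
import math
from typing import List


def sigmoid_normalize(score: float) -> float:
    """Hypothetical helper: map a raw (assumed unbounded) rerank score into [0, 1].

    The logistic function is monotonic, so relative ranking is preserved.
    """
    # Numerically stable logistic function: avoid exp() overflow for large |score|.
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def shift_normalize(scores: List[float], epsilon: float = 1e-6) -> List[float]:
    """Hypothetical helper: linearly shift a batch so its minimum becomes epsilon (> 0).

    Scores that are already all positive are returned unchanged.
    """
    if not scores:
        return []
    lowest = min(scores)
    if lowest > 0:
        return list(scores)
    # Subtracting the minimum and adding epsilon keeps every gap between scores intact.
    return [s - lowest + epsilon for s in scores]
```

One design consideration: the linear shift depends on the other scores in the rank window, so the same document can receive different scores in different result sets, whereas the sigmoid is applied per document and needs no batch context. For extremely negative inputs the sigmoid underflows to 0.0, which is still non-negative but no longer distinguishes between those documents.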

Steps to Reproduce

PUT _inference/rerank/ms-marco-minilm-l-6-v2
{
  "service": "elasticsearch",
  "task_type": "rerank",
  "service_settings": {
    "num_allocations": 1,
    "num_threads": 1,
    "model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
  },
  "task_settings": {
    "return_documents": true
  }
}

PUT /my-index
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text"
      }
    }
  }
}

POST /my-index/_doc/
{
  "text": "Dog training classes"
}

POST my-index/_search
{
  "retriever": {
    "text_similarity_reranker": {
      "retriever": {
        "standard": {
          "query": {
            "match": {
              "text": "dog"
            }
          }
        }
      },
      "field": "text",
      "inference_id": "ms-marco-minilm-l-6-v2",
      "inference_text": "dog",
      "rank_window_size": 100
    }
  }
}

The returned search result is:

{
  "took": 26,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": -0.5682936,
    "hits": [
      {
        "_index": "my-index",
        "_id": "s1xJapQBZfij0Ahq54L5",
        "_score": -0.5682936,
        "_source": {
          "text": "Dog training classes"
        }
      }
    ]
  }
}

The document's _score (and therefore max_score) is negative (< 0).
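To check for this condition programmatically, for example in a regression test for this bug, a minimal Python sketch like the following scans a parsed search response for hits with a negative _score. The helper name negative_score_hits is hypothetical, and the embedded JSON is a trimmed copy of the response above rather than a live request.

```python
import json
from typing import Any, Dict, List


def negative_score_hits(response: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return the _id of every hit whose _score is below zero."""
    hits = response.get("hits", {}).get("hits", [])
    # _score can be null (for example when sorting by a field), so skip such hits.
    return [h["_id"] for h in hits if h.get("_score") is not None and h["_score"] < 0]


# Trimmed copy of the search response shown above.
response = json.loads("""
{
  "hits": {
    "max_score": -0.5682936,
    "hits": [
      {"_index": "my-index", "_id": "s1xJapQBZfij0Ahq54L5", "_score": -0.5682936}
    ]
  }
}
""")

print(negative_score_hits(response))  # ['s1xJapQBZfij0Ahq54L5']
```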

Logs (if relevant)

No response

@kderusso kderusso added :ml Machine learning :Search Relevance/Ranking Scoring, rescoring, rank evaluation. :SearchOrg/Relevance Label for the Search (solution/org) Relevance team >bug labels Jan 15, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-eng (Team:SearchOrg)

@elasticsearchmachine
Collaborator

Pinging @elastic/search-relevance (Team:Search - Relevance)

@leemthompo
Contributor

leemthompo commented Jan 15, 2025

This docs issue looks related: Clarify negative scores returned by Elastic Rerank
