-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add SASEvaluator
component
#6980
Conversation
Pull Request Test Coverage Report for Build 7886080057Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better late than never!
@@ -0,0 +1,3 @@ | |||
from .sas_evaluator import SASEvaluator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This, and the other core evaluators, should go into haystack.components.evaluators
instead. We already use that convention for the integrations.
Preprocess the outputs of the runnable to remove unwanted characters. | ||
|
||
:param regexes_to_ignore (list, optional): A list of regular expressions. If provided, it removes substrings | ||
matching these regular expressions from the text. Defaults to None. | ||
:param ignore_case (bool, optional): If True, converts all characters to lowercase. Defaults to False. | ||
:param ignore_punctuation (bool, optional): If True, removes punctuation from the text. Defaults to False. | ||
:param ignore_numbers (bool, optional): If True, removes numerical digits from the text. Defaults to False. | ||
:return: A list of preprocessed strings. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs to be updated, type annotations removed, etc.
@@ -0,0 +1,38 @@ | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This module looks like it could go into haystack.utils
? Seems generic enough.
|
||
The SAS is computed using a pre-trained model from the Hugging Face model hub. The model can be either a | ||
Bi-Encoder or a Cross-Encoder. The choice of the model is based on the `model` parameter. | ||
The default model is `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove the line about the default model from the docstring. No reason to have two places where it's documented (the function signature/default value itself being the other place).
regexes_to_ignore: Optional[List[str]] = None, | ||
ignore_case: bool = False, | ||
ignore_punctuation: bool = False, | ||
ignore_numbers: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel it would be a better idea to move this preprocessing into its own component? The transformations are not specific to this component as the statistical evaluator also uses it. Separating it will also let us add other transformations without breaking this component.
|
||
from .preprocess import _preprocess_text | ||
|
||
with LazyImport(message="Run 'pip install scikit-learn \"sentence-transformers>=2.2.0\"'") as metrics_import: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why metrics_import
? sas_import
sounds closer to what it's importing, no?
|
||
def __init__( | ||
self, | ||
labels: List[str], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we storing the labels in the component itself? This should be an input, i.e., similar to how the LLM evaluators take parallel lists.
return default_from_dict(cls, data) | ||
|
||
@component.output_types(sas=float, scores=List[float]) | ||
def run(self, predictions: List[str]) -> Dict[str, Any]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring for this method (and the rest). For run
, let's also document what the expected input and outputs are.
labels = _preprocess_text( | ||
self._labels, self._regexes_to_ignore, self._ignore_case, self._ignore_punctuation, self._ignore_numbers | ||
) | ||
config = AutoConfig.from_pretrained(self._model, use_auth_token=token) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's initialize the models in warm_up
and save/cache them in the component; looks like we are constructing (meaning loading from disk) the model for every run call at the moment?
Related Issues
Part of #6903
Proposed Changes:
Add a new
SASEvaluator
component. It can be used to evaluate Semantic Answer Similarity metric of answers returns by LLMs.How did you test it?
I migrated existing tests from
test_eval_sas.py
to use the new API.Notes for the reviewer
I didn't delete the old eval API for the time being. A later PR will purge it after we move everything to the new one.
I'll add documentation configs in a later PR too.
Checklist
fix:
,feat:
,build:
,chore:
,ci:
,docs:
,style:
,refactor:
,perf:
,test:
.