
feat: Add SASEvaluator component #6980

Merged · 5 commits merged into main on Feb 14, 2024
Conversation

silvanocerza (Contributor)
Related Issues

Part of #6903

Proposed Changes:

Add a new SASEvaluator component. It can be used to evaluate the Semantic Answer Similarity (SAS) metric of answers returned by LLMs.
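
A minimal usage sketch of the component as shaped in this PR (labels at init time, predictions at run time); the example answers are illustrative:

    # Minimal usage sketch of the API as it stands in this PR; the example
    # strings are illustrative. The import path follows haystack/components/eval/ from this diff.
    from haystack.components.eval import SASEvaluator

    evaluator = SASEvaluator(labels=["Berlin is the capital of Germany."])
    result = evaluator.run(predictions=["The capital of Germany is Berlin."])
    print(result["sas"], result["scores"])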

How did you test it?

I migrated existing tests from test_eval_sas.py to use the new API.

Notes for the reviewer

I left the old eval API in place for the time being; a later PR will remove it once everything has moved to the new one.

I'll add documentation configs in a later PR too.


@silvanocerza silvanocerza self-assigned this Feb 13, 2024
@silvanocerza silvanocerza requested review from a team as code owners February 13, 2024 10:21
@silvanocerza silvanocerza requested review from dfokina and ZanSara and removed request for a team February 13, 2024 10:21
@github-actions github-actions bot added the labels topic:tests, 2.x (Related to Haystack v2.0), and type:documentation (Improvements on the docs) on Feb 13, 2024
@coveralls (Collaborator) commented Feb 13, 2024

Pull Request Test Coverage Report for Build 7886080057

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 15 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.005%) to 88.701%

Files with Coverage Reduction    New Missed Lines    %
evaluation/metrics.py            1                   85.71%
evaluation/eval.py               14                  80.56%

Totals Coverage Status
Change from base Build 7875553819: -0.005%
Covered Lines: 4828
Relevant Lines: 5443

💛 - Coveralls

@ZanSara (Contributor) left a comment:

Looks good!

@silvanocerza silvanocerza merged commit 9297fca into main Feb 14, 2024
22 checks passed
@silvanocerza silvanocerza deleted the sas-evaluator branch February 14, 2024 15:16
@shadeMe (Contributor) left a comment:

Better late than never!

@@ -0,0 +1,3 @@
from .sas_evaluator import SASEvaluator
@shadeMe (Contributor) commented Feb 14, 2024:

This, and the other core evaluators, should go into haystack.components.evaluators instead. We already use that convention for the integrations.
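
For reference, the suggested location would read like this (hypothetical until the move happens; in this PR the component still lives under haystack.components.eval):

    # Suggested import path per the review comment above; not yet valid as of this PR.
    from haystack.components.evaluators import SASEvaluator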

Comment on lines +14 to +21
Preprocess the outputs of the runnable to remove unwanted characters.

:param regexes_to_ignore (list, optional): A list of regular expressions. If provided, it removes substrings
matching these regular expressions from the text. Defaults to None.
:param ignore_case (bool, optional): If True, converts all characters to lowercase. Defaults to False.
:param ignore_punctuation (bool, optional): If True, removes punctuation from the text. Defaults to False.
:param ignore_numbers (bool, optional): If True, removes numerical digits from the text. Defaults to False.
:return: A list of preprocessed strings.
Contributor:

Needs to be updated, type annotations removed, etc.
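
A possible cleaned-up version of the docstring, with the type annotations and default-value notes dropped from the :param lines (the descriptions themselves are unchanged from the hunk above):

    """
    Preprocess the outputs of the runnable to remove unwanted characters.

    :param regexes_to_ignore: A list of regular expressions. If provided, it removes substrings
        matching these regular expressions from the text.
    :param ignore_case: If True, converts all characters to lowercase.
    :param ignore_punctuation: If True, removes punctuation from the text.
    :param ignore_numbers: If True, removes numerical digits from the text.
    :return: A list of preprocessed strings.
    """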

@@ -0,0 +1,38 @@
import re
Contributor:

This module looks like it could go into haystack.utils? Seems generic enough.


The SAS is computed using a pre-trained model from the Hugging Face model hub. The model can be either a
Bi-Encoder or a Cross-Encoder. The choice of the model is based on the `model` parameter.
The default model is `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`.
Contributor:

Let's remove the line about the default model from the docstring. No reason to have two places where it's documented (the function signature/default value itself being the other place).

Comment on lines +32 to +35
regexes_to_ignore: Optional[List[str]] = None,
ignore_case: bool = False,
ignore_punctuation: bool = False,
ignore_numbers: bool = False,
Contributor:

I feel it would be a better idea to move this preprocessing into its own component? The transformations are not specific to this component as the statistical evaluator also uses it. Separating it will also let us add other transformations without breaking this component.
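
A rough sketch of what such a standalone preprocessing component could look like; the class name AnswerPreprocessor and its single texts input are assumptions for illustration, reusing the _preprocess_text helper introduced in this PR:

    # Hypothetical sketch only; names are illustrative, not from this PR.
    from typing import Any, Dict, List, Optional

    from haystack import component

    from .preprocess import _preprocess_text


    @component
    class AnswerPreprocessor:
        def __init__(
            self,
            regexes_to_ignore: Optional[List[str]] = None,
            ignore_case: bool = False,
            ignore_punctuation: bool = False,
            ignore_numbers: bool = False,
        ):
            self._regexes_to_ignore = regexes_to_ignore
            self._ignore_case = ignore_case
            self._ignore_punctuation = ignore_punctuation
            self._ignore_numbers = ignore_numbers

        @component.output_types(texts=List[str])
        def run(self, texts: List[str]) -> Dict[str, Any]:
            # Delegate to the shared helper so evaluators only consume already-cleaned text.
            cleaned = _preprocess_text(
                texts, self._regexes_to_ignore, self._ignore_case, self._ignore_punctuation, self._ignore_numbers
            )
            return {"texts": cleaned}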


from .preprocess import _preprocess_text

with LazyImport(message="Run 'pip install scikit-learn \"sentence-transformers>=2.2.0\"'") as metrics_import:
Contributor:

Why metrics_import? sas_import sounds closer to what it's importing, no?
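
Something along these lines, for example (the imports shown inside the block are assumptions based on what the rest of the diff appears to use):

    with LazyImport(message="Run 'pip install scikit-learn \"sentence-transformers>=2.2.0\"'") as sas_import:
        # Assumed contents; the point here is only the renamed alias.
        from sentence_transformers import CrossEncoder, SentenceTransformer
        from transformers import AutoConfig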

haystack/components/eval/sas_evaluator.py (review thread resolved)

def __init__(
self,
labels: List[str],
Contributor:

Why are we storing the labels in the component itself? This should be an input, i.e., similar to how the LLM evaluators take parallel lists.
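
A sketch of the suggested shape, with the ground-truth answers supplied as a run-time input parallel to the predictions (the parameter names are illustrative, not from this PR):

    # Illustrative only; the exact parameter names are an assumption.
    @component.output_types(sas=float, scores=List[float])
    def run(self, ground_truth_answers: List[str], predicted_answers: List[str]) -> Dict[str, Any]:
        if len(ground_truth_answers) != len(predicted_answers):
            raise ValueError("The number of predictions differs from the number of ground-truth answers.")
        ...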

return default_from_dict(cls, data)

@component.output_types(sas=float, scores=List[float])
def run(self, predictions: List[str]) -> Dict[str, Any]:
Contributor:

Docstring for this method (and the rest). For run, let's also document what the expected input and outputs are.
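
For example, something along these lines for run (the wording is only a suggestion):

    """
    Compute the Semantic Answer Similarity between the predictions and the labels stored in the component.

    :param predictions: The predicted answers, one per stored label.
    :return: A dictionary with:
        - sas: the aggregate similarity score as a float.
        - scores: the per-pair similarity scores as a list of floats.
    """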

labels = _preprocess_text(
self._labels, self._regexes_to_ignore, self._ignore_case, self._ignore_punctuation, self._ignore_numbers
)
config = AutoConfig.from_pretrained(self._model, use_auth_token=token)
Contributor:

Let's initialize the models in warm_up and save/cache them in the component; looks like we are constructing (meaning loading from disk) the model for every run call at the moment?
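
A hedged sketch of the suggested pattern: load the model once in warm_up, cache it on the component, and have run reuse the cached instance. The Cross-Encoder detection via config.architectures and the attribute names are assumptions for illustration:

    def warm_up(self):
        # Assumes __init__ sets self._similarity_model = None and stores the auth token.
        if self._similarity_model is not None:
            return
        config = AutoConfig.from_pretrained(self._model, use_auth_token=self._token)
        cross_encoder = bool(config.architectures) and all(
            arch.endswith("ForSequenceClassification") for arch in config.architectures
        )
        self._similarity_model = CrossEncoder(self._model) if cross_encoder else SentenceTransformer(self._model)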
