
feat: Add SASEvaluator component #6980

Merged · 5 commits merged into main on Feb 14, 2024
Conversation

silvanocerza (Contributor)
Related Issues

Part of #6903

Proposed Changes:

Add a new SASEvaluator component. It can be used to evaluate the Semantic Answer Similarity (SAS) metric of answers returned by LLMs.
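
A minimal usage sketch of the component as shaped in this PR (labels at init time, predictions at run time); the example answers are illustrative:

    # Minimal usage sketch of the API as it stands in this PR; the example
    # strings are illustrative. The import path follows haystack/components/eval/ from this diff.
    from haystack.components.eval import SASEvaluator

    evaluator = SASEvaluator(labels=["Berlin is the capital of Germany."])
    result = evaluator.run(predictions=["The capital of Germany is Berlin."])
    print(result["sas"], result["scores"])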

How did you test it?

I migrated existing tests from test_eval_sas.py to use the new API.

Notes for the reviewer

I left the old eval API in place for the time being; a later PR will remove it once everything has moved to the new one.

I'll add documentation configs in a later PR too.


@silvanocerza silvanocerza self-assigned this Feb 13, 2024
@silvanocerza silvanocerza requested review from a team as code owners February 13, 2024 10:21
@silvanocerza silvanocerza requested review from dfokina and ZanSara and removed request for a team February 13, 2024 10:21
@github-actions github-actions bot added the labels topic:tests, 2.x (Related to Haystack v2.0), and type:documentation (Improvements on the docs) on Feb 13, 2024
@coveralls (Collaborator) commented Feb 13, 2024

Pull Request Test Coverage Report for Build 7886080057

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 15 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.005%) to 88.701%

Files with Coverage Reduction    New Missed Lines    %
evaluation/metrics.py            1                   85.71%
evaluation/eval.py               14                  80.56%

Totals Coverage Status
Change from base Build 7875553819: -0.005%
Covered Lines: 4828
Relevant Lines: 5443

💛 - Coveralls

@ZanSara (Contributor) left a comment:

Looks good!

@silvanocerza silvanocerza merged commit 9297fca into main Feb 14, 2024
22 checks passed
@silvanocerza silvanocerza deleted the sas-evaluator branch February 14, 2024 15:16
@shadeMe (Contributor) left a comment:

Better late than never!

@@ -0,0 +1,3 @@
from .sas_evaluator import SASEvaluator
@shadeMe (Contributor) commented Feb 14, 2024:

This, and the other core evaluators, should go into haystack.components.evaluators instead. We already use that convention for the integrations.
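
For reference, the suggested location would read like this (hypothetical until the move happens; in this PR the component still lives under haystack.components.eval):

    # Suggested import path per the review comment above; not yet valid as of this PR.
    from haystack.components.evaluators import SASEvaluator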

Comment on lines +14 to +21
Preprocess the outputs of the runnable to remove unwanted characters.

:param regexes_to_ignore (list, optional): A list of regular expressions. If provided, it removes substrings
matching these regular expressions from the text. Defaults to None.
:param ignore_case (bool, optional): If True, converts all characters to lowercase. Defaults to False.
:param ignore_punctuation (bool, optional): If True, removes punctuation from the text. Defaults to False.
:param ignore_numbers (bool, optional): If True, removes numerical digits from the text. Defaults to False.
:return: A list of preprocessed strings.
Contributor:

Needs to be updated, type annotations removed, etc.
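
A possible cleaned-up version of the docstring, with the type annotations and default-value notes dropped from the :param lines (the descriptions themselves are unchanged from the hunk above):

    """
    Preprocess the outputs of the runnable to remove unwanted characters.

    :param regexes_to_ignore: A list of regular expressions. If provided, it removes substrings
        matching these regular expressions from the text.
    :param ignore_case: If True, converts all characters to lowercase.
    :param ignore_punctuation: If True, removes punctuation from the text.
    :param ignore_numbers: If True, removes numerical digits from the text.
    :return: A list of preprocessed strings.
    """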

@@ -0,0 +1,38 @@
import re
Contributor:

This module looks like it could go into haystack.utils? Seems generic enough.


The SAS is computed using a pre-trained model from the Hugging Face model hub. The model can be either a
Bi-Encoder or a Cross-Encoder. The choice of the model is based on the `model` parameter.
The default model is `sentence-transformers/paraphrase-multilingual-mpnet-base-v2`.
Contributor:

Let's remove the line about the default model from the docstring. No reason to have two places where it's documented (the function signature/default value itself being the other place).

Comment on lines +32 to +35
regexes_to_ignore: Optional[List[str]] = None,
ignore_case: bool = False,
ignore_punctuation: bool = False,
ignore_numbers: bool = False,
Contributor:

I feel it would be a better idea to move this preprocessing into its own component? The transformations are not specific to this component as the statistical evaluator also uses it. Separating it will also let us add other transformations without breaking this component.
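
A rough sketch of what such a standalone preprocessing component could look like; the class name AnswerPreprocessor and its single texts input are assumptions for illustration, reusing the _preprocess_text helper introduced in this PR:

    # Hypothetical sketch only; names are illustrative, not from this PR.
    from typing import Any, Dict, List, Optional

    from haystack import component

    from .preprocess import _preprocess_text


    @component
    class AnswerPreprocessor:
        def __init__(
            self,
            regexes_to_ignore: Optional[List[str]] = None,
            ignore_case: bool = False,
            ignore_punctuation: bool = False,
            ignore_numbers: bool = False,
        ):
            self._regexes_to_ignore = regexes_to_ignore
            self._ignore_case = ignore_case
            self._ignore_punctuation = ignore_punctuation
            self._ignore_numbers = ignore_numbers

        @component.output_types(texts=List[str])
        def run(self, texts: List[str]) -> Dict[str, Any]:
            # Delegate to the shared helper so evaluators only consume already-cleaned text.
            cleaned = _preprocess_text(
                texts, self._regexes_to_ignore, self._ignore_case, self._ignore_punctuation, self._ignore_numbers
            )
            return {"texts": cleaned}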


from .preprocess import _preprocess_text

with LazyImport(message="Run 'pip install scikit-learn \"sentence-transformers>=2.2.0\"'") as metrics_import:
Contributor:

Why metrics_import? sas_import sounds closer to what it's importing, no?
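
Something along these lines, for example (the imports shown inside the block are assumptions based on what the rest of the diff appears to use):

    with LazyImport(message="Run 'pip install scikit-learn \"sentence-transformers>=2.2.0\"'") as sas_import:
        # Assumed contents; the point here is only the renamed alias.
        from sentence_transformers import CrossEncoder, SentenceTransformer
        from transformers import AutoConfig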

haystack/components/eval/sas_evaluator.py (review thread resolved)

def __init__(
self,
labels: List[str],
Contributor:

Why are we storing the labels in the component itself? This should be an input, i.e., similar to how the LLM evaluators take parallel lists.
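
A sketch of the suggested shape, with the ground-truth answers supplied as a run-time input parallel to the predictions (the parameter names are illustrative, not from this PR):

    # Illustrative only; the exact parameter names are an assumption.
    @component.output_types(sas=float, scores=List[float])
    def run(self, ground_truth_answers: List[str], predicted_answers: List[str]) -> Dict[str, Any]:
        if len(ground_truth_answers) != len(predicted_answers):
            raise ValueError("The number of predictions differs from the number of ground-truth answers.")
        ...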

return default_from_dict(cls, data)

@component.output_types(sas=float, scores=List[float])
def run(self, predictions: List[str]) -> Dict[str, Any]:
Contributor:

Docstring for this method (and the rest). For run, let's also document what the expected input and outputs are.
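
For example, something along these lines for run (the wording is only a suggestion):

    """
    Compute the Semantic Answer Similarity between the predictions and the labels stored in the component.

    :param predictions: The predicted answers, one per stored label.
    :return: A dictionary with:
        - sas: the aggregate similarity score as a float.
        - scores: the per-pair similarity scores as a list of floats.
    """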

labels = _preprocess_text(
self._labels, self._regexes_to_ignore, self._ignore_case, self._ignore_punctuation, self._ignore_numbers
)
config = AutoConfig.from_pretrained(self._model, use_auth_token=token)
Contributor:

Let's initialize the models in warm_up and save/cache them in the component; looks like we are constructing (meaning loading from disk) the model for every run call at the moment?
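
A hedged sketch of the suggested pattern: load the model once in warm_up, cache it on the component, and have run reuse the cached instance. The Cross-Encoder detection via config.architectures and the attribute names are assumptions for illustration:

    def warm_up(self):
        # Assumes __init__ sets self._similarity_model = None and stores the auth token.
        if self._similarity_model is not None:
            return
        config = AutoConfig.from_pretrained(self._model, use_auth_token=self._token)
        cross_encoder = bool(config.architectures) and all(
            arch.endswith("ForSequenceClassification") for arch in config.architectures
        )
        self._similarity_model = CrossEncoder(self._model) if cross_encoder else SentenceTransformer(self._model)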
