feat: Add StatisticalEvaluator component #6982

Conversation
@silvanocerza, can someone else with a better background look at this one? I can as well, but it'll take some time.
@vblagoje, it would be cool if you could take a look in any case, even if you don't have enough context. If it's not thorough, that's ok; you'll at least familiarize yourself with this part. Then we can have another set of eyes with more context if necessary. 👍
return default_from_dict(cls, data)

@component.output_types(result=float)
def run(self, predictions: List[str]) -> Dict[str, Any]:
Yeah, I also noticed the comment about the length of these predictions. Why not add a check here for zero length, so we can omit the checks in all metrics?
As of now I only moved the metrics that we already had in the old API. Others will be added in future PRs, and I'm not sure whether those should return the same values as F1 and Exact Match when the length is zero. That's the only reason I've done it like this.
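For reference, a minimal sketch of the kind of guard being discussed, based on the run() signature quoted above; the self._labels attribute name and the 0.0 fallback value are assumptions, not necessarily what the PR does:

```python
# Fragment of StatisticalEvaluator.run() -- a sketch, not the PR's actual code.
@component.output_types(result=float)
def run(self, predictions: List[str]) -> Dict[str, Any]:
    # A single zero-length check here lets the individual metric functions skip it.
    # Returning 0.0 for empty input is an assumption.
    if len(predictions) == 0:
        return {"result": 0.0}
    return {"result": self._metric_function(self._labels, predictions)}
```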
Seems good to go, left some minor comments
Force-pushed from 36cd3ed to 5ad3db2
Pull Request Test Coverage Report for Build 7903381466
💛 - Coveralls
@@ -1,3 +1,4 @@
from .sas_evaluator import SASEvaluator
This should go into haystack.components.evaluators
- Exact Match: Measures the proportion of cases where prediction is identical to the expected label.
"""

class Metric(Enum):
Let's extract this and move it out to the top-level namespace (components.evaluators). We should call it StatisticalMetrics (to disambiguate it from the others).
F1 = "F1" | ||
EM = "Exact Match" |
We should probably stick to snake_case like we do elsewhere.
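Putting this comment and the extraction suggestion above together, the result could look roughly like the sketch below; the module path, final class name (the comments use both StatisticalMetrics and StatisticalMetric), and member names are assumptions:

```python
# e.g. haystack/components/evaluators/statistical_metric.py -- path is an assumption
from enum import Enum


class StatisticalMetric(Enum):
    """Statistical metrics supported by StatisticalEvaluator."""

    F1 = "f1"
    EM = "exact_match"
```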
def __init__(
    self,
    labels: List[str],
This should be an input instead. (cf https://github.com/deepset-ai/haystack/pull/6980/files#r1489849342)
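If labels becomes a run-time input as suggested, the run() signature could look roughly like this (a sketch; the _metric_function attribute follows the diff, everything else is assumed):

```python
@component.output_types(result=float)
def run(self, labels: List[str], predictions: List[str]) -> Dict[str, Any]:
    # labels are supplied per invocation instead of being fixed in __init__
    return {"result": self._metric_function(labels, predictions)}
```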
def __init__(
    self,
    labels: List[str],
    metric: Metric,
Let's make this a Union[str, StatisticalMetric] and add a from_str function to the latter.
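A sketch of what the suggested from_str helper and the Union-typed parameter could look like, extending the enum sketched above; the exact names and error handling are assumptions:

```python
# Inside the StatisticalMetric enum sketched above:
@classmethod
def from_str(cls, string: str) -> "StatisticalMetric":
    # Map a plain string such as "f1" onto the corresponding enum member.
    mapping = {m.value: m for m in cls}
    if string not in mapping:
        raise ValueError(f"Unknown statistical metric '{string}'")
    return mapping[string]


# In StatisticalEvaluator.__init__, the parameter could then be coerced
# (other constructor arguments omitted here for brevity):
def __init__(self, metric: Union[str, StatisticalMetric]):
    self._metric = StatisticalMetric.from_str(metric) if isinstance(metric, str) else metric
```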
    regexes_to_ignore: Optional[List[str]] = None,
    ignore_case: bool = False,
    ignore_punctuation: bool = False,
    ignore_numbers: bool = False,
return {"result": self._metric_function(labels, predictions)} | ||
|
||
def _f1(self, labels: List[str], predictions: List[str]): |
Can be a @staticmethod.
return np_mean(scores)

def _exact_match(self, labels: List[str], predictions: List[str]) -> float:
Can be a @staticmethod.
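For illustration, one of these as a @staticmethod; the body is a plausible exact-match implementation using the np_mean alias visible in the diff, not necessarily the PR's exact code:

```python
@staticmethod
def _exact_match(labels: List[str], predictions: List[str]) -> float:
    # No instance state is needed, so there is no self parameter.
    scores = [float(label == prediction) for label, prediction in zip(labels, predictions)]
    return np_mean(scores)
```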
Related Issues
Part of #6903
Proposed Changes:
Add a new StatisticalEvaluator component. It can be used to evaluate different statistical metrics on answers returned by LLMs. As of now it only supports the F1 and Exact Match metrics, since I just migrated them from the previous API.
Ideally, future PRs should also add Recall, Mean Reciprocal Rank, and Mean Average Precision.
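For context, a hypothetical usage sketch based on the signatures visible in this diff (constructor takes labels and a metric, run() takes predictions); the import path follows the review suggestion and the string-valued metric assumes the Union[str, ...] suggestion is adopted, so neither is confirmed here:

```python
from haystack.components.evaluators import StatisticalEvaluator  # import path assumed

evaluator = StatisticalEvaluator(
    labels=["Berlin", "Paris"],  # expected answers (kept in __init__ as in this diff)
    metric="exact_match",        # string form assumes the Union[str, StatisticalMetric] suggestion
)
result = evaluator.run(predictions=["Berlin", "Lyon"])
print(result["result"])  # a single float, e.g. 0.5 for one exact match out of two
```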
How did you test it?
I migrated the existing tests from test_eval_f1.py and test_eval_em.py to use the new API.
Notes for the reviewer
I didn't delete the old eval API for the time being. A later PR will purge it after we move everything to the new one.
I'll add documentation configs in a later PR too.
Depends on #6980
Checklist
I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.