fix: failsafe for non-valid json and failed LLM calls #7723

davidsbatista · 2024-05-22T07:57:37Z

Related Issues

fixes #7712

Proposed Changes:

The LLM-based evaluators can fail due to:

an error when making a call to the LLM
an output returned by the LLM that is not valid JSON

This PR adds safeguards to LLM-based evaluators:

If an LLM-based evaluator (e.g., Faithfulness or ContextRelevance) is initialised with raise_on_failure=False, and a call to an LLM fails or an LLM outputs invalid JSON, it returns np.nan and continues the evaluation.
The user is notified with a warning and with the number of requests that failed.
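
The behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `evaluate_with_failsafe`, `call_llm`, and `flaky_llm` are hypothetical names, the reply is assumed to be a JSON object with a `score` field, and `math.nan` stands in for `np.nan` (both are the same float NaN) so the sketch has no third-party dependencies.

```python
import json
import math
import warnings
from typing import Callable, Dict, List


def evaluate_with_failsafe(
    prompts: List[str],
    call_llm: Callable[[str], str],  # hypothetical: takes a prompt, returns the raw LLM reply
    raise_on_failure: bool = True,
) -> List[Dict[str, float]]:
    """Run one LLM call per prompt; on failure either re-raise or record NaN."""
    results: List[Dict[str, float]] = []
    errors = 0
    for prompt in prompts:
        try:
            reply = call_llm(prompt)    # may raise, e.g. on a network error
            parsed = json.loads(reply)  # raises json.JSONDecodeError on invalid JSON
            results.append({"score": float(parsed["score"])})
        except Exception:
            if raise_on_failure:
                raise
            # Failsafe path: count the failure, store NaN, keep going.
            errors += 1
            results.append({"score": math.nan})
    if errors:
        warnings.warn(
            f"{errors} out of {len(prompts)} LLM requests failed; their scores are NaN."
        )
    return results


def flaky_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM that sometimes misbehaves."""
    if prompt == "bad-json":
        return "this is not JSON"
    if prompt == "down":
        raise ConnectionError("LLM unavailable")
    return '{"score": 1.0}'


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    scores = evaluate_with_failsafe(
        ["ok", "bad-json", "down"], flaky_llm, raise_on_failure=False
    )
```

With `raise_on_failure=False`, the two failing prompts yield NaN scores and a single warning reports the failure count. With the default `raise_on_failure=True`, the first failure propagates to the caller instead.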

coveralls · 2024-05-22T08:52:35Z

Pull Request Test Coverage Report for Build 9210542884

Details

0 of 0 changed or added relevant lines in 0 files are covered.
5 unchanged lines in 1 file lost coverage.
Overall coverage decreased (-0.03%) to 90.563%

Files with Coverage Reduction	New Missed Lines	%
components/evaluators/llm_evaluator.py	5	95.41%

Totals
Change from base Build 9209846779:	-0.03%
Covered Lines:	6641
Relevant Lines:	7333

💛 - Coveralls
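
As a quick sanity check, the overall coverage figure in the report above can be reproduced from its totals. The sketch assumes coverage is simply covered lines divided by relevant lines, which is consistent with the numbers shown.

```python
# Totals taken from the coverage report above.
covered_lines = 6641
relevant_lines = 7333

# Assumption: overall coverage = covered / relevant, expressed as a percentage.
coverage_pct = covered_lines / relevant_lines * 100
print(f"{coverage_pct:.3f}%")  # prints 90.563%
```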

haystack/components/evaluators/context_relevance.py

haystack/components/evaluators/faithfulness.py

haystack/components/evaluators/llm_evaluator.py

releasenotes/notes/add-failsafe-for-LLM-based-evaluators-34cdc183ab545315.yaml

haystack/components/evaluators/llm_evaluator.py

haystack/components/evaluators/faithfulness.py

shadeMe

LGTM. Good to merge after @julian-risch's review.

haystack/components/evaluators/context_relevance.py

haystack/components/evaluators/faithfulness.py

Co-authored-by: Madeesh Kannan <[email protected]>

julian-risch

Looks quite good to me already. Two small notes and then I'll approve. Please update the docs pages once we merge this PR.
https://docs.haystack.deepset.ai/docs/llmevaluator
https://docs.haystack.deepset.ai/docs/faithfulnessevaluator
https://docs.haystack.deepset.ai/docs/contextrelevanceevaluator

haystack/components/evaluators/llm_evaluator.py

test/components/evaluators/test_context_relevance_evaluator.py

test/components/evaluators/test_faithfulness_evaluator.py

davidsbatista added 8 commits May 21, 2024 16:29

wip

a21b0c2

initial import

91ad2ef

adding tests

8746035

adding params

3d16830

adding safeguards for nan in evaluators

33dd22d

adding docstrings

7473d1f

fixing tests

b2ff89a

Merge branch 'main' into failsafe-for-non-valid-JSON

75af5ff

github-actions bot added the topic:tests and type:documentation labels May 22, 2024

davidsbatista added 3 commits May 22, 2024 10:04

removing unused imports

860c2aa

removing unused imports

d502ed9

removing unused imports

2538ed3

davidsbatista added 2 commits May 22, 2024 11:47

adding tests to context and faithfullness evaluators

f5f3818

fixing docstrings

a271db7

davidsbatista marked this pull request as ready for review May 22, 2024 10:25

davidsbatista requested a review from a team as a code owner May 22, 2024 10:25

davidsbatista requested review from julian-risch and shadeMe and removed request for a team May 22, 2024 10:25

davidsbatista changed the title ~~Failsafe for non valid json~~ fix: failsafe for non-valid json and failed LLM calls May 22, 2024

davidsbatista added 3 commits May 22, 2024 12:30

nit

54a0146

removing unused imports

12164d8

adding release notes

687312f

davidsbatista requested a review from a team as a code owner May 22, 2024 10:44

davidsbatista requested review from dfokina and removed request for a team May 22, 2024 10:44

shadeMe suggested changes May 22, 2024

haystack/components/evaluators/llm_evaluator.py (outdated, resolved)

shadeMe suggested changes May 22, 2024

haystack/components/evaluators/llm_evaluator.py (outdated, resolved)

haystack/components/evaluators/faithfulness.py (outdated, resolved)

davidsbatista added 6 commits May 22, 2024 16:21

attending PR comments

2b94818

fixing tests

a2c69dd

fixing tests

e9497ec

Merge branch 'main' into failsafe-for-non-valid-JSON

f98930d

adding types

c0570ec

removing unused imports

796588c

shadeMe approved these changes May 22, 2024

haystack/components/evaluators/context_relevance.py (outdated, resolved)

haystack/components/evaluators/faithfulness.py (outdated, resolved)

davidsbatista and others added 3 commits May 23, 2024 09:21

Update haystack/components/evaluators/context_relevance.py

50f6477

Co-authored-by: Madeesh Kannan <[email protected]>

Update haystack/components/evaluators/faithfulness.py

8ce0c9d

Co-authored-by: Madeesh Kannan <[email protected]>

Merge branch 'main' into failsafe-for-non-valid-JSON

a7d7879

julian-risch requested changes May 23, 2024

haystack/components/evaluators/llm_evaluator.py (resolved)

test/components/evaluators/test_context_relevance_evaluator.py (outdated, resolved)

test/components/evaluators/test_faithfulness_evaluator.py (outdated, resolved)

davidsbatista added 2 commits May 23, 2024 17:13

attending PR comments

391e4fa

Merge branch 'main' into failsafe-for-non-valid-JSON

a49fc65

julian-risch approved these changes May 23, 2024

davidsbatista enabled auto-merge (squash) May 23, 2024 15:31

davidsbatista merged commit 38747ff into main May 23, 2024
25 checks passed

davidsbatista deleted the failsafe-for-non-valid-JSON branch May 23, 2024 15:41
