feat: Implement UpTrainEvaluator
#272
Conversation
This PR is in really great shape! I have commented on what we also briefly talked about.
Could you please also add an example of how the new component can be used in a pipeline? Here is the example from another integration for reference: https://github.com/deepset-ai/haystack-core-integrations/blob/main/integrations/chroma/example/example.py
As part of this PR, you could also add an entry for UpTrain to the inventory in the README: https://github.com/deepset-ai/haystack-core-integrations/tree/main?tab=readme-ov-file#inventory
Update project structure to use the `haystack_integrations` namespace
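For context, moving to the `haystack_integrations` namespace typically implies a layout along these lines. This is only a sketch of the convention; the exact directory and module names below are assumptions, not taken from this PR:

```
integrations/uptrain/
└── src/
    └── haystack_integrations/   # shared namespace package (no __init__.py here)
        └── components/
            └── evaluators/
                └── ...          # evaluator module(s) added by this PR
```

Keeping `haystack_integrations` free of an `__init__.py` lets multiple integration packages contribute modules to the same import namespace.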
LGTM! 👍 The example is really helpful. We can postpone the topic of list processing, and for the output format, let's see whether there are any use cases that would benefit from having separate edges instead of one dict. We talked about the pipeline visualization, which unfortunately hides the contents of the dict from the user in its current implementation. Let's get this merged fast and collect feedback from users! Thanks for the fruitful discussions and great job! 🙂
And let's see whether somebody from UpTrain can help with the integration test of the response matching metric, which fails with a 500 Internal Server Error.
Related to #248.
We introduce `UpTrainEvaluator`, a component that uses the UpTrain LLM evaluation framework to calculate evaluation metrics for RAG pipelines (among others). Refer to deepset-ai/haystack#6784 for an overview of the API design.

This PR introduces the following user-facing classes:

- `UpTrainMetric` - An enumeration that lists the supported UpTrain metrics.
- `UpTrainEvaluator` - The pipeline component that interfaces with the evaluation framework. It accepts a single metric and its optional parameters, plus extra optional parameters to configure the API client. The inputs to the pipeline are configured dynamically depending on the metric. This is done with the help of a metric descriptor table that contains metadata such as input/output conversion formats and expected inputs/outputs.

The output of the component is a nested list of metric results. Each input can have one or more results, depending on the metric. Each result is a dictionary containing the following keys and values:

- `name` - The name of the metric.
- `score` - The score of the metric.
- `explanation` - An optional explanation of the score.
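To make the output format above concrete, here is a minimal sketch of consuming such a nested result list. The metric name and all score/explanation values below are invented sample data for illustration, not real `UpTrainEvaluator` output:

```python
# Sketch of the output shape described above: one inner list of result
# dicts per evaluated input, each with "name", "score", "explanation".
# The values are invented sample data, not real UpTrain output.
sample_output = {
    "results": [
        [{"name": "context_relevance", "score": 0.9,
          "explanation": "The context covers the question."}],
        [{"name": "context_relevance", "score": 0.4,
          "explanation": None}],  # explanation is optional
    ]
}

# Flatten the nested list into one summary line per result.
summaries = []
for input_idx, results in enumerate(sample_output["results"]):
    for result in results:
        explanation = result["explanation"] or "no explanation provided"
        summaries.append(
            f"input {input_idx}: {result['name']}={result['score']} ({explanation})"
        )

for line in summaries:
    print(line)
```

A flat summary like this is one way downstream code could report per-input scores while tolerating the optional `explanation` field.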