Langchain #216
base: main
Conversation
…into langchain: Merging evaluation pipeline together
…langchain: Updating the LLM setup, as well as new tests and protocols
…langchain: Conflicts: pyproject.toml
Hi @batwood-1, I have left some comments. I feel some changes are needed here. You can start addressing my comments, and we can meet in early Jan to discuss this merge in person. What do you think?
```python
references: List[str] = [line.text for line in script_lines if line.speaker == "agent" and line.text is not None]
hypotheses: List[str] = [line.text for line in script_lines if line.speaker == "user" and line.text is not None]

if not references or not hypotheses:
```
Maybe you want to raise an Error here
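For instance, a minimal sketch of that suggestion, reusing the variable names from the diff above (the `evaluate` signature here is an assumption, not the PR's actual function):

```python
from typing import Dict, List


def evaluate(script_lines: List["ScriptLine"]) -> Dict[str, list]:
    references: List[str] = [
        line.text for line in script_lines
        if line.speaker == "agent" and line.text is not None
    ]
    hypotheses: List[str] = [
        line.text for line in script_lines
        if line.speaker == "user" and line.text is not None
    ]

    # Raise instead of silently returning an empty result, so callers
    # notice empty or malformed transcripts immediately.
    if not references or not hypotheses:
        raise ValueError(
            "script_lines must contain at least one agent line and one user line"
        )
    ...
```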
Updated evaluation.py
```python
    Args:
        script_lines (List[ScriptLine]): A list of script lines to evaluate.
        metrics (List[str]): A list of metrics to use for evaluation.
```
Is `metrics` ever used?
Deep eval is legacy and should have been deleted, but it must have been copied over when I updated my branch.
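If the `metrics` argument were kept rather than removed, one way to make it meaningful is a small dispatch table. This is only an illustrative sketch; `score_rouge`, `run_metrics`, and the registry are hypothetical helpers, not part of the PR:

```python
from typing import Callable, Dict, List


def score_rouge(references: List[str], hypotheses: List[str]) -> dict:
    # Hypothetical wrapper around whichever ROUGE implementation the project settles on.
    ...


# Map metric names (as passed in `metrics`) to scoring callables.
METRIC_REGISTRY: Dict[str, Callable[[List[str], List[str]], dict]] = {
    "rouge": score_rouge,
}


def run_metrics(references: List[str], hypotheses: List[str], metrics: List[str]) -> dict:
    results: dict = {}
    for name in metrics:
        if name not in METRIC_REGISTRY:
            raise ValueError(f"Unknown metric: {name!r}")
        results[name] = METRIC_REGISTRY[name](references, hypotheses)
    return results
```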
```python
if not references or not hypotheses:
    return {"metrics": []}

metric_instance = Rouge()
```
Is ROUGE the only metric implemented so far?
No, this was just a test implementation of evaluations.py.
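For context, a minimal sketch of how that ROUGE-only test path could work, assuming `Rouge` comes from the `rouge` PyPI package (which the diff's `metric_instance = Rouge()` suggests, though the PR does not show the import):

```python
from typing import List

from rouge import Rouge  # assumption: the `rouge` PyPI package


def rouge_scores(references: List[str], hypotheses: List[str]) -> dict:
    # Mirrors the current diff: bail out on empty inputs
    # (or raise, per the earlier review comment).
    if not references or not hypotheses:
        return {"metrics": []}

    metric_instance = Rouge()
    # With avg=True, get_scores returns averaged ROUGE-1/2/L recall, precision,
    # and F1 over all pairs; the two lists must be the same length.
    return metric_instance.get_scores(hypotheses, references, avg=True)
```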
@batwood-1 thank you for addressing some of my comments. We will check this deeply tomorrow in person.
Description
Related Issue(s)
Motivation and Context
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: