-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SQL Metrics Implementation #59
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❌ Changes requested. Reviewed everything up to da8a897 in 3 minutes and 30 seconds
More details
- Looked at
43
lines of code in1
files - Skipped
1
files when reviewing. - Skipped posting
0
drafted comments based on config settings.
Workflow ID: wflow_SbrHuyQL59T55GNV
Want Ellipsis to fix these issues? Tag @ellipsis-dev
in a comment. You can customize Ellipsis with 👍 / 👎 feedback, review rules, user-specific overrides, quiet
mode, and more.
# Compare the formatted answer with each formatted ground truth answer | ||
for formatted_gt in formatted_ground_truths: | ||
# Simple string comparison for now, can be improved with more sophisticated methods | ||
match_score = float(formatted_answer == formatted_gt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider implementing a more sophisticated comparison method than simple string equality to handle cases where SQL queries might be functionally identical but differ in formatting or syntax. This could improve the robustness of the syntactic similarity evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ellipsis-dev come up with a few more sophisticated ways to handle functionally identical cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a cross repository pull request, but Ellipsis isn't installed in yisz/continuous-eval-locale. In order to have Ellipsis address comments in this PR, you'll need to install Ellipsis in that repository. |
Add sql metrics
@pantonante check to see if Devin's work is good enough. It added tests / documentation as well. |
It uses the sqlparse library to format and compare the SQL queries. | ||
""" | ||
|
||
def __call__(self, answer: str, ground_truth_answers: Union[List[str], str]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be better I think: https://github.com/tobymao/sqlglot?tab=readme-ov-file#ast-diff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation is too strict, I think we can do better
Pull Request Description
Summary
This pull request introduces a new SQL AST comparison metric to the
continuous-eval
repository. The new metric,SQLASTSimilarity
, compares SQL queries using Abstract Syntax Tree (AST) similarity, leveraging thesqlglot
library.Changes
SQLASTSimilarity
class to thecode_deterministic_metrics.py
file.diff
andparse_one
functions from thesqlglot
library.Keep
class from thesqlglot.diff
module.__call__
method in theSQLASTSimilarity
class to parse SQL queries into ASTs and calculate similarity scores._calculate_similarity
method in theSQLASTSimilarity
class to calculate the similarity score between two ASTs by using thediff
function to get the differences between the trees, counting the total changes, and calculating the total number of nodes in both trees. The similarity score is calculated as1 - (total_changes / total_nodes)
.Testing
test_code_deterministic_metrics.py
, with unit tests for theSQLASTSimilarity
class.SQLASTSimilarity
class, including tests for exact match, different queries, similar queries, and invalid queries.pytest
, and all tests passed successfully.Link to Devin run
https://preview.devin.ai/devin/696032ba45654233968d6a04f2bc5df3
Request for Review
Please review the changes and provide feedback. If everything looks good, kindly approve the pull request for merging.
Thank you!