Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/improve wildcard error 15004 #15056

Closed

Conversation

Jiashu-Hu
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

The current error messages for non-aggregate columns in GROUP BY queries are technically accurate but not user-friendly. They use technical terminology like "non-aggregate values" without clearly explaining the problem or providing actionable solutions. This PR improves these error messages to be more intuitive and helpful, especially for users who might not be familiar with SQL aggregation concepts.

What changes are included in this PR?

added two function in to expand_wildcard_rule.rs:

  1. fn check_group_by_info(input: &LogicalPlan) -> (bool, Vec<Expr>, Vec<Expr>)
  • Detects if a query contains a GROUP BY clause
  • Collects columns used in GROUP BY and in aggregate functions
  • Returns a tuple of (has_group_by, group_by_columns, aggregate_columns)
  1. fn validate_columns_in_group_by_or_aggregate(has_group_by: bool, expanded: &Vec<Expr>, group_by_columns: &Vec<Expr>, aggregate_columns: &Vec<Expr>) -> Result<(), DataFusionError>
  • Verifies all selected columns appear either in the GROUP BY clause or within aggregate functions
  • Returns a clear SchemaError::GroupByColumnInvalid error when validation fails
  1. added a new SchemaError type to shows the error message - datafusion_common::SchemaError::GroupByColumnInvalid

In datafusion/sql/src/utils.rs:

  • Updated message_prefix() method to provide clearer error message prefixes
  • Improved help text in check_column_satisfies_expr() to suggest specific solutions
  • for easy understanding as requested in #15004

Are these changes tested?

  • The new validation functions in expand_wildcard_rule.rs have been tested with test cases included at the bottom of the file. These tests verify both successful validation and proper error reporting.
  • The message improvements in datafusion/sql/src/utils.rs modify only the error text, not the functional logic, so no additional tests were added for these changes.

Additional test cases are welcome to ensure comprehensive coverage.

Are there any user-facing changes?

Yes. Users will now see more intuitive error messages when they encounter GROUP BY validation issues. For example, instead of seeing:

Error during planning: Projection references non-aggregate values: Expression foo.b could not be resolved from available columns: foo.a

They will see:

Error during planning: Column in SELECT must be grouped or aggregated: Expression foo.b could not be resolved from available columns: foo.a

This change makes errors more understandable and provides clearer guidance on how to fix the issue.

@github-actions github-actions bot added sql SQL Planner optimizer Optimizer rules common Related to common crate labels Mar 6, 2025
@alamb alamb marked this pull request as draft March 7, 2025 22:08
@alamb
Copy link
Contributor

alamb commented Mar 7, 2025

The CI seems to be failing so marking as draft for now. Please mark it ready for review when ready for another look

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 13, 2025
@Jiashu-Hu Jiashu-Hu closed this Mar 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
common Related to common crate optimizer Optimizer rules sql SQL Planner sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improved error for expand wildcard rule
2 participants