Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved error for expand wildcard rule #15004

Closed
alamb opened this issue Mar 4, 2025 · 6 comments · Fixed by #15287
Closed

Improved error for expand wildcard rule #15004

alamb opened this issue Mar 4, 2025 · 6 comments · Fixed by #15287
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Mar 4, 2025

Is your feature request related to a problem or challenge?

Found by @mhilton upstream in InfluxDB IOx

Running this query, results in a confusing error message

create table foo(a int, b int, c timestamp) as values (1,2,'2025-01-01T12:01:02');

SELECT *
FROM foo
WHERE c >= NOW() - INTERVAL '1 hour'
GROUP BY a;
> SELECT *
FROM foo
WHERE c >= NOW() - INTERVAL '1 hour'
GROUP BY a;
expand_wildcard_rule
caused by
Schema error: No field named foo.b. Valid fields are foo.a.

If you explicitly list out the column (b instead of *) you get a better message:

> SELECT b
FROM foo
WHERE c >= NOW() - INTERVAL '1 hour'
GROUP BY a;
Error during planning: Projection references non-aggregate values: Expression foo.b could not be resolved from available columns: foo.a

Describe the solution you'd like

I would like a better error

Perhaps something like

While expanding wildcard, column "b" must appear in the GROUP BY clause or must be part of an aggregate function.

Bonus points for also improving the message Error during planning: Projection references non-aggregate values: Expression foo.b could not be resolved from available columns: foo.a

Describe alternatives you've considered

No response

Additional context

Here is what DuckDB reports

D SELECT *
  FROM foo
  WHERE c >= NOW() - INTERVAL '1 hour'
  GROUP BY a;
Binder Error:
column "b" must appear in the GROUP BY clause or must be part of an aggregate function.
Either add it to the GROUP BY list, or use "ANY_VALUE(b)" if the exact value of "b" is not important.

Here is what postgres says

postgres=# SELECT *
  FROM foo
  WHERE c >= NOW() - INTERVAL '1 hour'
  GROUP BY a;
ERROR:  column "foo.b" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT *
@alamb alamb added enhancement New feature or request good first issue Good for newcomers labels Mar 4, 2025
@alamb
Copy link
Contributor Author

alamb commented Mar 4, 2025

I think this is a good first issue as the code will be relatively simple and it will introduce people to the DataFusion dev process

@Jiashu-Hu
Copy link
Contributor

take

@Jiashu-Hu
Copy link
Contributor

hi @alamb
I've completed the logic part of this issue, but since this issue is about some error messages that will cause the CI process to fail, should I also modify the related test message?
Thank you very much

@alamb
Copy link
Contributor Author

alamb commented Mar 12, 2025

hi @alamb I've completed the logic part of this issue, but since this issue is about some error messages that will cause the CI process to fail, should I also modify the related test message? Thank you very much

Yes please!

@Jiashu-Hu
Copy link
Contributor

Hi @alamb, seems the file that I'm working on (expand_wildcard_rule.rs) just has been deleted in 16 hours ago by Remove expand wildcard rule #15170, this means refactoring my current code into the new structure will take a bit longer—LOL. I’ll close my existing pull request and submit a new PR once I’ve merged it with the updated logic. Sorry for waiting

@alamb
Copy link
Contributor Author

alamb commented Mar 13, 2025

Thanks for the update @Jiashu-Hu 🙏

I did verify that the error is still the same on main

DataFusion CLI v46.0.0
> create table foo(a int, b int, c timestamp) as values (1,2,'2025-01-01T12:01:02');

SELECT *
FROM foo
WHERE c >= NOW() - INTERVAL '1 hour'
GROUP BY a;
0 row(s) fetched.
Elapsed 0.025 seconds.

Error during planning: Projection references non-aggregate values: Expression foo.b could not be resolved from available columns: foo.a

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
2 participants