-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parse '1'::interval
as month by default
#11454
Conversation
My approach to this problem maybe a bit naive! open to suggestions. use datafusion::prelude::*;
#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
// register the table
let ctx = SessionContext::new();
ctx.register_csv("example", "test.csv", CsvReadOptions::new()).await?;
// create a plan to run a SQL query
let df = ctx.sql("SELECT '-12'::interval, '1'::interval, '1 day 2 month 3 hour'::interval FROM example").await?;
// execute and print results
df.show().await?;
Ok(())
} +---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
| Utf8("-12") | Utf8("1") | Utf8("1 day 2 month 3 hour") |
+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins -12.000000000 secs | 0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs | 0 years 2 mons 1 days 3 hours 0 mins 0.000000000 secs |
+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+ |
'1'::interval
as month by default
I recommend sqllogictests Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files |
} else { | ||
// Arrow by default will parse str as Month for unit MonthDayNano. | ||
// So we need to be explict that we want it to parse as second. | ||
match (scalar, cast_type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one issue with this approach is that it will only work for single values like '1'::interval
the behavior of
create table foo as values ('1');
select column1::interval from foo;
will not be affected
I recommend you update the code in kernels::cast
to have the new semantics you have in mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but as my understand is that kernels::cast
is within arrow-cast
which it a different repo. So you mean this change is should happen in arrow-cast
repo ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, eventally the fix should make it up to arrow-cast
In the interim, you could make a wrapper function in datafusion with the same signature as cast
in arrow and implement your logic there for interval and fall through to the kernel in arrow-cast otherwise
I haven't though through the implications of changing the semantics of casting strings to intervals to be honest
@nix010 we really need this, any chance you could take a look at failing tests, otherwise I'd be happy to have a go at it. |
None | ||
} | ||
|
||
fn _kernels_cast_with_option( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe useful to add a docstring here explaining why this method is necessary / what it does.
// Arrow by default will parse str as Month for unit MonthDayNano. | ||
// So we need to be explict that we want it to parse as second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also fix this in arrow? @alamb
@@ -413,95 +413,47 @@ select '1980-01-01'::timestamp - interval '1 day' | |||
|
|||
|
|||
### date / timestamp (array) + interval (scalar) | |||
query D | |||
query error DataFusion error: Error during planning: table 'datafusion\.public\.t' not found |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should definitely work, not raise an error!
@@ -623,5 +547,21 @@ select | |||
---- | |||
true true true true true true | |||
|
|||
### interval with string literal default as second | |||
query ???? | |||
SELECT interval '1', '-12'::interval, '1'::interval, '1'::interval + '1'::interval |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very hard to read, better to do separate tests.
I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs. |
Ok, then I think this PR can be closed. |
1 similar comment
Ok, then I think this PR can be closed. |
Marking as draft to signify this PR isn't waiting on feedback anymore |
Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. |
Which issue does this PR close?
Closes #11271 .
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?