Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parse '1'::interval as month by default #11454

Closed
wants to merge 4 commits into from

Conversation

nix010
Copy link

@nix010 nix010 commented Jul 13, 2024

Which issue does this PR close?

Closes #11271 .

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Jul 13, 2024
@nix010
Copy link
Author

nix010 commented Jul 13, 2024

My approach to this problem maybe a bit naive! open to suggestions.
Also where should I put the tests for this ?

use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    // register the table
    let ctx = SessionContext::new();
    ctx.register_csv("example", "test.csv", CsvReadOptions::new()).await?;

    // create a plan to run a SQL query
    let df = ctx.sql("SELECT '-12'::interval, '1'::interval, '1 day 2 month 3 hour'::interval FROM example").await?;

    // execute and print results
    df.show().await?;
    Ok(())
}
+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
| Utf8("-12")                                             | Utf8("1")                                             | Utf8("1 day 2 month 3 hour")                          |
+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+
| 0 years 0 mons 0 days 0 hours 0 mins -12.000000000 secs | 0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs | 0 years 2 mons 1 days 3 hours 0 mins 0.000000000 secs |
+---------------------------------------------------------+-------------------------------------------------------+-------------------------------------------------------+

@nix010 nix010 changed the title Fix parse interval as month by default Fix parse '1'::interval as month by default Jul 13, 2024
@alamb
Copy link
Contributor

alamb commented Jul 15, 2024

Also where should I put the tests for this ?

I recommend sqllogictests

Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest

Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files

} else {
// Arrow by default will parse str as Month for unit MonthDayNano.
// So we need to be explict that we want it to parse as second.
match (scalar, cast_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one issue with this approach is that it will only work for single values like '1'::interval

the behavior of

create table foo as values ('1');
select column1::interval from foo;

will not be affected

I recommend you update the code in kernels::cast to have the new semantics you have in mind

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but as my understand is that kernels::cast is within arrow-cast which it a different repo. So you mean this change is should happen in arrow-cast repo ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, eventally the fix should make it up to arrow-cast

In the interim, you could make a wrapper function in datafusion with the same signature as cast in arrow and implement your logic there for interval and fall through to the kernel in arrow-cast otherwise

I haven't though through the implications of changing the semantics of casting strings to intervals to be honest

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jul 17, 2024
@nix010 nix010 requested a review from alamb July 17, 2024 16:14
@samuelcolvin
Copy link
Contributor

@nix010 we really need this, any chance you could take a look at failing tests, otherwise I'd be happy to have a go at it.

None
}

fn _kernels_cast_with_option(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe useful to add a docstring here explaining why this method is necessary / what it does.

Comment on lines +248 to +249
// Arrow by default will parse str as Month for unit MonthDayNano.
// So we need to be explict that we want it to parse as second.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also fix this in arrow? @alamb

@@ -413,95 +413,47 @@ select '1980-01-01'::timestamp - interval '1 day'


### date / timestamp (array) + interval (scalar)
query D
query error DataFusion error: Error during planning: table 'datafusion\.public\.t' not found
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should definitely work, not raise an error!

@@ -623,5 +547,21 @@ select
----
true true true true true true

### interval with string literal default as second
query ????
SELECT interval '1', '-12'::interval, '1'::interval, '1'::interval + '1'::interval
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is very hard to read, better to do separate tests.

@samuelcolvin
Copy link
Contributor

I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs.

@nix010
Copy link
Author

nix010 commented Aug 8, 2024

I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs.

Ok, then I think this PR can be closed.

1 similar comment
@nix010
Copy link
Author

nix010 commented Aug 8, 2024

I think this is the wrong approach, it would be better to fix the underlying implementation in arrow-rs.

Ok, then I think this PR can be closed.

@samuelcolvin
Copy link
Contributor

See apache/arrow-rs#6211

@alamb alamb marked this pull request as draft August 8, 2024 16:08
@alamb
Copy link
Contributor

alamb commented Aug 8, 2024

Marking as draft to signify this PR isn't waiting on feedback anymore

Copy link

github-actions bot commented Oct 8, 2024

Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the Stale PR has not had any activity for some time label Oct 8, 2024
@github-actions github-actions bot closed this Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) Stale PR has not had any activity for some time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

interval cast has some problematic behaviour
3 participants