Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix parse '1'::interval as month by default #11454

Closed
wants to merge 4 commits into from
Closed
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 32 additions & 12 deletions datafusion/expr/src/columnar_value.rs
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@
use arrow::array::ArrayRef;
use arrow::array::NullArray;
use arrow::compute::{kernels, CastOptions};
use arrow::datatypes::{DataType, TimeUnit};
use arrow::datatypes::{DataType, IntervalUnit, TimeUnit};
use datafusion_common::format::DEFAULT_CAST_OPTIONS;
use datafusion_common::{internal_err, Result, ScalarValue};
use std::sync::Arc;
Expand Down Expand Up @@ -195,19 +195,39 @@ impl ColumnarValue {
kernels::cast::cast_with_options(array, cast_type, &cast_options)?,
)),
ColumnarValue::Scalar(scalar) => {
let scalar_array =
if cast_type == &DataType::Timestamp(TimeUnit::Nanosecond, None) {
if let ScalarValue::Float64(Some(float_ts)) = scalar {
ScalarValue::Int64(Some(
(float_ts * 1_000_000_000_f64).trunc() as i64,
))
.to_array()?
} else {
scalar.to_array()?
}
let scalar_array = if cast_type
== &DataType::Timestamp(TimeUnit::Nanosecond, None)
{
if let ScalarValue::Float64(Some(float_ts)) = scalar {
ScalarValue::Int64(Some(
(float_ts * 1_000_000_000_f64).trunc() as i64
))
.to_array()?
} else {
scalar.to_array()?
};
}
} else {
// Arrow by default will parse str as Month for unit MonthDayNano.
// So we need to be explict that we want it to parse as second.
match (scalar, cast_type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one issue with this approach is that it will only work for single values like '1'::interval

the behavior of

create table foo as values ('1');
select column1::interval from foo;

will not be affected

I recommend you update the code in kernels::cast to have the new semantics you have in mind

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry but as my understand is that kernels::cast is within arrow-cast which it a different repo. So you mean this change is should happen in arrow-cast repo ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, eventally the fix should make it up to arrow-cast

In the interim, you could make a wrapper function in datafusion with the same signature as cast in arrow and implement your logic there for interval and fall through to the kernel in arrow-cast otherwise

I haven't though through the implications of changing the semantics of casting strings to intervals to be honest

(
ScalarValue::Utf8(Some(s_val)),
&DataType::Interval(IntervalUnit::MonthDayNano),
) => {
// negative case
let start_idx = if s_val.starts_with('-') { 1 } else { 0 };

// only incase of simple input that all char is a number.
if s_val[start_idx..].chars().all(|c| c.is_ascii_digit()) {
ScalarValue::Utf8(Some(format!("{} second", s_val)))
.to_array()
} else {
scalar.to_array()
}
}
(_, _) => scalar.to_array(),
}?
};
let cast_array = kernels::cast::cast_with_options(
&scalar_array,
cast_type,
Expand Down