-
Notifications
You must be signed in to change notification settings - Fork 68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement date/time truncation functions #2660
Conversation
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a possible improvement that is minor.
* @param format The time component to truncate to | ||
* @return The truncated date/time | ||
*/ | ||
public static ColumnVector truncate(ColumnView datetime, ColumnView format) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have a version that supports a scalar format? I just think that will be the most common use case and if we can save memory and time by not exploding the scalar into a column I think that would be great.
I think that can be done as a follow on issue if you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about that, however, either parameter can be Scalar thus if doing so we will have to implement 4 JNI overloads for this function. Instead, the code supports columns in which any of them can have one row and can achieve the same performance.
In plugin code, I just "convert" Scalar into a column of one row. Such conversion should not be very expensive. But I can add more overload to support Scalar if that can help anything better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes either can be a scalar, but fmt is the one that is really common to be a scalar. Most of the time you want to truncate to a very specific value so you know how to deal with the resulting timestamp/date.
This implements the functions to truncate date/timestamp to some specific component, matching the Spark SQL function
trunc
anddate_trunc
.Due to changes to related code, the existing module
DateTimeRebase
is changed toDateTimeUtils
(without breaking), which will also contain the newly implemented function.