Implement date/time truncation functions #2660

Merged 15 commits on Dec 9, 2024

@ttnghia ttnghia commented Dec 6, 2024

This implements functions that truncate a date/timestamp to a specific component, matching the Spark SQL functions trunc and date_trunc.
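For readers unfamiliar with truncation semantics, here is a minimal CPU-side sketch in plain Java of what truncating to a component means. It is not the code added in this PR: the class name `TruncSketch`, the set of supported format strings, the Monday week start, and the choice to return null for an unrecognized format are all assumptions made for illustration.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjusters;
import java.util.Locale;

// Illustrative only: a hypothetical CPU-side sketch, not the implementation in this PR.
final class TruncSketch {
    // Zeroes out every component of ts that is finer than the requested one.
    static LocalDateTime truncate(LocalDateTime ts, String format) {
        LocalDateTime midnight = ts.truncatedTo(ChronoUnit.DAYS);
        switch (format.toUpperCase(Locale.ROOT)) {
            case "YEAR":   return midnight.withDayOfYear(1);
            case "MONTH":  return midnight.withDayOfMonth(1);
            // Assumption: weeks start on Monday.
            case "WEEK":   return midnight.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            case "DAY":    return midnight;
            case "HOUR":   return ts.truncatedTo(ChronoUnit.HOURS);
            case "MINUTE": return ts.truncatedTo(ChronoUnit.MINUTES);
            case "SECOND": return ts.truncatedTo(ChronoUnit.SECONDS);
            // Assumption: an unrecognized format yields null in this sketch.
            default:       return null;
        }
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2024, 12, 6, 17, 45, 30);
        System.out.println(truncate(ts, "MONTH")); // 2024-12-01T00:00
        System.out.println(truncate(ts, "hour"));  // 2024-12-06T17:00
    }
}
```

The format is matched case-insensitively here as a convenience; whether the real functions do the same should be checked against their documentation.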

Due to changes in related code, the existing module DateTimeRebase is renamed to DateTimeUtils (without breaking changes), and it will also contain the newly implemented functions.

Signed-off-by: Nghia Truong <[email protected]>
@ttnghia ttnghia marked this pull request as draft December 7, 2024 05:59

ttnghia commented Dec 7, 2024

build

@ttnghia ttnghia marked this pull request as ready for review December 7, 2024 07:38

@revans2 revans2 left a comment

Just one minor possible improvement.

* @param format The time component to truncate to
* @return The truncated date/time
*/
public static ColumnVector truncate(ColumnView datetime, ColumnView format) {
Collaborator

Could we have a version that supports a scalar format? I think that will be the most common use case, and it would be great if we could save memory and time by not exploding the scalar into a column.

I think that can be done as a follow-on issue if you want.

Collaborator Author

I thought about that. However, either parameter can be a Scalar, so doing that would require implementing 4 JNI overloads of this function. Instead, the code accepts columns where either one can have a single row, which can achieve the same performance.

In the plugin code, I just "convert" the Scalar into a one-row column. Such a conversion should not be very expensive. But I can add more overloads to support Scalar if that would help.
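The single-row idea described above can be sketched in plain Java as follows. This is only an illustration of broadcasting a one-row input against a multi-row input, not the JNI or kernel code in this PR: `BroadcastSketch`, `truncRow`, the use of `List` in place of columns, and the two supported format strings are hypothetical.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: lists stand in for columns; none of these names come from the PR.
final class BroadcastSketch {
    // Hypothetical per-row operation supporting just two formats.
    static LocalDate truncRow(LocalDate date, String format) {
        switch (format) {
            case "YEAR":  return date.withDayOfYear(1);
            case "MONTH": return date.withDayOfMonth(1);
            default:      return null; // assumption: unknown format yields null
        }
    }

    // Either input may have exactly one row, which is then reused for every output row.
    static List<LocalDate> truncate(List<LocalDate> dates, List<String> formats) {
        int n = dates.size();
        int m = formats.size();
        if (n != m && n != 1 && m != 1) {
            throw new IllegalArgumentException("row counts must match unless one input has a single row");
        }
        int rows = (n == 1) ? m : n;
        List<LocalDate> out = new ArrayList<>(rows);
        for (int i = 0; i < rows; i++) {
            LocalDate date = dates.get(n == 1 ? 0 : i);     // broadcast a one-row date input
            String format = formats.get(m == 1 ? 0 : i);    // broadcast a one-row format input
            out.add(truncRow(date, format));
        }
        return out;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = Arrays.asList(LocalDate.of(2024, 12, 6), LocalDate.of(2024, 3, 15));
        // A one-row format plays the role of a scalar.
        System.out.println(truncate(dates, Arrays.asList("MONTH"))); // [2024-12-01, 2024-03-01]
    }
}
```

With this shape a single entry point covers all four column/scalar combinations, which is the trade-off against dedicated scalar overloads discussed in this thread.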

Collaborator

Yes, either can be a scalar, but fmt is the one that is most commonly a scalar. Most of the time you want to truncate to a very specific value so you know how to deal with the resulting timestamp/date.

@ttnghia ttnghia merged commit d99dbf8 into NVIDIA:branch-25.02 Dec 9, 2024
4 checks passed
@ttnghia ttnghia deleted the trunc branch December 9, 2024 17:34