Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Support TruncDate expression #11804

Closed
kuhushukla opened this issue Dec 2, 2024 · 6 comments · Fixed by #11833
Closed

[FEA] Support TruncDate expression #11804

kuhushukla opened this issue Dec 2, 2024 · 6 comments · Fixed by #11833
Assignees
Labels
feature request New feature or request

Comments

@kuhushukla
Copy link
Collaborator

Is your feature request related to a problem? Please describe.

   ! <TruncDate> trunc(cast(my_field#860 as date), MM) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.TruncDate

Describe the solution you'd like
Customer query shows this fallback, timezone is UTC and GMT

Describe alternatives you've considered
None
Additional context
None

@kuhushukla kuhushukla added ? - Needs Triage Need team to review and classify feature request New feature or request labels Dec 2, 2024
@revans2
Copy link
Collaborator

revans2 commented Dec 2, 2024

@kuhushukla So is it only 'MM' ('MONTH') that is needed?

https://spark.apache.org/docs/latest/api/sql/index.html#date_trunc supports

  • YEAR
  • QUARTER
  • MONTH
  • WEEK
  • DAY
  • HOUR
  • MINUTE
  • SECOND
  • MILLISECOND
  • MICROSECOND

DAY and below we should be able to support with existing CUDF operators. Beyond that it is probably best if we write a custom kernel for most of it. We could come up with a way to put together a string representation of the timestamp we need by extracting year, month, day, etc and parsing it back with CUDF, but that feels really wasteful. I would rather just do it in a kernel ourselves based on the same kind of logic.

@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Dec 3, 2024
@kuhushukla
Copy link
Collaborator Author

Currently I have only encountered month. I can update further in the next couple days, but MM would be a good start for the customer query in question.

@ttnghia
Copy link
Collaborator

ttnghia commented Dec 5, 2024

Fun facts:

  • trunc(date, fmt) is used for date type, while date_trunc(fmt, timestamp) is used for timestamp type.
  • They have opposite order of parameters.

I'm going to implement both functions, and support most formats for them except QUARTER and WEEK since these are not trivially computed.

@ttnghia
Copy link
Collaborator

ttnghia commented Dec 5, 2024

I changed my plan: will try to support all. This is a binary expression, and fmt can only be checked at run time, not planning time. As such, if we don't fully support it we won't be able to fallback but will throw exception.

@revans2
Copy link
Collaborator

revans2 commented Dec 5, 2024

@ttnghia it is perfectly acceptable for us to only support a scalar fmt, and in those cases we can check it a planning time. That is what we do for all of the regular expression commands.

In the type checks you can mark a type as TypeSig.lit

expr[RegExpReplace](
"String replace using a regular expression pattern",
ExprChecks.projectOnly(TypeSig.STRING, TypeSig.STRING,
Seq(ParamCheck("str", TypeSig.STRING, TypeSig.STRING),
ParamCheck("regex", TypeSig.lit(TypeEnum.STRING), TypeSig.STRING),
ParamCheck("rep", TypeSig.lit(TypeEnum.STRING), TypeSig.STRING),
ParamCheck("pos", TypeSig.lit(TypeEnum.INT)
.withPsNote(TypeEnum.INT, "only a value of 1 is supported"),
TypeSig.lit(TypeEnum.INT)))),
(a, conf, p, r) => new GpuRegExpReplaceMeta(a, conf, p, r)),

That makes it so we would fall back for anything that is not a literal value. Then you can use some pattern matching APIs to pull out the value.

def extractStringLit(exp: Expression): Option[String] = extractLit(exp) match {
case Some(Literal(v: UTF8String, StringType)) =>
val s = if (v == null) null else v.toString
Some(s)
case _ => None
}

@ttnghia
Copy link
Collaborator

ttnghia commented Dec 6, 2024

That is fine. I've completed the implementation to support all, as it is fairly simple.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants