HybridParquetScan: Fix velox runtime error in hybrid scan when filter timestamp #12112
Conversation
Signed-off-by: Haoyang Li <[email protected]>
I see that we are missing a lot of tests around predicate pushdown for the various data types that we say we support. Can you please file a follow-on issue to test predicate pushdown for all of the data types that we support?
byte, short, int, long, float, double, date, decimal_32, decimal_64, string, struct, array, ...
I also see that we support a large number of expressions.
spark-rapids/sql-plugin/src/main/scala/org/apache/spark/rapids/hybrid/HybridExecutionUtils.scala
Lines 138 to 295 in 9db0338
val ansiOn = Seq(
  classOf[Acos],
  classOf[Acosh],
  classOf[AddMonths],
  classOf[Alias],
  classOf[And],
  classOf[ArrayAggregate],
  classOf[ArrayContains],
  classOf[ArrayDistinct],
  classOf[ArrayExcept],
  classOf[ArrayExists],
  classOf[ArrayForAll],
  classOf[ArrayIntersect],
  classOf[ArrayJoin],
  classOf[ArrayMax],
  classOf[ArrayMin],
  classOf[ArrayPosition],
  classOf[ArrayRemove],
  classOf[ArrayRepeat],
  classOf[ArraySort],
  classOf[ArraysZip],
  classOf[Ascii],
  classOf[Asin],
  classOf[Asinh],
  classOf[Atan],
  classOf[Atan2],
  classOf[Atanh],
  classOf[AttributeReference],
  classOf[BitLength],
  classOf[BitwiseAnd],
  classOf[BitwiseOr],
  classOf[Cbrt],
  classOf[Ceil],
  classOf[Chr],
  classOf[Concat],
  classOf[Cos],
  classOf[Cosh],
  classOf[Crc32],
  classOf[CreateNamedStruct],
  classOf[DateAdd],
  classOf[DateDiff],
  classOf[DateFormatClass],
  classOf[DateFromUnixDate],
  classOf[DateSub],
  classOf[DayOfMonth],
  classOf[DayOfWeek],
  classOf[DayOfYear],
  classOf[ElementAt],
  classOf[EqualNullSafe],
  classOf[EqualTo],
  classOf[Exp],
  classOf[Expm1],
  classOf[FindInSet],
  classOf[Flatten],
  classOf[Floor],
  classOf[FromUTCTimestamp],
  classOf[FromUnixTime],
  classOf[GetJsonObject],
  classOf[GetMapValue],
  classOf[GreaterThan],
  classOf[GreaterThanOrEqual],
  classOf[Greatest],
  classOf[Hex],
  classOf[Hour],
  classOf[If],
  classOf[In],
  classOf[IsNaN],
  classOf[IsNotNull],
  classOf[IsNull],
  classOf[LastDay],
  classOf[Least],
  classOf[Left],
  classOf[Length],
  classOf[LengthOfJsonArray],
  classOf[LessThan],
  classOf[Levenshtein],
  classOf[Like],
  classOf[Literal],
  classOf[Log],
  classOf[Log10],
  classOf[Log2],
  classOf[Lower],
  classOf[MapFromArrays],
  classOf[MapKeys],
  classOf[MapValues],
  classOf[MapZipWith],
  classOf[Md5],
  classOf[MicrosToTimestamp],
  classOf[MillisToTimestamp],
  classOf[Minute],
  classOf[MonotonicallyIncreasingID],
  classOf[Month],
  classOf[NaNvl],
  classOf[NextDay],
  classOf[Not],
  classOf[Or],
  classOf[Overlay],
  classOf[Pi],
  classOf[Pow],
  classOf[Quarter],
  classOf[Rand],
  classOf[Remainder],
  classOf[Reverse],
  classOf[Rint],
  classOf[Round],
  classOf[Second],
  classOf[Sha1],
  classOf[Sha2],
  classOf[ShiftLeft],
  classOf[ShiftRight],
  classOf[Shuffle],
  classOf[Sin],
  classOf[Size],
  classOf[SortArray],
  classOf[SoundEx],
  classOf[SparkPartitionID],
  classOf[Sqrt],
  classOf[Stack],
  classOf[StringInstr],
  classOf[StringLPad],
  classOf[StringRPad],
  classOf[StringRepeat],
  classOf[StringReplace],
  classOf[StringToMap],
  classOf[StringTrim],
  classOf[StringTrimLeft],
  classOf[StringTrimRight],
  classOf[Substring],
  classOf[SubstringIndex],
  classOf[Tan],
  classOf[Tanh],
  classOf[ToDegrees],
  classOf[ToRadians],
  classOf[ToUnixTimestamp],
  classOf[UnaryPositive],
  classOf[Unhex],
  classOf[UnixMicros],
  classOf[UnixMillis],
  classOf[UnixSeconds],
  classOf[Upper],
  classOf[Uuid],
  classOf[WeekDay],
  classOf[WeekOfYear],
  classOf[WidthBucket],
  classOf[Year],
  classOf[ZipWith]
)

// Some functions are fully supported with ANSI mode off
lazy val ansiOff = Seq(
  classOf[Remainder],
  classOf[Multiply],
  classOf[Add],
  classOf[Subtract],
  classOf[Divide],
  classOf[Abs],
  classOf[Pmod]
)
But I see hardly any test coverage for these expressions as part of predicate pushdown.
I don't feel comfortable saying we support something with no tests for it. Could you file an issue to add at least some basic tests for each of these expressions too? I am okay if it is the same issue as above.
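A follow-up test matrix along the lines requested could be enumerated with something like the sketch below (plain Python, purely illustrative: the type list comes from the comment above, but the predicate templates and helper name are hypothetical, not the repo's actual test fixtures):

```python
from itertools import product

# Data types claimed as supported for predicate pushdown
# (list taken from the review comment above).
DATA_TYPES = [
    "byte", "short", "int", "long", "float", "double",
    "date", "decimal_32", "decimal_64", "string", "struct", "array",
]

# Hypothetical predicate templates; {col} is the filtered column.
PREDICATE_TEMPLATES = [
    "{col} IS NOT NULL",
    "{col} = {col}",
]

def pushdown_test_cases(types=DATA_TYPES, templates=PREDICATE_TEMPLATES):
    """Yield (data_type, predicate) pairs so every supported type
    gets at least one pushdown test case."""
    for dtype, template in product(types, templates):
        yield dtype, template.format(col="a")

cases = list(pushdown_test_cases())
# 12 types x 2 templates = 24 cases to parametrize a pytest over
```

Each pair would then back one parametrized integration test that compares hybrid-scan results against the CPU baseline for that type and predicate.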
@@ -227,3 +227,19 @@ def udf_fallback(s):
    assert_gpu_and_cpu_are_equal_collect(
        lambda spark: spark.read.parquet(data_path).filter("ascii(a) >= 50 and udf_fallback(a) = 'udf_100'"),
        conf=filter_split_conf)

@allow_non_gpu(*non_utc_allow)
def test_hybrid_parquet_filter_pushdown_timestamp(spark_tmp_path):
Does this need to be marked as a hybrid test?
I think so.
nice catch, thanks
Signed-off-by: Haoyang Li <[email protected]>
build
Follow-up issue filed #12124
build
rekick, part of the CI hung due to a HW issue
…n filter timestamp (NVIDIA#12112)" This reverts commit c696b7e.
Fix #12111

Timestamp is not fully supported in the Hybrid Filter, so conditions on timestamp columns should not be pushed down to the Hybrid Scan by the filter splitter.
This PR fixes that.
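As a rough standalone model of what the fix does (a hypothetical Python sketch, not the actual Scala filter splitter in HybridExecutionUtils): the splitter partitions filter conditions by the data types they reference, and any condition touching a timestamp column is kept out of the set handed to the hybrid (Velox) scan.

```python
# Hypothetical model: predicates referencing unsupported types
# (here, timestamp) must not be pushed to the hybrid scan.
UNSUPPORTED_PUSHDOWN_TYPES = {"timestamp"}

def split_filters(predicates):
    """Split (expr, referenced_types) pairs into predicates safe to
    push to the hybrid scan and predicates evaluated afterwards."""
    pushed, kept = [], []
    for expr, types in predicates:
        if types & UNSUPPORTED_PUSHDOWN_TYPES:
            kept.append(expr)   # e.g. filters on timestamp columns
        else:
            pushed.append(expr)
    return pushed, kept

pushed, kept = split_filters([
    ("a > 5",             {"int"}),
    ("ts < '2020-01-01'", {"timestamp"}),
    ("b IS NOT NULL",     {"string"}),
])
```

Before the fix, the timestamp condition would reach the Velox-backed scan and trigger the runtime error; after it, only the int and string predicates are pushed.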