-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use parse_url kernel for HOST parsing #9845
Conversation
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
…park-rapids into thirtiseven-parse_url_protocol
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/urlFunctions.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/urlFunctions.scala
Outdated
Show resolved
Hide resolved
parts match { | ||
case partScalar: GpuScalar => | ||
GpuColumnVector.from(doColumnar(urls, partScalar), dataType) | ||
case _ => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious in which case we will hit this exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
columnarEvalAny
will return Any
, so it is possible that there is something other than GpuScalar
in the batch.
An example from GpuExpressions.scala:
spark-rapids/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuExpressions.scala
Lines 309 to 328 in c65c5b8
override def columnarEval(batch: ColumnarBatch): GpuColumnVector = { | |
withResourceIfAllowed(left.columnarEvalAny(batch)) { lhs => | |
withResourceIfAllowed(right.columnarEvalAny(batch)) { rhs => | |
(lhs, rhs) match { | |
case (l: GpuColumnVector, r: GpuColumnVector) => | |
GpuColumnVector.from(doColumnar(l, r), dataType) | |
case (l: GpuScalar, r: GpuColumnVector) => | |
GpuColumnVector.from(doColumnar(l, r), dataType) | |
case (l: GpuColumnVector, r: GpuScalar) => | |
GpuColumnVector.from(doColumnar(l, r), dataType) | |
case (l: GpuScalar, r: GpuScalar) => | |
GpuColumnVector.from(doColumnar(batch.numRows(), l, r), dataType) | |
case (l, r) => | |
throw new UnsupportedOperationException(s"Unsupported data '($l: " + | |
s"${l.getClass}, $r: ${r.getClass})' for GPU binary expression.") | |
} | |
} | |
} | |
} | |
} |
In the real world, I think this case won't be hit if we do the right thing in GpuOverrides
, the exception here is more for defensive purpose.
GpuColumnVector.from(doColumnar(urlCv, partScalar, keyScalar), dataType) | ||
case _ => | ||
throw new | ||
UnsupportedOperationException(s"Cannot columnar evaluate expression: $this") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto.
"http://www.example.com/wpstyle/?p=364", | ||
"https://www.example.com/foo/?bar=baz&inga=42&quux", | ||
"http://✪df.ws/123", | ||
// "http://userid:[email protected]:8080", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove these if we don't want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are failing tests now, will uncomment them after the kernel fixes these cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be working now.
@@ -0,0 +1,198 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems duplicated to url_test.py
. We can just keep url_test.py
which can be run in parallel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is just make it easy to test for now. Will remove it before it's ready for review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
Signed-off-by: Haoyang Li <[email protected]>
build |
Signed-off-by: Haoyang Li <[email protected]>
build |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/GpuParseUrl.scala
Outdated
Show resolved
Hide resolved
// return a null columnvector | ||
return ColumnVector.fromStrings(null, null) | ||
} | ||
throw new UnsupportedOperationException(s"$this only supports partToExtract = PROTOCOL") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The message seems to be vague here. We say we are supporting PROTOCOL and HOST so here needs more explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, done.
} | ||
} else { | ||
// 3-arg, i.e. QUERY with key | ||
assert(children.size == 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about checking and throwing IllegalArgumentException
on object constructor (object GpuParseUrl#apply()
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the assert here is unlikely to fail because there is an argument checking for it in GpuOverrides
.
Signed-off-by: Haoyang Li <[email protected]>
dependent PROTOCOL PR is merged, so this one is ready to merge too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
build |
Contributes to #8963
This PR add plugin support for HOST parsing in parse_url.
Depends on JNI PR: NVIDIA/spark-rapids-jni#1569
Depends on: #9481