Adding kernel to add parse_uri support for protocol extraction #1502

hyperbolic2346 · 2023-10-13T18:55:40Z

This adds support for parsing protocols from a URI on the GPU. The kernel is modeled after url_decode, which dedicates a warp to each string. This is a better approach than a thread per byte because it removes all the lower_bound calls that hit global memory to figure out which row a byte belongs to. The kernel copies each string from global memory into shared memory a block at a time for processing.
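To illustrate the row-lookup cost described above, here is a minimal CPU-side C++ sketch, not the kernel in this PR. It assumes a cudf-style offsets array with one more entry than there are rows. The names `row_of_byte` and `first_colon_warp_style` are hypothetical, and `std::upper_bound` stands in for the lower_bound search mentioned above. The second function mimics a warp walking one string in 32-byte steps, so the row is known up front and no search is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Thread-per-byte style: each byte index has to search the offsets array to
// learn which row it belongs to. Row r owns bytes [offsets[r], offsets[r + 1]).
int32_t row_of_byte(const std::vector<int32_t>& offsets, int32_t byte_idx) {
  auto it = std::upper_bound(offsets.begin(), offsets.end(), byte_idx);
  return static_cast<int32_t>(it - offsets.begin()) - 1;
}

// Warp-per-string style (simulated serially): the row is fixed, and 32 "lanes"
// step through its bytes one chunk at a time. Returns the index of the first
// ':' within the row, or -1 if there is none.
int32_t first_colon_warp_style(const std::string& chars,
                               const std::vector<int32_t>& offsets,
                               int32_t row) {
  constexpr int32_t warp_size = 32;
  const int32_t begin = offsets[row];
  const int32_t end = offsets[row + 1];
  for (int32_t chunk = begin; chunk < end; chunk += warp_size) {
    for (int32_t lane = 0; lane < warp_size && chunk + lane < end; ++lane) {
      if (chars[chunk + lane] == ':') return chunk + lane - begin;
    }
  }
  return -1;
}
```

In the thread-per-byte version every byte pays for a search over the offsets, while the warp-per-string version reads two offsets per row and then only touches character data.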

There is a potential optimization here: the first kernel could build a starting offset for each string. With that and the number of bytes, the second kernel could turn into a series of cudaMemcpyAsync calls. The protocol kernel always starts at byte 0, so the starting offset is already known. I don't know how this will fare, as support for other extractions like HOST and PATH may add further complications. I am not yet sure whether this will be a monolithic kernel or whether a kernel per parse type will be best.

This will evolve over time, but I don't know in which direction yet, so I have not done this optimization.
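For reference, the two-pass idea can be sketched on the CPU as follows. This is only an illustration under stated assumptions, not the code in this PR: `protocol_length`, `ProtocolColumn`, and `extract_protocols` are hypothetical names, the validation rule is the RFC 3986 scheme grammar (a letter followed by letters, digits, `+`, `-`, or `.`, terminated by `:`), and Spark's own rules may accept or reject different inputs. `std::memcpy` stands in for the cudaMemcpyAsync calls.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Pass 1: number of bytes in the protocol, or -1 if the row has none.
// Assumed rule: RFC 3986 scheme grammar followed by ':'.
int protocol_length(const std::string& uri) {
  if (uri.empty() || !std::isalpha(static_cast<unsigned char>(uri[0]))) return -1;
  for (std::size_t i = 1; i < uri.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(uri[i]);
    if (c == ':') return static_cast<int>(i);
    if (!(std::isalnum(c) || c == '+' || c == '-' || c == '.')) return -1;
  }
  return -1;
}

// Hypothetical output column: one string and one validity flag per row.
struct ProtocolColumn {
  std::vector<std::string> data;
  std::vector<bool> valid;
};

// Pass 2: the start offset is always 0 for the protocol, so once the length
// is known the extraction is a plain byte copy.
ProtocolColumn extract_protocols(const std::vector<std::string>& uris) {
  ProtocolColumn out;
  for (const std::string& uri : uris) {
    const int length = protocol_length(uri);
    std::string protocol;
    if (length > 0) {
      protocol.resize(static_cast<std::size_t>(length));
      std::memcpy(&protocol[0], uri.data(), static_cast<std::size_t>(length));
    }
    out.data.push_back(protocol);
    out.valid.push_back(length > 0);
  }
  return out;
}
```

For HOST or PATH the first pass would also have to produce a non-zero start offset, which is where the extra complications mentioned above would come in.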

closes #1501

Signed-off-by: Mike Wilson <[email protected]>

hyperbolic2346 · 2023-10-13T19:06:09Z

build

src/main/cpp/benchmarks/CMakeLists.txt

jlowe

Are the JNI bindings for this new kernel being postponed to a followup PR?

src/main/cpp/benchmarks/CMakeLists.txt

src/main/cpp/src/parse_uri.cu

hyperbolic2346 · 2023-10-16T18:48:22Z

build

Signed-off-by: Mike Wilson <[email protected]>

hyperbolic2346 · 2023-10-16T19:14:40Z

> Are the JNI bindings for this new kernel being postponed to a followup PR?

JNI bindings have been added.

hyperbolic2346 · 2023-10-16T19:14:45Z

build

Signed-off-by: Mike Wilson <[email protected]>

hyperbolic2346 · 2023-10-16T19:32:49Z

build

jlowe

Normally when adding the JNI we also add the Java bindings to leverage that JNI so we can test it via a Java unit test.

src/main/cpp/src/parse_uri.hpp

Signed-off-by: Mike Wilson <[email protected]>

hyperbolic2346 · 2023-10-17T14:21:25Z

build

src/main/java/com/nvidia/spark/rapids/jni/ParseURI.java

src/test/java/com/nvidia/spark/rapids/jni/ParseURITest.java

Co-authored-by: Jason Lowe <[email protected]>

Signed-off-by: Mike Wilson <[email protected]>

hyperbolic2346 · 2023-10-17T18:09:12Z

build

Ran across this issue during the review of NVIDIA/spark-rapids-jni#1502 and since the code was modeled after this code, I am pushing the fix here as well.

Authors:
- Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)

URL: #14290
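The thread does not spell out the exact change, so the following is only a sketch of one common form of a thread index overflow, written as plain C++ so it can run on the CPU. It assumes the usual `blockIdx.x * blockDim.x + threadIdx.x` computation; the function names are hypothetical. With a warp per string, the thread count is 32 times the row count, so a 32-bit index can wrap well before the row count itself does.

```cpp
#include <cassert>
#include <cstdint>

// Narrow version: the multiply happens in 32 bits and wraps modulo 2^32.
uint32_t global_thread_id_narrow(uint32_t block_idx, uint32_t block_dim,
                                 uint32_t thread_idx) {
  return block_idx * block_dim + thread_idx;
}

// Wide version: widen one operand first so the multiply and the add are both
// carried out in 64 bits.
int64_t global_thread_id_wide(uint32_t block_idx, uint32_t block_dim,
                              uint32_t thread_idx) {
  return static_cast<int64_t>(block_idx) * block_dim + thread_idx;
}
```

For example, with 2^24 blocks of 256 threads the narrow version wraps back to a small index, while the wide version keeps counting past 2^32.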

hyperbolic2346 added 2 commits October 13, 2023 18:37

Adding parse protocol kernel

e7ea344

moving negative tests that spark accepts up to simple test

04f94c4

Signed-off-by: Mike Wilson <[email protected]>

hyperbolic2346 force-pushed the mwilson/parse-url-protocol branch from 19a69ff to 04f94c4 October 13, 2023 18:56

hyperbolic2346 requested review from jlowe, revans2 and nvdbaranec October 13, 2023 19:12

hyperbolic2346 self-assigned this Oct 13, 2023

hyperbolic2346 added the enhancement and feature request labels Oct 13, 2023

jlowe reviewed Oct 13, 2023

ttnghia self-requested a review October 13, 2023 23:20

fixing potential overflow issue and fixing library

0e24910

hyperbolic2346 mentioned this pull request Oct 16, 2023

fixing thread index overflow issue rapidsai/cudf#14290

Merged

3 tasks

Adding JNI

4ad2034

Signed-off-by: Mike Wilson <[email protected]>

Fixing copyright year

db8fdb1

Signed-off-by: Mike Wilson <[email protected]>

jlowe reviewed Oct 16, 2023

ttnghia reviewed Oct 17, 2023

ttnghia reviewed Oct 17, 2023

hyperbolic2346 added 4 commits October 17, 2023 14:13

Fix non-empty null issue

524beb2

Signed-off-by: Mike Wilson <[email protected]>

Make error URIs more festive

6edf2ed

Signed-off-by: Mike Wilson <[email protected]>

Adding java-side test and binding

f33919a

Signed-off-by: Mike Wilson <[email protected]>

updates from review comments

463317e

Signed-off-by: Mike Wilson <[email protected]>

jlowe reviewed Oct 17, 2023

hyperbolic2346 and others added 2 commits October 17, 2023 12:53

Update src/main/java/com/nvidia/spark/rapids/jni/ParseURI.java

3a9dce3

Co-authored-by: Jason Lowe <[email protected]>

Update src/test/java/com/nvidia/spark/rapids/jni/ParseURITest.java

e40572d

Co-authored-by: Jason Lowe <[email protected]>

ttnghia previously approved these changes Oct 17, 2023

adding empty string and null test

4d11f07

Signed-off-by: Mike Wilson <[email protected]>

hyperbolic2346 dismissed ttnghia’s stale review via 4d11f07 October 17, 2023 18:08

jlowe approved these changes Oct 17, 2023

hyperbolic2346 merged commit b317bf3 into NVIDIA:branch-23.12 Oct 17, 2023
1 check passed

hyperbolic2346 deleted the mwilson/parse-url-protocol branch October 17, 2023 21:34

thirtiseven mentioned this pull request Oct 19, 2023

Use parse_url kernel for PROTOCOL parsing NVIDIA/spark-rapids#9481

Merged
