Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding kernel to add parse_uri support for protocol extraction #1502

Merged

Conversation

hyperbolic2346
Copy link
Collaborator

This adds support for parsing protocols from a uri on the GPU. This change models the kernel after url_decode, which dedicates a warp to a string. This is a better approach than a thread per byte due to removing all the lower_bound calls that hit global memory to figure out which row a byte belongs. This kernel copies the string from global memory into shared memory a block at a time for processing.

There is a potential optimization here by having the first kernel build a starting offset for each string. With that and the number of bytes the second kernel could turn into a series of cudaMemcpyAsync calls. This protocol kernel even starts at byte 0, so the starting offset is known. I don't know how this will fare as support for other extractions like HOST and PATH may add further complications. I am not sure if this will be a monolithic kernel or if it will be best to have a kernel per parse type yet.

This will evolve over time, but I don't know which direction yet so I have not done this optimization.

closes #1501

@hyperbolic2346 hyperbolic2346 force-pushed the mwilson/parse-url-protocol branch from 19a69ff to 04f94c4 Compare October 13, 2023 18:56
@hyperbolic2346
Copy link
Collaborator Author

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the JNI bindings for this new kernel being postponed to a followup PR?

src/main/cpp/benchmarks/CMakeLists.txt Outdated Show resolved Hide resolved
src/main/cpp/src/parse_uri.cu Outdated Show resolved Hide resolved
@ttnghia ttnghia self-requested a review October 13, 2023 23:20
@hyperbolic2346
Copy link
Collaborator Author

build

Signed-off-by: Mike Wilson <[email protected]>
@hyperbolic2346
Copy link
Collaborator Author

Are the JNI bindings for this new kernel being postponed to a followup PR?

JNI bindings have been added.

@hyperbolic2346
Copy link
Collaborator Author

build

Signed-off-by: Mike Wilson <[email protected]>
@hyperbolic2346
Copy link
Collaborator Author

build

Copy link
Member

@jlowe jlowe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally when adding the JNI we also add the Java bindings to leverage that JNI so we can test it via a Java unit test.

@hyperbolic2346
Copy link
Collaborator Author

build

ttnghia
ttnghia previously approved these changes Oct 17, 2023
@hyperbolic2346
Copy link
Collaborator Author

build

@hyperbolic2346 hyperbolic2346 merged commit b317bf3 into NVIDIA:branch-23.12 Oct 17, 2023
1 check passed
@hyperbolic2346 hyperbolic2346 deleted the mwilson/parse-url-protocol branch October 17, 2023 21:34
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Oct 24, 2023
Ran across this issue during the review of NVIDIA/spark-rapids-jni#1502 and since the code was modeled after this code, I am pushing the fix here as well.

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - David Wendt (https://github.com/davidwendt)
  - Nghia Truong (https://github.com/ttnghia)

URL: #14290
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement PROTOCOL url parsing as part of parse_url support
3 participants