Adding kernel to add parse_uri support for protocol extraction #1502
Signed-off-by: Mike Wilson <[email protected]>
19a69ff to 04f94c4
build
Are the JNI bindings for this new kernel being postponed to a followup PR?
build
Signed-off-by: Mike Wilson <[email protected]>
JNI bindings have been added.
build
Signed-off-by: Mike Wilson <[email protected]>
build
Normally, when adding the JNI we also add the Java bindings that use it, so the new code can be tested via a Java unit test.
Signed-off-by: Mike Wilson <[email protected]>
Signed-off-by: Mike Wilson <[email protected]>
Signed-off-by: Mike Wilson <[email protected]>
Signed-off-by: Mike Wilson <[email protected]>
build
Co-authored-by: Jason Lowe <[email protected]>
Co-authored-by: Jason Lowe <[email protected]>
Signed-off-by: Mike Wilson <[email protected]>
build
Ran across this issue during the review of NVIDIA/spark-rapids-jni#1502 and since the code was modeled after this code, I am pushing the fix here as well.
Authors:
- Mike Wilson (https://github.com/hyperbolic2346)
Approvers:
- David Wendt (https://github.com/davidwendt)
- Nghia Truong (https://github.com/ttnghia)
URL: #14290
This adds support for parsing protocols from a URI on the GPU. This change models the kernel after `url_decode`, which dedicates a warp to a string. This is a better approach than a thread per byte because it removes all the `lower_bound` calls that hit global memory to figure out which row a byte belongs to. This kernel copies the string from global memory into shared memory a block at a time for processing.

There is a potential optimization here: have the first kernel build a starting offset for each string. With that and the number of bytes, the second kernel could turn into a series of `cudaMemcpyAsync` calls. This protocol kernel even starts at byte 0, so the starting offset is known. I don't know how this will fare, as support for other extractions like HOST and PATH may add further complications. I am not sure yet whether this will be a monolithic kernel or whether it will be best to have a kernel per parse type.

This will evolve over time, but I don't know which direction yet, so I have not done this optimization.
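To make the thread-per-byte versus per-string trade-off concrete, here is a minimal host-side C++ sketch. It is not the kernel from this PR. The names `row_of_byte`, `extract_protocol`, and `protocols` are hypothetical, the row lookup uses `std::upper_bound` as a stand-in for the `lower_bound` call mentioned above, and the validation rule (a letter followed by letters, digits, `+`, `-`, or `.`, ending at the first `:`) is an assumption that may not match the rules the real kernel applies.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Thread-per-byte style: each byte must first discover which row owns it.
// offsets has num_rows + 1 entries; row r owns bytes [offsets[r], offsets[r + 1]).
// On the GPU this binary search would read the offsets from global memory
// once per byte, which is the cost the per-string design avoids.
std::size_t row_of_byte(const std::vector<std::size_t>& offsets, std::size_t byte_idx)
{
  auto it = std::upper_bound(offsets.begin(), offsets.end(), byte_idx);
  return static_cast<std::size_t>(it - offsets.begin()) - 1;
}

// Per-string style: one worker (a warp in the PR, a plain loop here) owns the
// whole row, so it scans from byte 0 with no row lookup at all.
// Assumed rule: protocol = non-empty text before the first ':', starting with
// a letter and containing only letters, digits, '+', '-', or '.'.
std::string_view extract_protocol(std::string_view uri)
{
  for (std::size_t i = 0; i < uri.size(); ++i) {
    const char c = uri[i];
    if (c == ':') { return i > 0 ? uri.substr(0, i) : std::string_view{}; }
    const bool alpha = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    const bool other = (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
    if (!(alpha || (i > 0 && other))) { return {}; }  // invalid character
  }
  return {};  // no ':' found, so there is no protocol
}

// Walk a strings column stored as one character buffer plus offsets and
// extract the protocol of every row. Invalid rows yield an empty string here;
// a real column would more likely mark them null.
std::vector<std::string> protocols(const std::string& chars,
                                   const std::vector<std::size_t>& offsets)
{
  std::vector<std::string> out;
  for (std::size_t r = 0; r + 1 < offsets.size(); ++r) {
    const std::string_view row(chars.data() + offsets[r], offsets[r + 1] - offsets[r]);
    out.emplace_back(extract_protocol(row));
  }
  return out;
}
```

Because the protocol always begins at byte 0 of its row, the output for each row is fully described by the row's starting offset and a byte count, which is what would let a follow-up pass become plain copies as suggested above.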
closes #1501