Run get_json_object_multiple_paths with thread-parallel kernel based on number of paths #2256
Signed-off-by: Nghia Truong <[email protected]>
build
I tested the threshold for switching between the thread-parallel and warp-parallel kernels with a small (fingerprint) dataset:
According to the tests, the warp-parallel kernel is faster when the number of paths is less than |
This reverts commit 2432f33.
Signed-off-by: Nghia Truong <[email protected]>
c46e33d to a580453
The numbers look better on all of the GPUs I have tested. I want to do a bit more testing on some other datasets.
Also, could we try a hybrid approach where the threads in a warp are dedicated to a single row, for up to 32 paths? With 2 paths, 2 of the threads would be active; with 32 paths, all of them would be active. The main reason is that I see a huge performance drop-off between 7 and 8 paths, and when the number of paths is not a power of 2 I get somewhat sporadic performance results.
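To make the hybrid idea concrete, here is a rough host-side sketch of the lane-to-work mapping it implies: one warp per row, with lane i handling path i. The names `lane_work` and `assign_lane` are hypothetical and not taken from the PR, and a real kernel would compute the same indices from its block and thread indices instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; these names do not appear in the PR.
constexpr int warp_size = 32;

struct lane_work {
  std::int64_t row;  // row that this lane's warp is dedicated to
  int path;          // index of the path this lane would evaluate
  bool active;       // false when the lane has no path to process
};

// Maps a global thread index to its (row, path) assignment when each warp
// handles exactly one row and lane i handles path i.
// Assumes 1 <= num_paths <= warp_size.
inline lane_work assign_lane(std::int64_t global_tid, int num_paths)
{
  assert(num_paths >= 1 && num_paths <= warp_size);
  auto const lane = static_cast<int>(global_tid % warp_size);
  return lane_work{global_tid / warp_size, lane, lane < num_paths};
}
```

With 2 paths, only lanes 0 and 1 of each warp are active; with 32 paths, every lane is active.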
Closing this as it is no longer needed. It is replaced by #2258.
Currently, either the thread-parallel or the warp-parallel kernel is executed based on the input row size. This PR changes the condition so that the kernel to launch is selected based on the number of JSON paths instead, which can improve performance when there are multiple paths.
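A minimal sketch of what such a selection could look like on the host side, assuming a single cutoff on the path count. `kernel_kind`, `select_kernel`, and `path_threshold` are hypothetical names, and the threshold is left as a parameter because the measured value is not reproduced here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; the actual dispatch code in the PR may differ.
enum class kernel_kind { warp_parallel, thread_parallel };

// Picks a kernel from the number of JSON paths alone, ignoring row size.
// `path_threshold` is an assumed tuning parameter: below it the warp-parallel
// kernel is chosen, otherwise the thread-parallel kernel.
inline kernel_kind select_kernel(std::size_t num_paths, std::size_t path_threshold)
{
  assert(path_threshold > 0);
  return num_paths < path_threshold ? kernel_kind::warp_parallel
                                    : kernel_kind::thread_parallel;
}
```

The direction of the comparison follows the test observation in this thread that the warp-parallel kernel is faster for small path counts.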