
Run get_json_object_multiple_paths with thread-parallel kernel based on number of paths #2256

Closed

Conversation

ttnghia
Collaborator

@ttnghia ttnghia commented Jul 24, 2024

Currently, the choice between the thread-parallel and warp-parallel kernels is made based on the input row size. This PR changes the condition to select which kernel to launch based on the number of JSON paths instead, which can yield better performance when multiple paths are queried.

@ttnghia ttnghia requested a review from revans2 July 24, 2024 23:37
@ttnghia ttnghia self-assigned this Jul 24, 2024
@ttnghia
Collaborator Author

ttnghia commented Jul 25, 2024

build

@ttnghia
Collaborator Author

ttnghia commented Jul 25, 2024

Tested the threshold for switching between the thread-parallel and warp-parallel kernels, using a small (fingerprint) dataset:

| max path size | thread-parallel | warp-parallel |
|---------------|-----------------|---------------|
| 2             | 40 s            | 19 s          |
| 4             | 17 s            | 13 s          |
| 6             | 15 s            | 13 s          |
| 8             | 11.8 s          | 12 s          |
| 10            | 9.3 s           | 11.5 s        |
| 16            | 7.6 s           | 10.6 s        |
| 32            | 5.9 s           | 10.2 s        |

As the tests show, the warp-parallel kernel is faster when the number of paths is less than 8, so we switch to the thread-parallel kernel at 8 or more paths.
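Based on these measurements, the dispatch condition can be sketched as follows. This is a minimal host-side illustration only; the constant and function names are assumptions for clarity, not the actual spark-rapids-jni identifiers:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative threshold derived from the benchmark above: the
// thread-parallel kernel wins at 8 or more JSON paths.
constexpr std::size_t NUM_PATHS_THRESHOLD = 8;

// Hypothetical dispatch predicate. Before this change, the choice
// depended on the input row size; after it, only on the path count.
inline bool use_thread_parallel_kernel(std::size_t num_paths)
{
  return num_paths >= NUM_PATHS_THRESHOLD;
}
```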

ttnghia added 4 commits July 24, 2024 22:46
Signed-off-by: Nghia Truong <[email protected]>
Signed-off-by: Nghia Truong <[email protected]>
This reverts commit 2432f33.
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia ttnghia force-pushed the change_thread_parallel_condition branch from c46e33d to a580453 Compare July 25, 2024 05:56
@ttnghia
Collaborator Author

ttnghia commented Jul 25, 2024

build

Collaborator

@revans2 revans2 left a comment


The numbers look better for all of the GPUs I have tested it on. I want to do a bit more testing on some other datasets.

Also, could we try a hybrid approach where the threads in a warp are dedicated to a single row, for up to 32 paths? So if there are 2 paths, 2 of the threads would be active; if there were 32 paths, all of them would be active. The main reason for this is that I see a huge performance drop-off between 7 and 8 paths, and I also get sporadic performance results when the number of paths is not a power of 2.

[Figure: parallelism scaling]
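The hybrid mapping suggested above could be sketched as follows. This is a host-side illustration of the lane-to-path assignment only, not code from this PR; the names and the per-lane mapping are assumptions:

```cpp
#include <algorithm>
#include <cassert>

// In the hybrid scheme, each warp would own one row, and lane i of the
// warp would process path i of that row. Lanes beyond the path count
// (or beyond the warp width) stay idle.
constexpr int WARP_SIZE = 32;

// Hypothetical predicate: is this lane active for the given path count?
inline bool lane_is_active(int lane_id, int num_paths)
{
  return lane_id < std::min(num_paths, WARP_SIZE);
}
```

With 2 paths only lanes 0 and 1 do work, while 32 paths keep the whole warp busy, which would smooth out the drop-off seen between 7 and 8 paths.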

@ttnghia
Collaborator Author

ttnghia commented Jul 25, 2024

Closing this as it is no longer needed; it is replaced by #2258.

@ttnghia ttnghia closed this Jul 25, 2024
@ttnghia ttnghia deleted the change_thread_parallel_condition branch July 25, 2024 19:49