Organize parquet reader mukernel non-nullable code, introduce manual block scans #16830

Merged

Conversation

pmattione-nvidia
Contributor

@pmattione-nvidia pmattione-nvidia commented Sep 18, 2024

This is a collection of a few small optimizations and tweaks for the parquet reader fixed-width mukernels (flat & nested; lists are not implemented yet). The benchmark changes are negligible; this is mainly cleanup and preparation for the upcoming list mukernel.

  1. If not reading the whole page (chunked reads), exit sooner.

  2. By having each thread keep track of the current valid_count (and not saving-to or reading-from the nesting_info until the end), we don't need to synchronize the block threads as frequently, so these extra syncs are removed.

  3. For (non-list) nested columns that aren't nullable, we don't need to loop over the whole nesting depth; only the last level of nesting is used. After removing this loop, the non-nullable code for nested and flat hierarchies is identical, so they're extracted and consolidated into a new function.

  4. When doing block scans in the parquet reader, we also need to know the per-warp results of the scan. Because cub doesn't return those, we currently do an additional warp-wide ballot that would otherwise be unnecessary. This PR introduces code that does a block scan manually, saving the intermediate results. However, using this code in the flat & nested kernels costs 8 more registers, so it isn't used yet.

  5. By doing an exclusive-scan instead of an inclusive-scan, we don't need the extra "- 1's" that were everywhere.
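
Items 4 and 5 can be illustrated with a small host-side C++ model. This is a sketch under stated assumptions, not the code in this PR: the names `block_scan_result`, `popcount32`, and `manual_block_exclusive_scan` are hypothetical, and the plain loops stand in for what a kernel would presumably do with a warp ballot and a small scan over per-warp totals. It shows a two-level exclusive scan over validity flags that keeps the per-warp masks and offsets instead of discarding them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical names; a host-side model, not the libcudf kernel code.
constexpr int warp_size = 32;

struct block_scan_result {
  std::vector<std::uint32_t> warp_bits;  // per-warp validity mask (what a ballot would yield)
  std::vector<int> warp_offset;          // valid count before each warp (exclusive)
  std::vector<int> thread_offset;        // valid count before each thread (exclusive)
  int block_total = 0;                   // valid count for the whole block
};

// Number of set bits in a 32-bit mask.
inline int popcount32(std::uint32_t mask)
{
  int count = 0;
  while (mask != 0) {
    mask &= mask - 1;  // clear the lowest set bit
    ++count;
  }
  return count;
}

// Two-level exclusive scan over one flag per thread.
// Assumes the thread count is a multiple of warp_size.
block_scan_result manual_block_exclusive_scan(std::vector<bool> const& is_valid)
{
  assert(is_valid.size() % warp_size == 0);
  int const num_threads = static_cast<int>(is_valid.size());
  int const num_warps   = num_threads / warp_size;

  block_scan_result result;
  result.warp_bits.resize(num_warps);
  result.warp_offset.resize(num_warps);
  result.thread_offset.resize(num_threads);

  // Step 1: each warp gathers its validity bits into one mask.
  for (int t = 0; t < num_threads; ++t) {
    if (is_valid[t]) { result.warp_bits[t / warp_size] |= std::uint32_t{1} << (t % warp_size); }
  }

  // Step 2: exclusive scan over the per-warp totals; the offsets are kept.
  for (int w = 0; w < num_warps; ++w) {
    result.warp_offset[w] = result.block_total;
    result.block_total += popcount32(result.warp_bits[w]);
  }

  // Step 3: each thread adds the number of valid lanes below it in its own warp.
  for (int t = 0; t < num_threads; ++t) {
    std::uint32_t const lanes_below = (std::uint32_t{1} << (t % warp_size)) - 1;
    int const warp                  = t / warp_size;
    result.thread_offset[t] =
      result.warp_offset[warp] + popcount32(result.warp_bits[warp] & lanes_below);
  }
  return result;
}
```

Because the scan is exclusive, a valid thread `t` can use `thread_offset[t]` directly as its output index; with an inclusive scan the same index would be the scan value minus one. The per-warp masks and offsets come out of the same pass, so no second ballot is needed to recover them.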

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@pmattione-nvidia pmattione-nvidia added the libcudf (Affects libcudf (C++/CUDA) code), Performance (Performance related issue), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Sep 18, 2024
@pmattione-nvidia pmattione-nvidia self-assigned this Sep 18, 2024
@pmattione-nvidia pmattione-nvidia changed the base branch from branch-24.10 to branch-24.12 September 24, 2024 15:37
@pmattione-nvidia pmattione-nvidia marked this pull request as ready for review September 24, 2024 18:44
@pmattione-nvidia pmattione-nvidia requested a review from a team as a code owner September 24, 2024 18:44
Contributor

@nvdbaranec nvdbaranec left a comment

First pass. Will absorb and come back for another.

7 resolved review comments on cpp/src/io/parquet/decode_fixed.cu (4 outdated)
@vuule vuule requested a review from nvdbaranec October 1, 2024 18:16
Contributor

@vuule vuule left a comment

Nice set of changes!

1 resolved review comment on cpp/src/io/parquet/decode_fixed.cu (outdated)
@pmattione-nvidia pmattione-nvidia changed the title from "Optimize parquet reader mukernel block scans, non-nullable code" to "Organize parquet reader mukernel non-nullable code, introduce manual block scans" Oct 4, 2024
@vuule vuule added the 5 - Ready to Merge Testing and reviews complete, ready to merge label Oct 10, 2024
@pmattione-nvidia
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 891e5aa into rapidsai:branch-24.12 Oct 11, 2024
100 checks passed
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
Projects
Status: In progress
3 participants