Add `HiveHash` support for nested types #2534
Instead of storing a column in each stack element, you can first flatten the input nested column into a table (an array of columns), then store only the index of the column in that array here. See `cudf::experimental::row::lexicographic::preprocessed_table` for an example of such a flattened table.

Some preprocessing is needed so that we can retrieve the indices of the child columns in the flattened table. My initial idea is to flatten the columns level by level, then maintain a `first_child_index` array along with a `num_children` array. For example, take the input `STRUCT<INT, STRUCT<INT, STRING>>`.

Then you can iterate through the columns, starting with `col_index` at `0`.

By using an `int` to store the column index instead of a `column_device_view`, we can reduce memory usage significantly, which lets us increase the stack size a lot.
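A minimal host-side sketch of the level-by-level flattening idea for `STRUCT<INT, STRUCT<INT, STRING>>`, under stated assumptions: `Node`, `Flattened`, `flatten_level_by_level`, and `visit_order` are hypothetical names for illustration and are not cudf APIs, and using `-1` as the "no children" marker in `first_child_index` is an assumption. It only shows how the two index arrays could be built and how a traversal could keep plain `int` indices on the stack.

```cpp
#include <queue>
#include <string>
#include <vector>

// Hypothetical host-side stand-in for a nested column; not a cudf type.
struct Node {
  std::string type;
  std::vector<Node> children;
};

// Flattened layout: columns in level-by-level order plus two index arrays.
struct Flattened {
  std::vector<std::string> types;
  std::vector<int> first_child_index;  // assumption: -1 means "no children"
  std::vector<int> num_children;
};

// Breadth-first flattening: a column's children land in consecutive slots.
Flattened flatten_level_by_level(const Node& root) {
  Flattened out;
  std::queue<const Node*> pending;
  pending.push(&root);
  int next_free = 1;  // index that the next enqueued child will receive
  while (!pending.empty()) {
    const Node* n = pending.front();
    pending.pop();
    int const count = static_cast<int>(n->children.size());
    out.types.push_back(n->type);
    out.num_children.push_back(count);
    out.first_child_index.push_back(count > 0 ? next_free : -1);
    for (const Node& c : n->children) pending.push(&c);
    next_free += count;
  }
  return out;
}

// Depth-first visit that keeps only int column indices on the stack.
std::vector<int> visit_order(const Flattened& f) {
  std::vector<int> order;
  std::vector<int> stack{0};  // start with col_index 0
  while (!stack.empty()) {
    int const col_index = stack.back();
    stack.pop_back();
    order.push_back(col_index);
    // Push children in reverse so the first child is visited first.
    for (int k = f.num_children[col_index] - 1; k >= 0; --k) {
      stack.push_back(f.first_child_index[col_index] + k);
    }
  }
  return order;
}
```

With this layout, the example input flattens to five columns (`STRUCT`, `INT`, `STRUCT`, `INT`, `STRING`), with `first_child_index = [1, -1, 3, -1, -1]` and `num_children = [2, 0, 2, 0, 0]`. Each stack element is a single `int` rather than a full column view.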
Check out `cudf/cpp/src/table/row_operators.cu` (`cudf::experimental::row::lexicographic::preprocessed_table::create`) and `cudf/cpp/src/structs/utilities.cpp` (`cudf::structs::detail::flatten_nested_columns`) for example code that flattens a table.
This approach may require significant changes to the existing framework, as the calculations for these 5 columns are not independent.
Yeah! But as discussed, I'm fine with deferring this to follow-up work so that we can get this into 24.12.