[BUG] strided loop overflow observed while joining against build table with 1.2B rows #12108
@jrhemstad fyi
@abellina In your repro example,

```cpp
auto it = cudf::detail::make_counting_transform_iterator(0,
  [] __device__(std::size_t /* instead of cudf::size_type */ ix) {
    return ix;
  });
```

it's the user's responsibility to make sure the iterator won't overflow when passing it to cuco. I think your solution (abellina@ea59ed1) is the right way to go.
The issue here is this:
The user can't (or at least shouldn't have to) know that this logic is happening. More specifically, the user can't possibly know that the internals of a function are using their input and potentially doubling it (when `(gridDim.x * block_size) / tile_size` is also the size of the input). By changing the incoming iterator we're just tricking the template into getting it right. That's not an ideal solution.
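For concreteness, here is a standalone sketch (not code from cuDF or cuco; the row count and stride are assumptions based on the numbers in this issue) of why a 32-bit index can't survive even a single strided advance once the stride is itself around 1.2B:

```cpp
#include <cstdint>
#include <iostream>

int main()
{
  // Hypothetical numbers: ~1.2B build-side rows, with the grid sized so that
  // (gridDim.x * block_size) / tile_size works out to the input size as well.
  std::int64_t const n      = 1'200'000'000;
  std::int64_t const stride = n;  // the strided loop advances by ~the input size

  // 32-bit bookkeeping (cudf::size_type is a 32-bit int): a thread starting
  // near the end of the range needs a value past INT32_MAX after one advance.
  std::int32_t const tid32 = static_cast<std::int32_t>(n - 1);
  std::int64_t const next  = static_cast<std::int64_t>(tid32) + stride;  // ~2.4B
  std::cout << "index after one advance: " << next
            << " (INT32_MAX is " << INT32_MAX << ")\n";

  // The same advance in 64-bit arithmetic has plenty of headroom.
  std::int64_t tid64 = n - 1;
  tid64 += stride;
  std::cout << "64-bit index after one advance: " << tid64 << '\n';
  return 0;
}
```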
For reference, this is just a slightly different flavor of:
Since this is a counting transform iterator, the type being compared here is `cudf::size_type` (a 32-bit int).
We can get around that by constructing the iterator ourselves with thrust, but I do still feel like it's odd that the caller should have to know what the range of iterator values might be. If that were the case, wouldn't you have to just always use a 64-bit type?
The issue is persistent when using
After giving it some more thought and conferring with my lawyers, I think this is a legit bug in cuco. When you have an iterator that is only valid for the range of the input, the kernel shouldn't be advancing or indexing it past that range while it loops. The safer way to write this would be:
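The exact code under discussion wasn't captured above, so the following is only a sketch of the general pattern, assuming the fix is to keep the loop bound, stride, and index in 64-bit arithmetic and to offset the iterator only while the index is in range; `for_each_strided` and its parameters are hypothetical names, not cuco's kernel:

```cpp
#include <cstdint>

// Sketch only: a generic grid-stride loop, not cuco's insert/pair_count kernel.
// `Iterator` and `f` stand in for the probe/insert logic.
template <typename Iterator, typename F>
__global__ void for_each_strided(Iterator first, std::int64_t n, F f)
{
  // Keep the bound, stride, and running index in 64-bit arithmetic so that
  // `idx += stride` cannot wrap even when the stride is close to n.
  auto const stride = static_cast<std::int64_t>(gridDim.x) * blockDim.x;
  for (auto idx = static_cast<std::int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       idx < n;
       idx += stride) {
    // The iterator is only ever offset by an index known to be < n
    // (assuming its difference_type can represent that offset).
    f(first[idx]);
  }
}
```

A launch such as `for_each_strided<<<grid_size, block_size>>>(iter, num_rows, fn)` then stays well defined for any `num_rows` the iterator itself can represent, regardless of how the grid and stride are chosen internally.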
Today I learned.
Yes, @jrhemstad's suggestion doesn't overflow. I believe it's because the index is checked against the bound before the iterator is ever offset. That said, wouldn't this overflow once we go above 4B elements? I know we currently can't address that many elements in cuDF, but I am curious how one might change the kernel to be robust for any input iterator. Should we cast the loop index to a wider type?
This change looks great so far: NVIDIA/cuCollections#243; my testing hasn't turned up any issues. Will we be able to include it in 22.12?
We are seeing an illegal memory access while joining with a large build table (1.2B keys). We see this overflow in two scenarios: inner and left semi joins. There may be other cases, but these were the two I ran into.
Because we are using `size_type` in our `counting_transform_iterator`, the loops in `cuco::static_multimap::pair_count` and `cuco::static_map::insert` can overflow if we have a large build side (we see it with 1.2B keys). I was able to work around this issue by finding the iterators that were in my query's execution path and constructing them manually using `int64_t` (abellina@ea59ed1), but I do not know if this is the appropriate solution (or if the solution is to change cuco directly).

This is a follow-on from #12058. Note that after fixing the overflows in this issue our query ran successfully.
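For illustration only (this is not the actual code in abellina@ea59ed1): a counting transform iterator built directly on thrust over a 64-bit counter might look like the sketch below; `identity_fn` and the helper name are made up.

```cpp
#include <thrust/iterator/counting_iterator.h>
#include <thrust/iterator/transform_iterator.h>

#include <cstdint>

// Hypothetical transform: the real iterators in the query path turn row
// indices into the keys/pairs that cuco consumes.
struct identity_fn {
  __host__ __device__ std::int64_t operator()(std::int64_t ix) const { return ix; }
};

// Count over int64_t instead of cudf::size_type so that strided advances of
// the iterator inside the cuco kernels cannot wrap a 32-bit counter.
inline auto make_wide_counting_transform_iterator()
{
  return thrust::make_transform_iterator(
    thrust::make_counting_iterator(std::int64_t{0}), identity_fn{});
}
```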
Thanks @nvdbaranec for spending time to help me triage this.
C++ repro case: this one is based on the `pair_count` kernel but without all the extra pieces (https://github.com/NVIDIA/cuCollections/blob/dev/include/cuco/detail/static_multimap/kernels.cuh#L304).
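The repro code itself isn't reproduced here. As a rough illustration of the pattern it exercises (hypothetical names, not the real `pair_count` kernel), a strided loop that keeps its index and stride in 32-bit arithmetic looks like the sketch below; it is the `tid += stride` step that wraps once the stride is on the order of 1.2B:

```cpp
#include <cstdint>

// Illustration only: a stripped-down strided counting loop in the spirit of
// the repro, not the actual pair_count kernel linked above.
__global__ void count_strided(std::int32_t n, std::int32_t tile_size,
                              unsigned long long* counter)
{
  // With 32-bit (cudf::size_type-style) bookkeeping, the stride can be on the
  // order of the input size; once `tid` is already ~1.2B, `tid += stride`
  // overflows a signed 32-bit int and the loop bound check is meaningless.
  std::int32_t const stride =
    static_cast<std::int32_t>(gridDim.x * blockDim.x) / tile_size;
  for (std::int32_t tid =
         static_cast<std::int32_t>(blockIdx.x * blockDim.x + threadIdx.x) / tile_size;
       tid < n;
       tid += stride) {
    atomicAdd(counter, 1ull);
  }
}
```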