Hyper log log plus plus(HLL++) #2522
base: branch-25.02
b6f5cf5
to
526a61f
Compare
src/main/cpp/src/HLLPP.cu
Outdated
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr)
{
CUDF_EXPECTS(precision >= 4 && precision <= 18, "HLL++ requires precision in range: [4, 18]");
We can use std::numeric_limits<>::digits instead of the hardcoded values 4 and 18.
src/main/cpp/src/HLLPP.cu
Outdated
auto input_cols = std::vector<int64_t const*>(input_iter, input_iter + input.num_children());
auto d_inputs = cudf::detail::make_device_uvector_async(input_cols, stream, mr);
auto result = cudf::make_numeric_column(
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
Do we need such an all-valid null mask? How about cudf::mask_state::UNALLOCATED?
I tested the Spark behavior: approx_count_distinct(null) returns 0, so the values in the result column are always non-null.
I meant, if all rows are valid, we don't need to allocate a null mask.
BTW, we need to pass mr to the returned column (but do not pass it to the intermediate vector/column).
- cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
+ cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::UNALLOCATED, stream, mr);
src/main/cpp/src/HLLPP.cu
Outdated
auto result = cudf::make_numeric_column(
cudf::data_type{cudf::type_id::INT64}, input.size(), cudf::mask_state::ALL_VALID, stream);
// evaluate from struct<long, ..., long>
thrust::for_each_n(rmm::exec_policy(stream),
Try to use exec_policy_nosync as much as possible.
- thrust::for_each_n(rmm::exec_policy(stream),
+ thrust::for_each_n(rmm::exec_policy_nosync(stream),
Done.
* The input sketch values must be given in the format `LIST<INT8>`.
*
* @param input The sketch column which contains `LIST<INT8>` values.
INT8 or INT64?
In addition, in estimate_from_hll_sketches I see that the input is STRUCT<LONG, LONG, ....> instead of LIST<>. Why?
It's STRUCT<LONG, LONG, ....>, consistent with Spark. The input is columnar data: for example, sketch 0 is composed of the values at index 0 of all the children.
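To make the columnar layout concrete, here is a minimal host-side sketch. It assumes the packing described in this PR's Javadoc (at most 10 registers per INT64); the names `register_location`, `locate_register`, and `sketch_long` are hypothetical and are not taken from the PR.

```cpp
#include <cstdint>

// Assumption (from the PR's Javadoc): one INT64 holds at most 10 registers.
constexpr int REGISTERS_PER_LONG = 10;

// Where a register lives inside a sketch: which struct child, and which slot
// inside that child's INT64 value.
struct register_location {
  int child_idx;  // index of the INT64 child column
  int reg_idx;    // slot within that INT64, in [0, 10)
};

// Hypothetical helper: maps a register index to its child column and slot.
constexpr register_location locate_register(int register_idx)
{
  return {register_idx / REGISTERS_PER_LONG, register_idx % REGISTERS_PER_LONG};
}

// Hypothetical helper: the sketch at `row` is spread across the children, so
// its c-th INT64 is element `row` of child column c.
constexpr std::int64_t sketch_long(std::int64_t const* const* children, int child_idx, int row)
{
  return children[child_idx][row];
}
```

For example, under these assumptions register 25 of the sketch at row `i` is slot 5 of `children[2][i]`.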
Updated the function comments, refer to commit
Signed-off-by: Chong Gao <[email protected]>
Ready to review except test cases.
src/main/cpp/CMakeLists.txt
Outdated
@@ -196,6 +196,7 @@ add_library(
src/HashJni.cpp
src/HistogramJni.cpp
src/HostTableJni.cpp
src/HLLPPJni.cpp
Let's try to be generic.
- src/HLLPPJni.cpp
+ src/AggregationJni.cpp
Renamed to HLLPPHostUDFJni; AggregationJni is too generic.
src/main/cpp/CMakeLists.txt
Outdated
@@ -204,6 +205,7 @@ add_library(
src/SparkResourceAdaptorJni.cpp
src/SubStringIndexJni.cpp
src/ZOrderJni.cpp
src/HLLPP.cu
How about HyperLogLogPP?
- src/HLLPP.cu
+ src/HyperLogLogPP.cu
This name should also be applied to the *.hpp and *.java files.
src/main/cpp/src/HLLPP.cu
Outdated
@@ -0,0 +1,102 @@
/*
* Copyright (c) 2023-2024, NVIDIA CORPORATION.
- * Copyright (c) 2023-2024, NVIDIA CORPORATION.
+ * Copyright (c) 2024-2025, NVIDIA CORPORATION.
src/main/cpp/src/HLLPP.cu
Outdated
int64_t shift_mask = MASK << (REGISTER_VALUE_BITS * reg_idx);
int64_t v = (long_10_registers & shift_mask) >> (REGISTER_VALUE_BITS * reg_idx);
- int64_t shift_mask = MASK << (REGISTER_VALUE_BITS * reg_idx);
- int64_t v = (long_10_registers & shift_mask) >> (REGISTER_VALUE_BITS * reg_idx);
+ auto const shift_bits = REGISTER_VALUE_BITS * reg_idx;
+ auto const shift_mask = MASK << shift_bits;
+ auto const v = (long_10_registers & shift_mask) >> shift_bits;
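The extraction above can be tried in isolation with a small host-side sketch. It assumes 6-bit registers and 10 registers per INT64, as stated in this PR's Javadoc, and it assumes `MASK` is the low six bits set; the PR's actual definition of `MASK` is not shown here. `set_register_value` is a hypothetical helper added only to build test inputs.

```cpp
#include <cstdint>

// Assumptions: 6 bits per register, and MASK selects the low 6 bits (0x3F).
constexpr int REGISTER_VALUE_BITS = 6;
constexpr std::int64_t MASK       = (std::int64_t{1} << REGISTER_VALUE_BITS) - 1;

// Reads register `reg_idx` (0..9) from a packed INT64, mirroring the
// shift_bits / shift_mask form suggested in the review.
constexpr std::int64_t get_register_value(std::int64_t long_10_registers, int reg_idx)
{
  auto const shift_bits = REGISTER_VALUE_BITS * reg_idx;
  auto const shift_mask = MASK << shift_bits;
  return (long_10_registers & shift_mask) >> shift_bits;
}

// Hypothetical helper: writes `value` (truncated to 6 bits) into register `reg_idx`.
constexpr std::int64_t set_register_value(std::int64_t packed, int reg_idx, std::int64_t value)
{
  auto const shift_bits = REGISTER_VALUE_BITS * reg_idx;
  auto const shift_mask = MASK << shift_bits;
  return (packed & ~shift_mask) | ((value & MASK) << shift_bits);
}
```

With 10 registers the largest shift is 54 bits, so the mask stays within the low 60 bits and never touches the sign bit of the INT64.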
src/main/cpp/src/HLLPP.cu
Outdated
}

struct estimate_fn {
cudf::device_span<int64_t const*> sketch_longs;
- cudf::device_span<int64_t const*> sketch_longs;
+ cudf::device_span<int64_t const*> sketches;
src/main/cpp/src/HLLPP.cu
Outdated
int const precision;
int64_t* const out;
We now favor non-const members so the functor can be moved by the compiler if needed.
In addition, member variables need to be sorted by their sizes to reduce padding.
- int const precision;
- int64_t* const out;
+ int64_t* out;
+ int precision;
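The padding argument is easier to see with three members than with two. The following is a minimal sketch with hypothetical structs (the `num_sketches` member is invented for illustration and is not part of the PR). On a typical 64-bit ABI the first struct occupies 24 bytes and the second 16, although exact sizes are implementation-defined.

```cpp
#include <cstdint>

// Hypothetical members, chosen only to make the padding visible.
struct small_member_first {
  int precision;      // 4 bytes, typically followed by 4 bytes of padding
  std::int64_t* out;  // 8 bytes on a 64-bit platform
  int num_sketches;   // 4 bytes, typically followed by 4 bytes of tail padding
};

// Same members, sorted from largest to smallest.
struct large_member_first {
  std::int64_t* out;
  int precision;
  int num_sketches;
};

// Sorting by size should never make the struct larger.
static_assert(sizeof(large_member_first) <= sizeof(small_member_first),
              "largest-first ordering should not increase the size");
```

For the two-member functor in this thread, both orderings typically come out at 16 bytes on a 64-bit platform, so the reordering there is mostly a matter of following the convention.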
src/main/cpp/src/HLLPP.cu
Outdated
__device__ void operator()(cudf::size_type const idx) const
{
auto const num_regs = 1ull << precision;
This seems to be compared with a signed int later, so it should not be unsigned here.
- auto const num_regs = 1ull << precision;
+ auto const num_regs = 1 << precision;
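The concern about mixing signed and unsigned operands can be illustrated with a small host-side sketch. The helper names are hypothetical, and the explicit cast only spells out the conversion the compiler would otherwise apply implicitly when an `int` is compared with an `unsigned long long`.

```cpp
// Hypothetical helpers contrasting the two declarations of num_regs.
constexpr bool in_range_unsigned(int idx, int precision)
{
  auto const num_regs = 1ull << precision;  // unsigned long long
  // In `idx < num_regs` the int operand would be converted to unsigned long long;
  // the cast below makes that implicit conversion visible.
  return static_cast<unsigned long long>(idx) < num_regs;
}

constexpr bool in_range_signed(int idx, int precision)
{
  auto const num_regs = 1 << precision;  // int; valid while precision <= 30
  return idx < num_regs;
}
```

A negative index wraps to a very large unsigned value in the first version, so the comparison gives the opposite answer from the signed version. The plain `int` shift is safe for the precision range discussed in this PR, since `1 << 18` is far below the `int` limit.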
src/main/cpp/src/HLLPP.cu
Outdated
rmm::cuda_stream_view stream,
rmm::device_async_resource_ref mr)
{
CUDF_EXPECTS(precision >= 4, "HyperLogLogPlusPlus requires precision is bigger than 4.");
- CUDF_EXPECTS(precision >= 4, "HyperLogLogPlusPlus requires precision is bigger than 4.");
+ CUDF_EXPECTS(precision >= 4, "HyperLogLogPlusPlus requires precision bigger than 4.");
src/main/cpp/src/HLLPP.cu
Outdated
auto const input_iter = cudf::detail::make_counting_transform_iterator(
0, [&](int i) { return input.child(i).begin<int64_t>(); });
We need a CUDF_EXPECTS to check for input type too (struct of longs).
src/main/cpp/src/HLLPP.cu
Outdated
auto input_cols = std::vector<int64_t const*>(input_iter, input_iter + input.num_children());
auto d_inputs = cudf::detail::make_device_uvector_async(input_cols, stream, mr);
- auto input_cols = std::vector<int64_t const*>(input_iter, input_iter + input.num_children());
- auto d_inputs = cudf::detail::make_device_uvector_async(input_cols, stream, mr);
+ auto const h_input_ptrs = std::vector<int64_t const*>(input_iter, input_iter + input.num_children());
+ auto const d_input_ptrs = cudf::detail::make_device_uvector_async(h_input_ptrs, stream, cudf::get_current_device_resource_ref());
cudf::get_current_device_resource_ref(): why not use the mr passed in?
src/main/cpp/src/HLLPP.hpp
Outdated
#include <cudf/column/column.hpp>
#include <cudf/column/column_view.hpp>
#include <cudf/utilities/default_stream.hpp>
- #include <cudf/utilities/default_stream.hpp>
+ #include <cudf/utilities/default_stream.hpp>
+ #include <cudf/utilities/memory_resource.hpp>
src/main/cpp/src/HLLPP.hpp
Outdated
#include <cudf/utilities/default_stream.hpp>

#include <rmm/cuda_stream_view.hpp>
#include <rmm/resource_ref.hpp>
- #include <rmm/resource_ref.hpp>
src/main/cpp/src/HLLPP.hpp
Outdated
cudf::column_view const& input,
int precision,
rmm::cuda_stream_view stream = cudf::get_default_stream(),
rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource());
- rmm::device_async_resource_ref mr = rmm::mr::get_current_device_resource());
+ rmm::device_async_resource_ref mr = cudf::get_current_device_resource_ref());
/**
* HyperLogLogPlusPlus
*/
public class HLLPP {
- public class HLLPP {
+ public class AggregationUtils {
AggregationUtils is too generic. Is HyperLogLogPlusPlusHostUDF OK?
/**
* Compute the approximate count distinct value from sketch values.
* <p>
* The input sketch values must be given in the format `Struct<INT64, INT64, ...>`.
* The number of children is: num_registers_per_sketch / 10 + 1, where 10 means an INT64 contains
* at most 10 registers. Each register value is 6 bits. The input is columnar data, e.g.: sketch 0
* is composed of the data at index 0 of all the children.
*
* @param input The sketch column which contains Struct<INT64, INT64, ...> values.
* @param precision The number of bits for addressing.
* @return An INT64 column in which each value is the approximate count distinct value.
*/
public static ColumnVector estimateDistinctValueFromSketches(ColumnView input, int precision) {
return new ColumnVector(estimateDistinctValueFromSketches(input.getNativeView(), precision));
}

private static native long estimateDistinctValueFromSketches(long inputHandle, int precision);
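The child-count formula in the Javadoc can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the number of registers per sketch is `2^precision`, which is the usual HyperLogLog convention and matches the `1 << precision` seen elsewhere in this PR; the function names are hypothetical.

```cpp
// Assumption (from the Javadoc): one INT64 holds at most 10 six-bit registers.
constexpr int REGISTERS_PER_LONG = 10;

// Assumption: a sketch has 2^precision registers.
constexpr int num_registers_per_sketch(int precision) { return 1 << precision; }

// Number of INT64 children in the struct column, per the Javadoc formula
// num_registers_per_sketch / 10 + 1.
constexpr int num_longs_per_sketch(int precision)
{
  return num_registers_per_sketch(precision) / REGISTERS_PER_LONG + 1;
}
```

With precision 9, for instance, this gives 512 registers packed into 52 INT64 children.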
I think this Java interface will no longer be needed after converting the code to use HOST_UDF.
Renamed to HyperLogLogPlusPlusHostUDF.
It is now used to create the UDF and to call the estimate JNI.
Now, this PR is using Host UDF.
build
Verified Host UDF successfully via NVIDIA/spark-rapids#11638
Need to wait for the dependencies to be merged first before we can build.
int64_t const precision, // num of bits for register addressing, e.g.: 9
int* const registers_output_cache, // num is num_groups * num_registers_per_sketch
int* const registers_thread_cache, // num is num_threads * num_registers_per_sketch
cudf::size_type* const group_lables_thread_cache // save the group lables for each thread
nit: labels?
Add support for Hyper log log plus plus(HLL++)
Depends on:
HOST_UDF aggregation for groupby rapidsai/cudf#17592
HOST_UDF aggregation for reduction and segmented reduction rapidsai/cudf#17645
Signed-off-by: Chong Gao [email protected]