Fix the int32 overflow when computing page fragment sizes for large string columns (#16028)

This PR fixes the possible `int32` overflow when computing page fragment sizes for large (2B+ char) string columns.

Authors:
  - Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Nghia Truong (https://github.com/ttnghia)

URL: #16028
mhaseeb123 authored Jun 14, 2024
1 parent 987879c commit 829b3a9
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions cpp/src/io/parquet/writer_impl.cu
@@ -1763,10 +1763,10 @@ auto convert_table_to_parquet_data(table_input_metadata& table_meta,
   // for multiple fragments per page to smooth things out. using 2 was too
   // unbalanced in final page sizes, so using 4 which seems to be a good
   // compromise at smoothing things out without getting fragment sizes too small.
-  auto frag_size_fn = [&](auto const& col, size_type col_size) {
+  auto frag_size_fn = [&](auto const& col, size_t col_size) {
     int const target_frags_per_page = is_col_fixed_width(col) ? 1 : 4;
     auto const avg_len =
-      target_frags_per_page * util::div_rounding_up_safe<size_type>(col_size, input.num_rows());
+      target_frags_per_page * util::div_rounding_up_safe<size_t>(col_size, input.num_rows());
     if (avg_len > 0) {
       auto const frag_size = util::div_rounding_up_safe<size_type>(max_page_size_bytes, avg_len);
       return std::min<size_type>(max_page_fragment_size, frag_size);
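
To illustrate why widening `col_size` matters, here is a minimal standalone sketch of the fragment-size arithmetic in plain C++. It is not the cudf implementation. `div_round_up`, `fits_in_size_type`, and `frag_size` are hypothetical helpers, `size_type` is a local stand-in for the 32-bit `cudf::size_type`, and the fallback for `avg_len == 0` is an assumption because the diff is cut off before that branch. Unlike the patched code, the sketch also does the final division in `std::size_t`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Local stand-in for cudf::size_type (a 32-bit signed integer).
using size_type = std::int32_t;

// Hypothetical, simplified substitute for util::div_rounding_up_safe.
// Assumes divisor > 0.
template <typename T>
constexpr T div_round_up(T dividend, T divisor)
{
  return dividend / divisor + (dividend % divisor != 0 ? 1 : 0);
}

// True when a byte count can be stored in size_type without narrowing.
constexpr bool fits_in_size_type(std::size_t n)
{
  return n <= static_cast<std::size_t>(std::numeric_limits<size_type>::max());
}

// Sketch of the fragment-size computation with col_size kept as size_t,
// so a string column with more than 2^31 - 1 chars is never narrowed.
size_type frag_size(std::size_t col_size,
                    std::size_t num_rows,
                    bool fixed_width,
                    std::size_t max_page_size_bytes,
                    size_type max_page_fragment_size)
{
  assert(num_rows > 0);
  std::size_t const target_frags_per_page = fixed_width ? 1 : 4;
  std::size_t const avg_len =
    target_frags_per_page * div_round_up<std::size_t>(col_size, num_rows);
  if (avg_len > 0) {
    std::size_t const frag = div_round_up<std::size_t>(max_page_size_bytes, avg_len);
    // The result is bounded by max_page_fragment_size, so the cast is safe.
    return static_cast<size_type>(std::min<std::size_t>(max_page_fragment_size, frag));
  }
  // Assumed fallback: the original branch is not visible in the diff.
  return max_page_fragment_size;
}
```

With illustrative limits of a 512 KiB page and 5000 rows per fragment, a 3,000,000,000-char string column over 1,000,000 rows gives `avg_len = 4 * 3000 = 12000` and a fragment size of `ceil(524288 / 12000) = 44`. The same column size does not fit in a 32-bit `size_type`, which is the narrowing the commit avoids.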