[REVIEW] JSON host tree algorithms #16545

Merged on Sep 26, 2024 (56 commits)
Commits
9eaacb3
impl
shrshi Aug 13, 2024
27f1cb6
formatting
shrshi Aug 13, 2024
4987f74
added mixed type support
shrshi Aug 13, 2024
65e147f
formatting
shrshi Aug 13, 2024
32e8619
Merge branch 'branch-24.10' of github.com:rapidsai/cudf into host-tre…
karthikeyann Sep 10, 2024
b30c43f
comments - unfinished
karthikeyann Sep 10, 2024
38819f2
very partial work; some comments
shrshi Sep 11, 2024
08cf338
struct column first try, basic tests pass
karthikeyann Sep 16, 2024
85983be
add support for array_of_arrays
karthikeyann Sep 16, 2024
e3fd1d5
fix vector of dtypes in struct json
karthikeyann Sep 17, 2024
dc25011
mixed type as string support added
karthikeyann Sep 18, 2024
d1ec9c7
forced nested type in mixed type data
karthikeyann Sep 18, 2024
ccfc6f6
style fixes
karthikeyann Sep 18, 2024
8fbb1d0
Merge branch 'branch-24.10' into host-tree-algorithms
karthikeyann Sep 18, 2024
ed0b354
cleanup
karthikeyann Sep 18, 2024
c3fcf8a
fix name for list child element as not element
karthikeyann Sep 18, 2024
a700865
reuse code
karthikeyann Sep 18, 2024
217c4d8
reorg code build_tree
karthikeyann Sep 19, 2024
7437653
pulled relevant changes from #16759
karthikeyann Sep 19, 2024
4eff9fc
code reorg: split to 3 functions
karthikeyann Sep 19, 2024
400df4b
split host functions to separate file
karthikeyann Sep 19, 2024
7f5fdf4
split new host algorithm to functions
karthikeyann Sep 19, 2024
10bddb8
Merge branch 'branch-24.10' of github.com:rapidsai/cudf into enh-json…
karthikeyann Sep 19, 2024
3762477
move code
karthikeyann Sep 19, 2024
6c3b681
revert to old call
karthikeyann Sep 19, 2024
ac9fa76
prepare for merge with reorg
karthikeyann Sep 19, 2024
638cb24
Merge branch 'enh-json_code_reorg1' of github.com:karthikeyann/cudf i…
karthikeyann Sep 19, 2024
583c576
fix merge issue
karthikeyann Sep 19, 2024
62085a8
use experimental build_tree
karthikeyann Sep 19, 2024
eab13b3
same code for both make_device_json_column
karthikeyann Sep 19, 2024
1f855b5
add profiling
karthikeyann Sep 19, 2024
c68c259
fix for missmatched forced type left uninitialized
karthikeyann Sep 19, 2024
4efa820
unprune base list in array of arrays when prune is enabled
karthikeyann Sep 20, 2024
69459bd
Merge branch 'branch-24.10' into host-tree-algorithms
karthikeyann Sep 20, 2024
4917115
Merge branch 'branch-24.10' of github.com:rapidsai/cudf into host-tre…
karthikeyann Sep 20, 2024
16f9acd
Merge branch 'branch-24.10' of github.com:rapidsai/cudf into host-tre…
karthikeyann Sep 23, 2024
3694860
add experimental option for new host tree algorithm
karthikeyann Sep 23, 2024
5b1bdf4
remove debug prints
karthikeyann Sep 23, 2024
79364a9
cleanup comments
karthikeyann Sep 23, 2024
be30c60
address review comments
karthikeyann Sep 23, 2024
19f39c2
address review comments
karthikeyann Sep 24, 2024
28ce878
Merge branch 'branch-24.10' into host-tree-algorithms
karthikeyann Sep 24, 2024
5da21d5
Merge branch 'branch-24.10' of github.com:rapidsai/cudf into host-tre…
karthikeyann Sep 24, 2024
833960f
Merge branch 'host-tree-algorithms' of github.com:shrshi/cudf into ho…
karthikeyann Sep 24, 2024
6b501f3
Java JSON APIs experimental option
karthikeyann Sep 24, 2024
7ec6ba1
address review comments
karthikeyann Sep 24, 2024
6f8a4e2
utf8 field name support (experimental)
karthikeyann Sep 24, 2024
8e27ab3
style fixes
karthikeyann Sep 24, 2024
f3ccdfa
stream safety fixes
karthikeyann Sep 24, 2024
2c06379
add more nosync policy
karthikeyann Sep 24, 2024
e5f6d2a
address review comments
karthikeyann Sep 25, 2024
c02193d
Merge branch 'branch-24.10' into host-tree-algorithms
karthikeyann Sep 25, 2024
4dbbaa5
fix order of experimental option
karthikeyann Sep 25, 2024
ea373b6
Merge branch 'branch-24.10' into host-tree-algorithms
karthikeyann Sep 25, 2024
d1cf095
Merge branch 'branch-24.10' into host-tree-algorithms
karthikeyann Sep 25, 2024
0c65921
add missing experimental argument
karthikeyann Sep 25, 2024
stream safety fixes
karthikeyann committed Sep 24, 2024
commit f3ccdfa93a3b000506fb48b87cae19867fe383af
8 changes: 5 additions & 3 deletions cpp/src/io/json/host_tree_algorithms.cu
@@ -834,7 +834,7 @@ std::map<std::string, schema_element> unified_schema(cudf::io::json_reader_optio
options.get_dtypes());
}

- std::pair<thrust::host_vector<uint8_t>, hashmap_of_device_columns> build_tree(
+ std::pair<cudf::detail::host_vector<uint8_t>, hashmap_of_device_columns> build_tree(
device_json_column& root,
host_span<uint8_t const> is_str_column_all_nulls,
tree_meta_t& d_column_tree,
@@ -957,7 +957,7 @@ void make_device_json_column(device_span<SymbolT const> input,
stream);
}

- std::pair<thrust::host_vector<uint8_t>, hashmap_of_device_columns> build_tree(
+ std::pair<cudf::detail::host_vector<uint8_t>, hashmap_of_device_columns> build_tree(
device_json_column& root,
host_span<uint8_t const> is_str_column_all_nulls,
tree_meta_t& d_column_tree,
@@ -981,6 +981,7 @@ std::pair<thrust::host_vector<uint8_t>, hashmap_of_device_columns> build_tree(
cudf::detail::make_host_vector_async(d_column_tree.node_range_begin, stream);
auto const max_row_offsets = cudf::detail::make_host_vector_async(d_max_row_offsets, stream);
auto num_columns = d_unique_col_ids.size();
+ stream.synchronize();

auto to_json_col_type = [](auto category) {
switch (category) {
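
Note on the hunk above: the added stream.synchronize() follows several make_host_vector_async calls, presumably so the host code does not read the copied vectors before the copies have finished. The sketch below models that ordering in plain C++ with no CUDA. fake_stream, copy_async and read_before_and_after_sync are hypothetical names for illustration, not cudf or rmm APIs. On a real GPU stream, reading before the synchronize is a race with an unpredictable result; the model only makes the stale read deterministic so it can be observed.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: queued work runs only when
// synchronize() is called. This is NOT a cudf or rmm type.
class fake_stream {
 public:
  void enqueue(std::function<void()> task) { pending_.push_back(std::move(task)); }

  // Run all queued tasks in FIFO order, then clear the queue.
  void synchronize()
  {
    for (auto& task : pending_) { task(); }
    pending_.clear();
  }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical analogue of an asynchronous device-to-host copy: the destination
// is sized immediately, but its contents arrive only when the stream runs.
void copy_async(std::vector<int> const& src, std::vector<int>& dst, fake_stream& stream)
{
  dst.assign(src.size(), 0);
  stream.enqueue([&src, &dst] { dst = src; });
}

// Returns the first element as seen before and after synchronizing the stream.
std::pair<int, int> read_before_and_after_sync()
{
  std::vector<int> const device_data{7, 8, 9};
  std::vector<int> host_data;
  fake_stream stream;

  copy_async(device_data, host_data, stream);
  assert(host_data.size() == device_data.size());

  int const before = host_data[0];  // copy has not run yet in this model
  stream.synchronize();             // the step this commit adds
  int const after = host_data[0];   // copy is now complete
  return {before, after};
}
```

In this model the value read before synchronize() is the placeholder and the value read after it is the copied data, which is why the synchronize has to come before the first host-side use of the vectors.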
@@ -1030,7 +1031,8 @@ std::pair<thrust::host_vector<uint8_t>, hashmap_of_device_columns> build_tree(
}

// Pruning
- thrust::host_vector<uint8_t> is_pruned(num_columns, options.is_enabled_prune_columns());
+ auto is_pruned = cudf::detail::make_host_vector<uint8_t>(num_columns, stream);
+ std::fill_n(is_pruned.begin(), num_columns, options.is_enabled_prune_columns());

// prune all children of a column, but not self.
auto ignore_all_children = [&](auto parent_col_id) {
vuule marked this conversation as resolved.
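
Note on the pruning change above: the single fill constructor is replaced by an allocation followed by std::fill_n, since the factory call shown in the diff takes a size and a stream but no fill value. A minimal sketch of the same two-step pattern in standard C++ follows. make_flags and make_pruned_flags are hypothetical helpers, and std::vector stands in for cudf::detail::host_vector; the stream argument is omitted because the sketch has no CUDA dependency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a factory that only takes a size.
// (The call in the diff also takes a stream; that is left out here.)
std::vector<std::uint8_t> make_flags(std::size_t n) { return std::vector<std::uint8_t>(n); }

// Allocate first, then set every flag explicitly with std::fill_n,
// mirroring the two added lines in the hunk.
std::vector<std::uint8_t> make_pruned_flags(std::size_t num_columns, bool prune_enabled)
{
  auto is_pruned = make_flags(num_columns);
  assert(is_pruned.size() == num_columns);
  // The bool converts to 1 or 0 when assigned to each uint8_t element.
  std::fill_n(is_pruned.begin(), num_columns, prune_enabled);
  return is_pruned;
}
```

Filling explicitly means the result does not depend on whatever initial contents the factory leaves in the buffer.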
2 changes: 1 addition & 1 deletion cpp/src/io/json/json_tree.cu
@@ -664,7 +664,7 @@ rmm::device_uvector<size_type> hash_node_type_with_field_name(device_span<Symbol
};
if (!is_enabled_experimental) { return std::pair{false, make_map(0)}; }
// get all unique field node ids for utf8 decoding
- auto num_keys = key_set.size();
+ auto num_keys = key_set.size(stream);
rmm::device_uvector<size_type> keys(num_keys, stream);
key_set.retrieve_all(keys.data(), stream.value());
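
Note on the change above: passing the stream to size() presumably orders the count on the same stream as the work that populated key_set. What the library's no-argument overload does is not shown in this diff, so the sketch below only illustrates the general idea with hypothetical types (fake_stream, stream_ordered_set, count_keys) in plain C++. It is not the cuco API.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <initializer_list>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: queued work runs on synchronize().
class fake_stream {
 public:
  void enqueue(std::function<void()> task) { pending_.push_back(std::move(task)); }
  void synchronize()
  {
    for (auto& task : pending_) { task(); }
    pending_.clear();
  }

 private:
  std::vector<std::function<void()>> pending_;
};

// Hypothetical set whose inserts are ordered on a stream.
class stream_ordered_set {
 public:
  void insert_async(int key, fake_stream& stream)
  {
    stream.enqueue([this, key] { keys_.insert(key); });
  }

  // Size query ordered on `stream`: waits for queued inserts before counting.
  std::size_t size(fake_stream& stream)
  {
    stream.synchronize();
    return keys_.size();
  }

  // Size query with no ordering: counts only the inserts completed so far.
  std::size_t size_unordered() const { return keys_.size(); }

 private:
  std::set<int> keys_;
};

// Returns {count without stream ordering, count ordered on the insert stream}.
std::pair<std::size_t, std::size_t> count_keys()
{
  fake_stream stream;
  stream_ordered_set key_set;
  for (int key : {3, 1, 3, 2}) { key_set.insert_async(key, stream); }

  auto const unordered = key_set.size_unordered();
  auto const ordered   = key_set.size(stream);
  assert(ordered >= unordered);
  return {unordered, ordered};
}
```

The count matters here because it sizes the keys buffer that retrieve_all then writes into, so an undercount would leave that buffer too small.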