Commit
Bind `read_parquet_metadata` API to libcudf instead of pyarrow and extract `RowGroup` information (#15398)

`cudf.io.read_parquet_metadata` is now bound to the corresponding libcudf API instead of relying on pyarrow. The libcudf API now also returns high-level `RowGroup` metadata to solve #11214. Added additional tests and doc updates as well.

More metadata information, such as the `min` and `max` values for each column in each row group, can also be extracted and returned if needed. Thoughts?

Recommend: Closing #15320 without merging in favor of this PR.

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)
- GALI PREM SAGAR (https://github.com/galipremsagar)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Vukasin Milovanovic (https://github.com/vuule)
- Bradley Dice (https://github.com/bdice)

URL: #15398
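As a rough illustration of why row-group metadata is useful, the sketch below groups row groups into read batches under a byte budget. It is plain Python and does not call cudf. The per-row-group dictionaries mirror the `vector[unordered_map[string, int64_t]]` shape declared in this commit, but the key names `num_rows` and `total_byte_size`, the helper `plan_batches`, and the sample values are assumptions made for the example, not something this page confirms.

```python
from typing import Dict, List


def plan_batches(row_groups: List[Dict[str, int]], max_bytes: int) -> List[List[int]]:
    """Group consecutive row-group indices so each batch stays within max_bytes.

    A single row group larger than max_bytes still gets a batch of its own.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, rg in enumerate(row_groups):
        size = rg["total_byte_size"]  # assumed key name
        # Close the current batch if adding this row group would exceed the budget.
        if current and current_bytes + size > max_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Hypothetical metadata for a file with four row groups.
sample_row_groups = [
    {"num_rows": 1000, "total_byte_size": 400},
    {"num_rows": 1000, "total_byte_size": 500},
    {"num_rows": 1000, "total_byte_size": 300},
    {"num_rows": 250, "total_byte_size": 100},
]
batches = plan_batches(sample_row_groups, max_bytes=900)
print(batches)
```

With a real file, the list of dictionaries would presumably come from the row-group metadata returned by `cudf.io.read_parquet_metadata`, and each batch of indices could then be passed to a reader that accepts row-group selections.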
1 parent 9f2fdf8, commit f222b4a

Showing 9 changed files with 249 additions and 17 deletions.
    @@ -0,0 +1,32 @@
    # Copyright (c) 2024, NVIDIA CORPORATION.

    from libc.stdint cimport int64_t
    from libcpp.string cimport string
    from libcpp.unordered_map cimport unordered_map
    from libcpp.vector cimport vector

    cimport cudf._lib.cpp.io.types as cudf_io_types
    from cudf._lib.cpp.types cimport size_type


    cdef extern from "cudf/io/parquet_metadata.hpp" namespace "cudf::io" nogil:
        cdef cppclass parquet_column_schema:
            parquet_column_schema() except+
            string name() except+
            size_type num_children() except+
            parquet_column_schema child(int idx) except+
            vector[parquet_column_schema] children() except+

        cdef cppclass parquet_schema:
            parquet_schema() except+
            parquet_column_schema root() except+

        cdef cppclass parquet_metadata:
            parquet_metadata() except+
            parquet_schema schema() except+
            int64_t num_rows() except+
            size_type num_rowgroups() except+
            unordered_map[string, string] metadata() except+
            vector[unordered_map[string, int64_t]] rowgroup_metadata() except+

        cdef parquet_metadata read_parquet_metadata(cudf_io_types.source_info src) except+
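
The declarations above expose the schema as a tree reached through `root()`, `num_children()`, `child(idx)` and `name()`. A minimal pure-Python sketch of walking such a tree follows. `ColumnSchema` is a hypothetical stand-in that only mirrors those accessor names, `leaf_paths` is an illustrative helper, and the sample schema is made up. How the actual Cython binding derives column names is not shown in this excerpt.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ColumnSchema:
    """Hypothetical stand-in mirroring the parquet_column_schema accessors."""
    _name: str
    _children: List["ColumnSchema"] = field(default_factory=list)

    def name(self) -> str:
        return self._name

    def num_children(self) -> int:
        return len(self._children)

    def child(self, idx: int) -> "ColumnSchema":
        return self._children[idx]


def leaf_paths(node: ColumnSchema, prefix: str = "") -> List[str]:
    """Return dotted paths of all leaf columns at or below `node`."""
    path = f"{prefix}.{node.name()}" if prefix else node.name()
    if node.num_children() == 0:
        return [path]
    paths: List[str] = []
    for i in range(node.num_children()):
        paths.extend(leaf_paths(node.child(i), path))
    return paths


# Made-up schema: a flat column `id` and a struct column `point` with two fields.
root = ColumnSchema("schema", [
    ColumnSchema("id"),
    ColumnSchema("point", [ColumnSchema("x"), ColumnSchema("y")]),
])

# Top-level column names are the names of the root's direct children.
top_level = [root.child(i).name() for i in range(root.num_children())]

# Leaf paths skip the root's own name and descend into nested columns.
paths: List[str] = []
for i in range(root.num_children()):
    paths.extend(leaf_paths(root.child(i)))
print(top_level, paths)
```

Iterating by index through `num_children()` and `child(i)` keeps the sketch close to the declared interface; the declared `children()` accessor would allow the same traversal without indexing.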