Commit
Support reading matching projected and filter cols from Parquet files with otherwise mismatched schemas (#16394)

Closes #16269.

This PR adds support for reading (matching) projected/selected and filter columns from Parquet files with otherwise mismatching schemas.

### Solution Description

We create a `std::vector<std::unordered_map<int32_t, int32_t>>` holding one map per file except the 0th file. We then co-walk the schema trees and populate each map with the corresponding (one-to-one mapped) `schema_idx` of every valid selected (projection and filter) column between the 0th file and each of the remaining files. The same `unordered_map` is used to get the `schema_idx` of the same columns across files when creating `ColumnChunkDesc` and copying column chunk metadata into the page decoder.

### Known Limitation

- [x] Nullability across files: Each selected column must still be either nullable or non-nullable across all files. See #12702, also described in [#dask/9935](dask/dask#9935).

CC @wence-

Authors:
- Muhammad Haseeb (https://github.com/mhaseeb123)

Approvers:
- GALI PREM SAGAR (https://github.com/galipremsagar)
- Lawrence Mitchell (https://github.com/wence-)
- Vukasin Milovanovic (https://github.com/vuule)

URL: #16394
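The co-walk described under Solution Description can be illustrated with a small standalone sketch. This is not the libcudf implementation: `SchemaNode`, `Schema`, `find_child`, `co_walk`, and `build_schema_idx_maps` are hypothetical names invented for illustration, columns are matched by name only, and selection is assumed to be a list of top-level column names. It only shows the idea of building one `schema_idx` map per file (except the 0th) for the selected columns while ignoring unselected, mismatched ones.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified stand-in for a Parquet schema node (not libcudf's type).
struct SchemaNode {
  std::string name;
  std::vector<int32_t> children;  // indices into the owning file's flat schema vector
};

using Schema       = std::vector<SchemaNode>;                // index 0 is the root
using SchemaIdxMap = std::unordered_map<int32_t, int32_t>;   // file-0 schema_idx -> file-k schema_idx

// Returns the index of the child of `parent` called `name`, or -1 if there is none.
int32_t find_child(Schema const& schema, int32_t parent, std::string const& name)
{
  for (int32_t child : schema[parent].children) {
    if (schema[child].name == name) { return child; }
  }
  return -1;
}

// Walks the subtree rooted at `ref_idx` in file 0 and the matching subtree in the
// other file in lockstep, recording the one-to-one index mapping as it goes.
void co_walk(Schema const& ref, int32_t ref_idx,
             Schema const& other, int32_t other_idx, SchemaIdxMap& out)
{
  out[ref_idx] = other_idx;
  for (int32_t ref_child : ref[ref_idx].children) {
    int32_t const other_child = find_child(other, other_idx, ref[ref_child].name);
    if (other_child < 0) { throw std::invalid_argument("selected column missing in file"); }
    co_walk(ref, ref_child, other, other_child, out);
  }
}

// Builds one map per file except the 0th, covering only the selected top-level columns.
std::vector<SchemaIdxMap> build_schema_idx_maps(std::vector<Schema> const& files,
                                                std::vector<std::string> const& selected)
{
  std::vector<SchemaIdxMap> maps;
  for (std::size_t f = 1; f < files.size(); ++f) {
    SchemaIdxMap map;
    for (auto const& name : selected) {
      int32_t const ref_idx   = find_child(files[0], 0, name);
      int32_t const other_idx = find_child(files[f], 0, name);
      if (ref_idx < 0 || other_idx < 0) {
        throw std::invalid_argument("selected column missing in file");
      }
      co_walk(files[0], ref_idx, files[f], other_idx, map);
    }
    maps.push_back(std::move(map));
  }
  return maps;
}
```

In this sketch, a lookup such as `maps[f - 1].at(schema_idx)` translates a file-0 index into the index of the same column in file `f`, which mirrors how the commit message describes reusing the map when building per-file column chunk descriptors. Nullability is not checked here, consistent with the known limitation above.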
1 parent 925530a, commit fbd6114

Showing 11 changed files with 534 additions and 30 deletions.