forked from apache/arrow-rs
Patched 40.0.0 with Parquet memory limiting #37
Open
alamb wants to merge 9 commits into alamb/40.0.0_base from alamb/40.0.0_patched
* Add splice column API (apache#4155)
* Review feedback
* Re-encode offset index

…e#4278)
* Add `Debug` impls for writers
* Improve display

* feat(api make ArrowArrayStreamReader Send
* simplify ptr handling
* rename pyarrow traits to conform to guidelines
* pr feedback
* remove dangling Box::from_raw

* Derive Default for WriterProperties
* Review feedback

* Initial implementation for writing fixed-size lists to Parquet. The implementation still needs tests. The implementation uses a new `write_fixed_size_list` method instead of `write_list`. This is done to avoid the overhead of needlessly calculating list offsets.
* Initial implementation for reading fixed-size lists from Parquet. The implementation still needs tests.
* Added tests for fixed-size list writer. Fixed bugs in implementation found via tests.
* Added tests for fixed-size list reader. Fixed bugs in implementation found via tests.
* Added correct behavior for writing empty fixed-length lists. Writer now emits the correct definition levels for empty lists. Added empty list unit test.
* Added correct behavior for reading empty fixed-length lists. Reader now handles empty list definition levels correctly. Added empty list unit test.
* Fixed linter warnings.
* Added license header to fixed_size_list_array.rs
* Added fixed-size list reader tests from PR review.
* Added fixed-size reader row length sanity checks.
* Simplified fixed-size list case in LevelInfoBuilder constructor.
* Removed dynamic dispatch inside fixed-length list writer.
* Expanded list of structs test for fixed-size list writer.
* Reverted expected levels in fixed-size list writer test.
* Fixed linter warnings.
* Updated list size check in fixed-size list reader. Converted the check to return an error instead of panicking.
* Small tweak to row length check in fixed-size list reader.
* Fixed bug in fixed-size list level encoding. Writer now correctly handles child arrays with variable row length. Added new unit test to verify the new behavior is correct.
* Added fixed-size list reader test. Test verifies that reader handles child arrays with variable length correctly.

…ad of RecordBatch (apache#3871) (apache#4280)
* Buffer Pages in ArrowWriter instead of RecordBatch (apache#3871)
* Review feedback
* Improved memory accounting
* Clippy
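The fixed-size list commit above says a dedicated `write_fixed_size_list` path avoids needlessly calculating list offsets. A minimal sketch of why that is possible follows. It is not the arrow-rs implementation, and the function names `list_value_range` and `fixed_size_list_value_range` are hypothetical: a variable-size list needs an offsets buffer to locate each row's child values, while a fixed-size list can derive the same range from the row index and the list size alone.

```rust
use std::ops::Range;

/// Hypothetical helper: child-value range of `row` in a variable-size list.
/// The range has to be looked up in an explicit offsets buffer.
fn list_value_range(offsets: &[i32], row: usize) -> Range<usize> {
    offsets[row] as usize..offsets[row + 1] as usize
}

/// Hypothetical helper: child-value range of `row` in a fixed-size list.
/// No offsets buffer is needed; the range follows from arithmetic alone.
fn fixed_size_list_value_range(list_size: usize, row: usize) -> Range<usize> {
    row * list_size..(row + 1) * list_size
}
```

For example, with a list size of 3, row 2 covers child values 6 through 8, with no per-row bookkeeping.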
alamb pushed a commit that referenced this pull request Sep 23, 2024
* chore: add docs, part of #37
  - add pragma `#![warn(missing_docs)]` to `arrow`, `arrow-arith`, `arrow-avro`
  - add docs to the same to remove lint warnings
* chore: add docs, part of #37
  - add pragma `#![warn(missing_docs)]` to `arrow-buffer`, `arrow-cast`, `arrow-csv`
  - add docs to the same to remove lint warnings
* chore: update docs, resolve PR comments
alamb pushed a commit that referenced this pull request Sep 24, 2024
* chore: add docs, part of #37
  - add pragma `#![warn(missing_docs)]` to the following
    - `arrow-array`
    - `arrow-cast`
    - `arrow-csv`
    - `arrow-data`
    - `arrow-json`
    - `arrow-ord`
    - `arrow-pyarrow-integration-testing`
    - `arrow-row`
    - `arrow-schema`
    - `arrow-select`
    - `arrow-string`
    - `arrow`
    - `parquet_derive`
  - add docs to those that generated lint warnings
  - Remove `bitflags` workaround in `arrow-schema`
    At some point, a change in `bitflags v2.3.0` started generating lint warnings in `arrow-schema`. This was handled using a [workaround](apache#4233) [Issue](bitflags/bitflags#356). `bitflags v2.3.1` fixed the issue, so the workaround is no longer needed.
* fix: resolve comments on PR apache#6433
alamb pushed a commit that referenced this pull request Oct 1, 2024
* chore: add docs, part of #37
  - add pragma `#![warn(missing_docs)]` to the following
    - `arrow-flight`
    - `arrow-ipc`
    - `arrow-integration-test`
    - `arrow-integration-testing`
    - `object_store`
  - also document the caveat with using level 10 GZIP compression in parquet. See apache#6282.
* chore: resolve PR comments from apache#6453
alamb pushed a commit that referenced this pull request Oct 2, 2024
- add pragma `#![warn(missing_docs)]` to `parquet`
  This is the final component in the effort to make Arrow fully documented. The entire project now generates warnings for missing docs, if any.
- `arrow-flight`: replace `tonic`'s deprecated `compile_with_config` with the suggested method
- new deprecation: The following types were not used anywhere and were possibly strays. They've been marked as deprecated and will be removed in future versions.
  - `parquet::data_types::SliceAsBytesDataType`
  - `parquet::column::writer::Level`
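The referenced commits enable the `missing_docs` lint crate by crate. A minimal sketch of what the lint asks for follows. The commits use the crate-level inner attribute `#![warn(missing_docs)]`; this sketch instead assumes the outer form `#[warn(missing_docs)]` scoped to one module so it can be pasted anywhere, and the module `geometry` and its functions are hypothetical.

```rust
/// A documented module; the lint is scoped to this module for the sketch.
#[warn(missing_docs)]
pub mod geometry {
    /// Returns the area of a rectangle with the given side lengths.
    pub fn area(width: u32, height: u32) -> u32 {
        multiply(width, height)
    }

    // Private items are not part of the public API, so `missing_docs`
    // does not require a doc comment here.
    fn multiply(a: u32, b: u32) -> u32 {
        a * b
    }
}
```

In a library crate, removing the `///` comment on `area` is the kind of change that would make the lint report a warning.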
This PR contains a patched version of 40.0.0 that backports the fix for apache#3871 and other related parquet changes so that we can use it in IOx - https://github.com/influxdata/influxdb_iox/pull/7880
It starts with the parquet 40.0.0 release and cherry-picks the following commits. All `git cherry-pick`s applied cleanly (I didn't need to resolve any conflicts).
3adca53 - metadata
58e2c1c - splice column
17ca4d5 - Debug Impls
56437cc - default for writer props
aa799f0 - Send
3e5b07a - more send
6959b4b - metrics
741244d - Fixed size support
ea00892 - Memory Accounting