-
Notifications
You must be signed in to change notification settings - Fork 817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[parquet] feature gate some functionality #4765
Conversation
The feature gates don't appear to be being used at all? I'm therefore rather confused how this is helping compile times... |
@@ -42,7 +42,7 @@ arrow-csv = { workspace = true, optional = true } | |||
arrow-data = { workspace = true, optional = true } | |||
arrow-schema = { workspace = true, optional = true } | |||
arrow-select = { workspace = true, optional = true } | |||
arrow-ipc = { workspace = true, optional = true } | |||
arrow-ipc = { workspace = true, default-features = false, optional = true } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will result in recompilation when switching crates in this workspace, something I'd really like to avoid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean during development time? I don't really understand. Seems a single compilation pass per crate. Recompiling both afterwards is cached.
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs$ cd arrow-ipc/ (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cargo c (base)
Checking libm v0.2.7
Checking cfg-if v1.0.0
Compiling libc v0.2.147
Checking static_assertions v1.1.0
Checking bytes v1.4.0
Checking arrow-schema v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-schema)
Checking iana-time-zone v0.1.57
Checking once_cell v1.18.0
Checking hashbrown v0.14.0
Checking bitflags v1.3.2
Checking lexical-util v0.8.5
Checking flatbuffers v23.5.26
Checking num-traits v0.2.16
Checking lexical-write-integer v0.8.5
Checking lexical-parse-integer v0.8.6
Checking lexical-write-float v0.8.5
Checking lexical-parse-float v0.8.5
Checking getrandom v0.2.10
Checking num-integer v0.1.45
Checking num-complex v0.4.4
Checking half v2.3.1
Checking chrono v0.4.28
Checking ahash v0.8.3
Checking num-bigint v0.4.4
Checking num-iter v0.1.43
Checking lexical-core v0.8.5
Checking num-rational v0.4.1
Checking num v0.4.1
Checking arrow-buffer v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-buffer)
Checking arrow-data v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-data)
Checking arrow-array v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-array)
Checking arrow-select v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-select)
Checking arrow-cast v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-cast)
Checking arrow-ipc v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-ipc)
Finished dev [unoptimized + debuginfo] target(s) in 13.10s
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cd .. (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs$ cd parquet (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/parquet$ cargo c (base)
Checking libc v0.2.147
Checking alloc-no-stdlib v2.0.4
Checking adler v1.0.2
Checking byteorder v1.4.3
Checking integer-encoding v3.0.4
Checking twox-hash v1.6.3
Checking base64 v0.21.3
Checking crc32fast v1.3.2
Checking chrono v0.4.28
Checking ordered-float v2.10.0
Checking snap v1.1.0
Checking alloc-stdlib v0.2.2
Checking miniz_oxide v0.7.1
Checking brotli-decompressor v2.3.4
Checking thrift v0.17.0
Checking getrandom v0.2.10
Checking zstd-sys v2.0.8+zstd.1.5.5
Checking lz4-sys v1.9.4
Checking flate2 v1.0.27
Checking lz4 v1.24.0
Checking ahash v0.8.3
Checking zstd-safe v6.0.6
Checking zstd v0.12.4
Checking arrow-array v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-array)
Checking brotli v3.3.4
Checking arrow-select v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-select)
Checking arrow-cast v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-cast)
Checking arrow-ipc v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-ipc)
Checking parquet v46.0.0 (/home/ritchie46/code/arrow-rs/parquet)
Finished dev [unoptimized + debuginfo] target(s) in 9.00s
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/parquet$ cd .. (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs$ cd arrow-ipc/ (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cargo c (base)
Finished dev [unoptimized + debuginfo] target(s) in 0.22s
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cd .. (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs$ arrow/^C (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs$ cd parquet (base)
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/parquet$ cargo c (base)
Finished dev [unoptimized + debuginfo] target(s) in 0.28s
[features_parquet] ritchie46 /home/ritchie46/code/arrow-rs/parquet$
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you switch to the parquet crate it shouldn't need to recompile all the arrow-ipc gubbins, this gets really annoying if say your working on arrow-select and switching between crates to integration test things. It isn't the end of the world, but it very quickly gets out of hand. The DataFusion codebase is extremely painful to work on as a result of this - apache/datafusion#7370
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right.. So the ideal case would be having an IPC-schema
crate? Not saying that we should, but more a philosophical question. That would shorten compile times for both consumers and devs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the numbers justify it... I am rather confused though as to how the readers would materially contribute to compile times, something is funky here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I don't understand it either atm. Will take a better look. Posted the chart in the original issue.
How do you mean? The default features are deactivated, meaning that the IPC reader/writer are not included in compilation. |
How? I might be being dense, but I'm not seeing any |
Whoops. You're not dense. 😅 Some local changes here. |
Ok, will close this one and open something more solid a bit later, once I understand the timings and bottlenecks better, and hopefully can create features that don't force re-compilation for devs here. |
Which issue does this PR close?
Closes #4764
Rationale for this change
I did some
--timings --release
builds on #4765What changes are included in this PR?
Adds some feature gates. I also want to feature gate
arrow-select
if you agree with this path. This PR is opened to start a guided discussion on this.Are there any user-facing changes?
No, we keep default features the same.