Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[parquet] feature gate some functionality #4765

Closed
wants to merge 1 commit into from

Conversation

ritchie46
Copy link
Contributor

@ritchie46 ritchie46 commented Sep 3, 2023

Which issue does this PR close?

Closes #4764

Rationale for this change

I did some --timings --release builds on #4765

master feature perc
58.54 42.12 73%
42.6 33.8 79%
59.9 38.6 56%

What changes are included in this PR?

Adds some feature gates. I also want to feature gate arrow-select if you agree with this path. This PR is opened to start a guided discussion on this.

Are there any user-facing changes?

No, we keep default features the same.

@github-actions github-actions bot added parquet Changes to the parquet crate arrow Changes to the arrow crate labels Sep 3, 2023
@ritchie46 ritchie46 changed the title feature out ipc::{reader/writer} [parquet] feature gate some functionality Sep 3, 2023
@tustvold
Copy link
Contributor

tustvold commented Sep 4, 2023

The feature gates don't appear to be being used at all? I'm therefore rather confused how this is helping compile times...

@@ -42,7 +42,7 @@ arrow-csv = { workspace = true, optional = true }
arrow-data = { workspace = true, optional = true }
arrow-schema = { workspace = true, optional = true }
arrow-select = { workspace = true, optional = true }
arrow-ipc = { workspace = true, optional = true }
arrow-ipc = { workspace = true, default-features = false, optional = true }
Copy link
Contributor

@tustvold tustvold Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will result in recompilation when switching crates in this workspace, something I'd really like to avoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean during development time? I don't really understand. Seems a single compilation pass per crate. Recompiling both afterwards is cached.

[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs$ cd arrow-ipc/                    (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cargo c                (base) 
    Checking libm v0.2.7
    Checking cfg-if v1.0.0
   Compiling libc v0.2.147
    Checking static_assertions v1.1.0
    Checking bytes v1.4.0
    Checking arrow-schema v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-schema)
    Checking iana-time-zone v0.1.57
    Checking once_cell v1.18.0
    Checking hashbrown v0.14.0
    Checking bitflags v1.3.2
    Checking lexical-util v0.8.5
    Checking flatbuffers v23.5.26
    Checking num-traits v0.2.16
    Checking lexical-write-integer v0.8.5
    Checking lexical-parse-integer v0.8.6
    Checking lexical-write-float v0.8.5
    Checking lexical-parse-float v0.8.5
    Checking getrandom v0.2.10
    Checking num-integer v0.1.45
    Checking num-complex v0.4.4
    Checking half v2.3.1
    Checking chrono v0.4.28
    Checking ahash v0.8.3
    Checking num-bigint v0.4.4
    Checking num-iter v0.1.43
    Checking lexical-core v0.8.5
    Checking num-rational v0.4.1
    Checking num v0.4.1
    Checking arrow-buffer v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-buffer)
    Checking arrow-data v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-data)
    Checking arrow-array v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-array)
    Checking arrow-select v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-select)
    Checking arrow-cast v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-cast)
    Checking arrow-ipc v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-ipc)
    Finished dev [unoptimized + debuginfo] target(s) in 13.10s
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cd ..                  (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs$ cd parquet                       (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/parquet$ cargo c                  (base) 
    Checking libc v0.2.147
    Checking alloc-no-stdlib v2.0.4
    Checking adler v1.0.2
    Checking byteorder v1.4.3
    Checking integer-encoding v3.0.4
    Checking twox-hash v1.6.3
    Checking base64 v0.21.3
    Checking crc32fast v1.3.2
    Checking chrono v0.4.28
    Checking ordered-float v2.10.0
    Checking snap v1.1.0
    Checking alloc-stdlib v0.2.2
    Checking miniz_oxide v0.7.1
    Checking brotli-decompressor v2.3.4
    Checking thrift v0.17.0
    Checking getrandom v0.2.10
    Checking zstd-sys v2.0.8+zstd.1.5.5
    Checking lz4-sys v1.9.4
    Checking flate2 v1.0.27
    Checking lz4 v1.24.0
    Checking ahash v0.8.3
    Checking zstd-safe v6.0.6
    Checking zstd v0.12.4
    Checking arrow-array v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-array)
    Checking brotli v3.3.4
    Checking arrow-select v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-select)
    Checking arrow-cast v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-cast)
    Checking arrow-ipc v46.0.0 (/home/ritchie46/code/arrow-rs/arrow-ipc)
    Checking parquet v46.0.0 (/home/ritchie46/code/arrow-rs/parquet)
    Finished dev [unoptimized + debuginfo] target(s) in 9.00s
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/parquet$ cd ..                    (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs$ cd arrow-ipc/                    (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cargo c                (base) 
    Finished dev [unoptimized + debuginfo] target(s) in 0.22s
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/arrow-ipc$ cd ..                  (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs$ arrow/^C                         (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs$ cd parquet                       (base) 
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/parquet$ cargo c                  (base) 
    Finished dev [unoptimized + debuginfo] target(s) in 0.28s
[features_parquet]  ritchie46 /home/ritchie46/code/arrow-rs/parquet$      

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you switch to the parquet crate it shouldn't need to recompile all the arrow-ipc gubbins, this gets really annoying if say your working on arrow-select and switching between crates to integration test things. It isn't the end of the world, but it very quickly gets out of hand. The DataFusion codebase is extremely painful to work on as a result of this - apache/datafusion#7370

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.. So the ideal case would be having an IPC-schema crate? Not saying that we should, but more a philosophical question. That would shorten compile times for both consumers and devs?

Copy link
Contributor

@tustvold tustvold Sep 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the numbers justify it... I am rather confused though as to how the readers would materially contribute to compile times, something is funky here...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I don't understand it either atm. Will take a better look. Posted the chart in the original issue.

@ritchie46
Copy link
Contributor Author

ritchie46 commented Sep 4, 2023

The feature gates don't appear to be being used at all?

How do you mean? The default features are deactivated, meaning that the IPC reader/writer are not included in compilation.

@tustvold
Copy link
Contributor

tustvold commented Sep 4, 2023

meaning that the IPC reader/writer are not included in compilation

How? I might be being dense, but I'm not seeing any #[cfg(feature = "...")] in this PR...

@ritchie46
Copy link
Contributor Author

How? I might be being dense, but I'm not seeing any #[cfg(feature = "...")] in this PR...

Whoops. You're not dense. 😅 Some local changes here.

@ritchie46
Copy link
Contributor Author

Ok, will close this one and open something more solid a bit later, once I understand the timings and bottlenecks better, and hopefully can create features that don't force re-compilation for devs here.

@ritchie46 ritchie46 closed this Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate parquet Changes to the parquet crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[parquet]: feature gate functionality in parquet.
2 participants