Replace usage of lexical-core crate with libcore for parsing strings as floats #4785

Closed
wants to merge 5 commits

Conversation

@Weijun-H (Member) commented Sep 7, 2023

Which issue does this PR close?

Closes #4774

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions bot added the `arrow` (Changes to the arrow crate) label Sep 7, 2023
    lexical_core::parse::<f32>(s).ok().map(f16::from_f32)
    s.iter()
        .map(|&b| b as char)
        .collect::<String>()
Contributor

I suspect the performance hit of this allocation will be a non-starter. This could be reduced by using str::from_utf8, but it will still likely regress performance.
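As a minimal sketch of the zero-copy alternative hinted at here (the helper name is hypothetical, and `f16` is assumed to be the half crate's type, as used by arrow):

```rust
use half::f16;

// Hypothetical helper: parse a byte slice as f16 without allocating.
fn parse_f16(s: &[u8]) -> Option<f16> {
    // str::from_utf8 borrows the bytes in place, avoiding the per-byte
    // `as char` collection into a fresh String shown in the diff above.
    let text = std::str::from_utf8(s).ok()?;
    text.parse::<f32>().ok().map(f16::from_f32)
}
```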

Contributor

Since this file is JSON-specific, it might also be possible to use serde_json::from_slice(s).ok(), but it seems this has slightly stricter parsing and does not accept numbers such as 2e0 as an integer (though it does as a float).
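A sketch of that approach and of the stricter behaviour described (`parse_json_number` is a hypothetical helper for illustration, not code from this PR):

```rust
use serde::de::DeserializeOwned;

// Hypothetical wrapper around the suggested serde_json call.
fn parse_json_number<T: DeserializeOwned>(s: &[u8]) -> Option<T> {
    serde_json::from_slice(s).ok()
}

fn main() {
    // serde_json parses "2e0" as a float, so integer deserialization fails...
    assert_eq!(parse_json_number::<i64>(b"2e0"), None);
    // ...while float deserialization succeeds.
    assert_eq!(parse_json_number::<f64>(b"2e0"), Some(2.0));
}
```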

@tustvold (Contributor) commented Sep 7, 2023

The great irony there is that serde_json is just using a vended version of lexical_core 😅, but it might be a good way to proceed nonetheless.

@tustvold (Contributor)

Have you run the benchmarks for this change?

@Weijun-H (Member, Author) commented Sep 11, 2023

Hi @tustvold, I ran `cargo +nightly bench --bench cast_kernels`:

cast int32 to int32 512 time:   [204.00 ns 206.14 ns 208.33 ns]
                        change: [-5.5029% -3.9430% -2.4182%] (p = 0.00 < 0.05)
                        Performance has improved.

cast int32 to uint32 512
                        time:   [2.0384 µs 2.0466 µs 2.0567 µs]
                        change: [-0.1984% +0.2765% +0.8386%] (p = 0.33 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

cast int32 to float32 512
                        time:   [2.3875 µs 2.3941 µs 2.4022 µs]
                        change: [-1.8787% -0.0544% +1.0504%] (p = 0.95 > 0.05)
                        No change in performance detected.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

cast int32 to float64 512
                        time:   [2.3832 µs 2.3848 µs 2.3867 µs]
                        change: [-2.7788% -1.7142% -0.9877%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cast int32 to int64 512 time:   [2.3105 µs 2.3152 µs 2.3200 µs]
                        change: [-2.6993% -1.9542% -1.3761%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cast float32 to int32 512
                        time:   [2.3297 µs 2.3345 µs 2.3397 µs]
                        change: [+0.9515% +1.1934% +1.4487%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

cast float64 to float32 512
                        time:   [2.3568 µs 2.3623 µs 2.3675 µs]
                        change: [-0.8033% -0.4485% -0.1454%] (p = 0.01 < 0.05)
                        Change within noise threshold.

cast float64 to uint64 512
                        time:   [2.3810 µs 2.3882 µs 2.3973 µs]
                        change: [+0.7436% +0.9361% +1.1451%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

cast int64 to int32 512 time:   [2.4393 µs 2.4450 µs 2.4507 µs]
                        change: [-0.3976% -0.1747% +0.0455%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  7 (7.00%) low mild
  4 (4.00%) high mild

cast date64 to date32 512
                        time:   [411.16 ns 411.63 ns 412.23 ns]
                        change: [-1.4922% -1.1806% -0.8669%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

cast date32 to date64 512
                        time:   [391.89 ns 392.68 ns 393.55 ns]
                        change: [-3.2120% -2.7597% -2.3170%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

cast time32s to time32ms 512
                        time:   [182.21 ns 182.55 ns 182.92 ns]
                        change: [-8.9820% -8.4613% -7.9270%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) high mild
  11 (11.00%) high severe

cast time32s to time64us 512
                        time:   [398.80 ns 400.65 ns 402.86 ns]
                        change: [-3.3939% -2.9998% -2.5725%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

cast time64ns to time32s 512
                        time:   [411.41 ns 411.98 ns 412.82 ns]
                        change: [-5.1090% -4.2686% -3.4874%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe

cast timestamp_ns to timestamp_s 512
                        time:   [204.24 ns 205.38 ns 206.51 ns]
                        change: [-1.9684% -1.4627% -0.9635%] (p = 0.00 < 0.05)
                        Change within noise threshold.

cast timestamp_ms to timestamp_ns 512
                        time:   [3.0508 µs 3.0544 µs 3.0581 µs]
                        change: [-1.0488% -0.7396% -0.4990%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cast utf8 to f32        time:   [8.8845 µs 8.8974 µs 8.9138 µs]
                        change: [-21.302% -20.204% -19.257%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe

cast i64 to string 512  time:   [15.687 µs 15.720 µs 15.767 µs]
                        change: [-2.4930% -1.7614% -1.2380%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

cast f32 to string 512  time:   [21.984 µs 22.017 µs 22.051 µs]
                        change: [+0.0279% +0.1690% +0.3156%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) low mild

cast timestamp_ms to i64 512
                        time:   [219.23 ns 220.14 ns 221.24 ns]
                        change: [-10.034% -9.0068% -8.0400%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cast utf8 to date32 512 time:   [9.3471 µs 9.3546 µs 9.3651 µs]
                        change: [+0.1587% +0.4162% +0.6211%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) high mild
  6 (6.00%) high severe

cast utf8 to date64 512 time:   [63.588 µs 63.866 µs 64.134 µs]
                        change: [+6.2335% +6.7701% +7.3132%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild

cast decimal128 to decimal128 512
                        time:   [4.0842 µs 4.0933 µs 4.1036 µs]
                        change: [-0.7165% -0.5197% -0.3283%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

cast decimal128 to decimal256 512
                        time:   [6.1770 µs 6.1875 µs 6.2019 µs]
                        change: [-0.4030% -0.1438% +0.1200%] (p = 0.30 > 0.05)
                        No change in performance detected.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe

cast decimal256 to decimal128 512
                        time:   [37.884 µs 37.939 µs 37.998 µs]
                        change: [+0.8098% +1.1873% +1.4663%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe

cast decimal256 to decimal256 512
                        time:   [6.0214 µs 6.0282 µs 6.0352 µs]
                        change: [-0.3859% -0.0314% +0.2086%] (p = 0.88 > 0.05)
                        No change in performance detected.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

cast decimal128 to decimal128 512 with same scale
                        time:   [75.928 ns 75.985 ns 76.050 ns]
                        change: [-0.7296% -0.5711% -0.3980%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

cast decimal256 to decimal256 512 with same scale
                        time:   [75.878 ns 75.969 ns 76.121 ns]
                        change: [-2.4126% -1.8729% -1.3256%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

@tustvold (Contributor)

The JSON benchmarks are the ones I'm potentially concerned about

@Weijun-H (Member, Author)

> The JSON benchmarks are the ones I'm potentially concerned about

The JSON performance has decreased.

small_bench_primitive   time:   [9.3441 µs 9.3692 µs 9.4011 µs]
                        change: [+13.467% +13.828% +14.207%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe

large_bench_primitive   time:   [3.0556 ms 3.0708 ms 3.0883 ms]
                        change: [+27.529% +28.217% +28.843%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  3 (3.00%) high mild
  14 (14.00%) high severe

small_bench_list        time:   [17.681 µs 17.707 µs 17.732 µs]
                        change: [+11.293% +11.625% +11.948%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

@tustvold (Contributor)

I think a 30% performance regression is a pretty tough sell, especially when it isn't as if lexical_core has security vulnerabilities that impact us... Have you tried the trick of using serde_json to provide the vended lexical_core?

@Weijun-H (Member, Author)

> I think a 30% performance regression is a pretty tough sell, especially when it isn't as if lexical_core has security vulnerabilities that impact us... Have you tried the trick of using serde_json to provide the vended lexical_core?

The performance of serde_json also decreased.

small_bench_primitive   time:   [7.6051 µs 7.6808 µs 7.7639 µs]
                        change: [+20.123% +20.691% +21.386%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

Benchmarking large_bench_primitive: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.7s, enable flat sampling, or reduce sample count to 50.
large_bench_primitive   time:   [1.9096 ms 1.9123 ms 1.9155 ms]
                        change: [+4.5542% +4.8926% +5.1944%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

small_bench_list        time:   [13.844 µs 13.870 µs 13.896 µs]
                        change: [+14.486% +15.117% +15.654%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe

@tustvold (Contributor)

Is that using `serde_json::from_slice` or `serde_json::from_str`?

@Weijun-H (Member, Author)

> Is that using `serde_json::from_slice` or `serde_json::from_str`?

I used `serde_json::from_slice`.
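For context, the two entry points take different input types but accept the same JSON; from_str starts from an already-validated &str, which can let serde_json skip UTF-8 validation of string contents. A minimal illustration (not code from the PR):

```rust
fn main() {
    // from_slice parses raw bytes; from_str parses a &str that is
    // already known to be valid UTF-8.
    let a: f64 = serde_json::from_slice(b"2.5").unwrap();
    let b: f64 = serde_json::from_str("2.5").unwrap();
    assert_eq!(a, b);
}
```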

@Weijun-H (Member, Author)

Hi @tustvold, since JSON performance is poor, I will keep the change for arrow-cast, but not for arrow-json. What is your opinion?

    Err(_) => lexical_core::parse::<f64>(s).ok().and_then(NumCast::from),
    match std::str::from_utf8(s) {
        Ok(s) => {
            if let Ok(res) = s.to_string().parse::<$t>() {
Contributor

`to_string` shouldn't be needed?
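A sketch of the simplification being suggested (a hypothetical standalone helper; in the PR the code lives inside a macro where `$t` is the target numeric type):

```rust
use std::str::FromStr;

// Hypothetical helper: parse raw bytes as a number without the
// intermediate String allocation flagged above.
fn parse_bytes<T: FromStr>(s: &[u8]) -> Option<T> {
    // str::parse works directly on the borrowed &str, so the
    // s.to_string() call is unnecessary.
    std::str::from_utf8(s).ok()?.parse::<T>().ok()
}
```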

@Weijun-H (Member, Author)

Thanks to the suggestion from @Dandandan, the performance is now almost on par with lexical-core:

small_bench_primitive   time:   [8.3978 µs 8.4857 µs 8.6162 µs]
                        change: [-5.9124% -2.8511% -0.5427%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

large_bench_primitive   time:   [2.6429 ms 2.6958 ms 2.7769 ms]
                        change: [+8.5944% +10.894% +14.284%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe

small_bench_list        time:   [16.202 µs 16.654 µs 17.206 µs]
                        change: [-3.7706% +0.2623% +3.7338%] (p = 0.91 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high severe

@tustvold (Contributor)

I'm honestly torn by this: it doesn't eliminate the lexical-core dependency and still represents a non-trivial performance regression...

I can add it to my list to investigate if there is some way to get the same performance here

@tustvold (Contributor)

In fact I'm not sure we even need to do this; the advisory is not for lexical-core, see #4774 (comment).

@tustvold (Contributor)

Given this does still regress performance, I am going to close this for now; feel free to reopen if I have misunderstood.

@tustvold tustvold closed this Sep 25, 2023
@tustvold tustvold mentioned this pull request Oct 18, 2023