Add niche optimization for `NaiveTime` #811

Kijewski · 2022-09-06T08:50:08Z

`NaiveTime` contains two integer fields, both with a restricted range:

- `secs`, the second of the day, is less than 86,400.
- `frac` is used for two things: to store the nanosecond of the second, and to mark a leap second, so its highest value is 1,999,999,999.

So, in order to allow niche optimization, one or both values could be implemented in a type that does not allow higher values. Because `frac` is stored later in the struct, it is the better candidate to yield optimized code, e.g. when used with `Result<NaiveTime, E>`: the compiler has a longer consecutive sequence of bytes in which to store the data of the non-`NaiveTime` variant.
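To make the effect concrete, here is a minimal sketch of what a niche buys. It is not the `U31` type from this PR: `NonZeroU32` from the standard library stands in for a range-restricted field, and `PlainTime`, `NicheTime` and `sizes` are hypothetical names used only for illustration.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical stand-in for the current layout: two unrestricted integers.
#[allow(dead_code)]
struct PlainTime {
    secs: u32,
    frac: u32,
}

/// Hypothetical stand-in for the proposed layout: the last field has a
/// forbidden bit pattern (here: zero) that the compiler may use as a niche.
#[allow(dead_code)]
struct NicheTime {
    secs: u32,
    frac: NonZeroU32,
}

/// Returns (size of `T`, size of `Option<T>`) in bytes.
fn sizes<T>() -> (usize, usize) {
    (size_of::<T>(), size_of::<Option<T>>())
}
```

On common targets with a 4-byte aligned `u32`, I would expect `sizes::<PlainTime>()` to report 8 and 12 bytes, and `sizes::<NicheTime>()` to report 8 and 8. The layout of such structs is not formally guaranteed by the language, so treat this as an observation about current rustc rather than a promise.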

djc

Thank you for working on this, interesting approach!

So, in order to allow niche optimization, one or both values could be implemented in a type that does not allow higher values. Because frac is stored later in the struct, it is the better candidate to yield optimized code, e.g. when used with Result<NaiveTime, E>: the compiler has a longer consecutive sequence of bytes in which to store the data of the non-NaiveTime variant.

If we're going to do this, why not take the 15-bit niche from secs instead of 1 bit from frac? How have you tested this? Do you have an actual use case where this optimization has a substantial impact?
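The two bit counts in that question can be checked with a few lines of arithmetic. This is only a sketch: the constants are taken from the ranges given in the PR description, and `spare_high_bits` is a hypothetical helper.

```rust
/// Largest valid second of the day (the exclusive bound is 86,400).
const MAX_SECS: u32 = 86_400 - 1;
/// Largest valid `frac`, including the leap-second range.
const MAX_FRAC: u32 = 1_999_999_999;

/// Number of high bits of a `u32` that are always zero for values `<= max`.
fn spare_high_bits(max: u32) -> u32 {
    max.leading_zeros()
}
```

`spare_high_bits(MAX_SECS)` is 15 because 86,399 fits in 17 bits, and `spare_high_bits(MAX_FRAC)` is 1 because 1,999,999,999 is below 2^31.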

djc · 2022-09-07T10:30:06Z

src/naive/time/u31.rs

+#[cfg(target_endian = "little")]
+#[derive(Copy, Clone)]
+struct Buf {
+    align: [u32; 0],


Is this trick documented anywhere? I'd like this to be commented with a (more or less) authoritative citation on how this behaves.

"Slices have the same layout as the section of the array they slice." -- The Rust Reference, Type layout, Slice layout

The same holds for zero-sized slices.

For information, you can also use #[repr(align(4))] to do the same thing (from RFC 1358).

On some platforms the alignment of u32 might be 16 bits. msp430-none-elf is a 16-bit, tier 3 platform.
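A small sketch comparing the two alignment approaches discussed here. `ViaZeroLenArray`, `ViaReprAlign` and `layout_of` are hypothetical names, and the byte buffer is simplified compared to the `Buf` struct in the diff.

```rust
use std::mem::{align_of, size_of};

/// Zero-length array trick: the field contributes no bytes, but its element
/// type's alignment raises the alignment of the whole struct.
#[derive(Copy, Clone)]
#[allow(dead_code)]
struct ViaZeroLenArray {
    _align: [u32; 0],
    bytes: [u8; 4],
}

/// Alignment requested explicitly via `#[repr(align(N))]` (RFC 1358).
#[derive(Copy, Clone)]
#[repr(align(4))]
#[allow(dead_code)]
struct ViaReprAlign {
    bytes: [u8; 4],
}

/// Returns (size, alignment) of `T` in bytes.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

The difference matters for the platform point above: the zero-length array follows whatever alignment `u32` has on the target, while `#[repr(align(4))]` asks for 4 bytes everywhere, including targets where `u32` is only 2-byte aligned.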

djc · 2022-09-07T10:31:00Z

src/naive/time/u31.rs

+
+/// A 31-bit unsigned integer
+#[allow(unreachable_pub)] // public through `rkyv::Archive<NaiveTime>`
+#[repr(transparent)]


I suppose this is necessary in order for the compiler to leverage the niche? What does it mean for semver compatibility going forward?

It's not a public type, so the user should not need to care how U31 is implemented. Unless a user does very odd stuff like accessing an Option<NaiveTime> in C, they should not be bothered by this change.

Actually, I think I don't need struct U31(Buf) anyhow. I could just let U31 be the endianness-guarded struct directly.

djc · 2022-09-07T10:31:25Z

src/naive/time/mod.rs

@@ -551,6 +556,7 @@ impl NaiveTime {
            } else {
                frac = (i64::from(frac) + rhs.num_nanoseconds().unwrap()) as u32;
                debug_assert!(frac < 2_000_000_000);
+                let frac = unsafe { U31::new_unchecked(frac) };


I'd like some comments here that clearly document how/where the invariant is upheld in release builds.

I assumed chrono is well enough tested, so I can assume the invariant to hold.

If I read the code correctly, it is tested that frac < 1_000_000_000, and rhs.num_nanoseconds() cannot return a value >= 1e9. Is the second assumption true? Then I will add this as a // SAFETY comment.
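As a sketch of what such a comment could look like: the `U31` below is a hypothetical plain wrapper without a niche, and `add_nanos` is not chrono's code. It assumes the two bounds stated above (`frac < 1_000_000_000` and a non-negative offset below one second) and checks them explicitly so the example is self-contained.

```rust
/// Hypothetical 31-bit integer wrapper. Unlike the type in the PR it has no
/// niche; it only illustrates how the invariant could be documented.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct U31(u32);

impl U31 {
    const MAX: u32 = (1 << 31) - 1;

    /// Checked constructor: rejects values that need the top bit.
    fn new(value: u32) -> Option<Self> {
        if value <= Self::MAX { Some(U31(value)) } else { None }
    }

    /// # Safety
    /// The caller must guarantee `value <= U31::MAX`.
    unsafe fn new_unchecked(value: u32) -> Self {
        debug_assert!(value <= Self::MAX);
        U31(value)
    }

    fn get(self) -> u32 {
        self.0
    }
}

/// Adds a sub-second offset to `frac`. Returns `None` unless
/// `frac < 1_000_000_000` and `0 <= delta_nanos < 1_000_000_000`
/// (both bounds are assumptions of this sketch).
fn add_nanos(frac: u32, delta_nanos: i64) -> Option<U31> {
    if frac >= 1_000_000_000 || !(0..1_000_000_000).contains(&delta_nanos) {
        return None;
    }
    let sum = (i64::from(frac) + delta_nanos) as u32;
    // SAFETY: both operands are below 1_000_000_000, so `sum` is below
    // 2_000_000_000, which is less than 2^31.
    Some(unsafe { U31::new_unchecked(sum) })
}
```

Writing the bound derivation next to the `unsafe` block means a reviewer can verify the invariant locally instead of relying on test coverage elsewhere.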

djc · 2022-09-07T10:36:00Z

Also, why not go with the ux crate or the arbitrary-int crate?

(IMO the awesome version of this is a crate that we can use at test time to generate code into our src directory specifically for the types that we need, without adding a compile-time dependency for chrono. This would allow the upstream maintainers to improve the code over time without us having to take responsibility for a particular implementation.)

Kijewski · 2022-09-07T10:44:30Z

Also, why not go with the ux crate or the arbitrary-int crate?

Neither crate uses niche optimization. I have an open PR for ux to implement it there: rust-ux/uX#49.

My system is far from crappy, but it takes me 17 seconds to compile ux in release mode. I don't think this additional compile time would be appropriate to add to chrono.

Kijewski · 2022-09-07T10:57:04Z

If we're going to do this, why not take the 15-bit niche from secs instead of 1 bit from frac?

In my current implementation, if the last bit of the struct is set, then the whole thing is not a NaiveTime. So secs can take any value anyway.

NaiveTime is used as the last member in NaiveDateTime and DateTime, so it is better to have the one bit at the very end of the structs than somewhere in the middle. Otherwise you make the usable byte sequence a little bit smaller.

djc · 2022-09-07T11:09:18Z

My system is far from crappy, but it takes me 17 seconds to compile ux in release mode. I don't think this additional compile time would be appropriate to add to chrono.

Fair point!

If we're going to do this, why not take the 15-bit niche from secs instead of 1 bit from frac?

In my current implementation, if the last bit of the struct is set, then the whole thing is not a NaiveTime. So secs can take any value anyway.

NaiveTime is used as the last member in NaiveDateTime and DateTime, so it is better to have the one bit at the very end of the structs than somewhere in the middle. Otherwise you make the usable byte sequence a little bit smaller.

The compiler is allowed to reorder fields at will, right? Why wouldn't it do so here? And if not, we can trivially reorder secs and frac within NaiveTime.

Kijewski · 2022-09-07T11:13:47Z

The compiler is allowed to reorder fields at will, right? Why wouldn't it do so here? And if not, we can trivially reorder secs and frac within NaiveTime.

I didn't think about that. I don't know if the compiler reorders fields to make use of niches. I only know that it will reorder struct fields to minimize padding caused by alignment.

It would actually be fairly trivial to copy the file and also have a U17 type. But that would probably be overkill. :)
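One way to probe the field-order question, at least for `Option`, is to declare the niche-carrying field first in one struct and last in another and compare sizes. This is only a sketch: `NonZeroU32` stands in for a range-restricted field, and `NicheFirst`, `NicheLast` and `option_is_free` are hypothetical names.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Niche-carrying field declared first (hypothetical type).
#[allow(dead_code)]
struct NicheFirst {
    frac: NonZeroU32,
    secs: u32,
}

/// Niche-carrying field declared last (hypothetical type).
#[allow(dead_code)]
struct NicheLast {
    secs: u32,
    frac: NonZeroU32,
}

/// True if wrapping `T` in `Option` costs no extra bytes.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}
```

With current rustc I would expect `option_is_free` to be true for both structs, so declaration order does not seem to matter for `Option`. This does not settle the `Result<NaiveTime, E>` case with a payload-carrying `E`, where the position of the niche within the bytes may still make a difference.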

esheppa · 2022-09-07T12:00:28Z

Sorry to jump in kind of late here, but could we use something like nonmax to hold the nanos?

Kijewski · 2022-09-07T12:05:38Z

nonmax generates terrible machine code, because it has to negate the value on every access. Most likely twice: once for reading, once for writing. This is enough to make the optimizer not understand what is going on.

On the other hand, how often will you access the data? It is obviously great that nonmax does not need unsafe operations to access the data. nonmax could be "good enough".
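For readers who have not looked at nonmax: the sketch below shows the general idea as I understand it, namely storing the bitwise-inverted value in a `NonZeroU32` so that `u32::MAX` becomes the forbidden value. `NonMaxSketch` is a hypothetical re-implementation for illustration, not the crate's actual code.

```rust
use std::num::NonZeroU32;

/// Hypothetical sketch of the idea behind `nonmax`: store `value ^ u32::MAX`
/// in a `NonZeroU32`, so that `u32::MAX` is the one value that cannot be held.
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
struct NonMaxSketch(NonZeroU32);

impl NonMaxSketch {
    /// Inverts the value on the way in; `u32::MAX` maps to zero and is rejected.
    fn new(value: u32) -> Option<Self> {
        NonZeroU32::new(value ^ u32::MAX).map(NonMaxSketch)
    }

    /// Inverts the stored bits again on the way out.
    fn get(self) -> u32 {
        self.0.get() ^ u32::MAX
    }
}
```

Every read and every write goes through one XOR, which is the per-access cost mentioned above. In exchange, no `unsafe` is needed and `Option<NonMaxSketch>` can reuse the zero pattern as its `None`.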

esheppa · 2022-09-07T12:09:15Z

Is it worth us benchmarking both methods so that we have some empirical evidence to inform the direction we choose?

djc · 2022-09-07T12:23:59Z

In general, these questions haven't been answered yet:

How have you tested this? Do you have an actual use case where this optimization has a substantial impact?

Kijewski · 2022-09-07T12:31:04Z

Is it worth us benchmarking both methods so that we have some empirical evidence to inform the direction we choose?

Old values use nonmax, new values use U31:

bench_datetime_parse_from_rfc2822                                                                            
                        time:   [196.87 ns 197.23 ns 197.60 ns]
                        change: [+5.3629% +5.6315% +5.8895%] (p = 0.00 < 0.05)
                        Performance has regressed.

bench_datetime_parse_from_rfc3339                                                                            
                        time:   [163.58 ns 164.23 ns 164.98 ns]
                        change: [+6.6111% +7.8651% +9.1036%] (p = 0.00 < 0.05)
                        Performance has regressed.

bench_datetime_from_str time:   [245.06 ns 246.12 ns 247.37 ns]                                    
                        change: [-4.6067% -3.6377% -2.7998%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_datetime_to_rfc2822                                                                             
                        time:   [565.76 ns 568.01 ns 570.94 ns]
                        change: [-2.2735% -1.8146% -1.3809%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_ser_naivedatetime_writer                                                                            
                        time:   [285.23 ns 287.71 ns 290.49 ns]
                        change: [-4.5103% -3.4291% -2.3967%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_ser_naivedatetime_string                                                                            
                        time:   [301.43 ns 302.13 ns 302.88 ns]
                        change: [-6.1510% -5.6113% -5.0359%] (p = 0.00 < 0.05)
                        Performance has improved.

The other benches were within the noise threshold. So, bench_datetime_parse_from_rfc2822 & bench_datetime_parse_from_rfc3339 are significantly faster with nonmax, and bench_ser_naivedatetime_writer & bench_ser_naivedatetime_string are faster with U31. I ran both benchmarks twice, so there's a chance that these results are actually meaningful.

This is a strong indication to throw this PR away and go with nonmax instead. Also, there seem to be no benchmarks for arithmetic operations, or am I overlooking them?

Kijewski · 2022-09-07T12:34:06Z

How have you tested this?

Whether the optimization works at all, you mean? See fb79395.

Do you have an actual use case where this optimization has a substantial impact?

Any API that returns a fallible DateTime, or stores an Option<DateTime>. In my case I have the latter a lot.

esheppa · 2022-09-07T12:39:10Z

Interestingly, if this works nicely with nonmax, it could potentially be used to replace the i32 of the DateImpl as well, allowing Option<NaiveDate> to be the same size as NaiveDate.

Fix tests to run in rustc 1.38

f5b73b9

djc reviewed Sep 7, 2022

Kijewski added 2 commits September 7, 2022 13:04

Add test

fb79395

Kijewski closed this by deleting the head repository Sep 26, 2022

esheppa mentioned this pull request Nov 20, 2022

Changes to naive::internals to enable more const functions #882

Closed

Kijewski commented Sep 6, 2022 (edited)
djc left a comment
djc Sep 7, 2022
Kijewski Sep 7, 2022
x-hgg-x Sep 8, 2022
Kijewski Sep 8, 2022
djc Sep 7, 2022
Kijewski Sep 7, 2022
djc Sep 7, 2022
Kijewski Sep 7, 2022 (edited)
djc commented Sep 7, 2022
Kijewski commented Sep 7, 2022 (edited)
Kijewski commented Sep 7, 2022
djc commented Sep 7, 2022
Kijewski commented Sep 7, 2022 (edited)
esheppa commented Sep 7, 2022
Kijewski commented Sep 7, 2022
esheppa commented Sep 7, 2022
djc commented Sep 7, 2022
Kijewski commented Sep 7, 2022
Kijewski commented Sep 7, 2022
esheppa commented Sep 7, 2022
