Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add niche optimization for NaiveTime #811

Closed
wants to merge 3 commits into from
Closed

Add niche optimization for NaiveTime #811

wants to merge 3 commits into from

Conversation

Kijewski
Copy link
Contributor

@Kijewski Kijewski commented Sep 6, 2022

NaiveTime contains two integer fields, both with a restricted range:

  • secs, the second of the day, is less than 86,400.
  • frac is used for two things. To store the nanosecond of the second,
    and a leap second, so its highest value is 1,999,999,999.

So, in order to allow niche optimization, one or both values could be
implemented in a type that does not allow higher values. Because frac
is stored later in the struct, it has the better option to yield
optimized code, e.g. when used with Result<NaiveTime, E>. The compiler
has a longer consecutive sequence of bytes to store non-NaiveTime its
data.

Copy link
Member

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this, interesting approach!

So, in order to allow niche optimization, one or both values could be implemented in a type that does not allow higher values. Because frac is stored later in the struct, it has the better option to yield optimized code, e.g. when used with Result<NaiveTime, E>. The compiler has a longer consecutive sequence of bytes to store non-NaiveTime its data.

If we're going to do this, why not take the 15-bit niche from secs instead of 1 bit from frac? How have you tested this? Do you have an actual use case where this optimization has a substantial impact?

#[cfg(target_endian = "little")]
#[derive(Copy, Clone)]
struct Buf {
align: [u32; 0],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this trick documented anywhere? I'd like this to be commented with a (more or less) authoritative citation on how this behaves.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slices have the same layout as the section of the array they slice. -- Type layout # Slice layout

This is the same for 0 sized slices.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For information, you can also use #[repr(align(4))] to do the same thing (from RFC 1358).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For some platforms the alignment of u32 might be 16bits. msp430-none-elf is a 16bit, tier 3 platform.


/// A 31-bit unsigned integer
#[allow(unreachable_pub)] // public through `rkyv::Archive<NaiveTime>`
#[repr(transparent)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is necessary in order for the compiler to leverage the niche? What does it mean for semver compatibility going forward?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a public type, so the user should not need to care how U31 is implemented. Unless a user does very odd stuff like accessing an Option<NaiveTime> in C, they should not be bothered by this change.

Actually, I think I don't need struct U31(Buf) anyhow. I could just let U31 be the endianness-guarded struct directly.

@@ -551,6 +556,7 @@ impl NaiveTime {
} else {
frac = (i64::from(frac) + rhs.num_nanoseconds().unwrap()) as u32;
debug_assert!(frac < 2_000_000_000);
let frac = unsafe { U31::new_unchecked(frac) };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like some comments here that clearly document how/where the invariant is upheld in release builds.

Copy link
Contributor Author

@Kijewski Kijewski Sep 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assumed chrono is well enough tested, so I can assume the invariant to hold.

If I read the code correctly it is tested that frac < 1_000_000_000 and rhs.num_nanoseconds() cannot return a value >= 1e9. Is the second assumption true? Then I this as a // SAFETY comment.

@djc
Copy link
Member

djc commented Sep 7, 2022

Also, why not go with the ux crate or the arbitrary-int crate?

(IMO the awesome version of this is a crate that we can use at test time to generate code into our src directory specifically for the types that we need, without adding a compile-time dependency for chrono. This would allow the upstream maintainers to improve the code over time without us having to take responsibility for a particular implementation.)

@Kijewski
Copy link
Contributor Author

Kijewski commented Sep 7, 2022

Also, why not go with the ux crate or the arbitrary-int crate?

Neither crate uses niche optimization. I have an open PR for ux to implement it there: rust-ux/uX#49.

My system is far from crappy, but it takes me 17 seconds to compile ux in release mode. I don't think this additional compile time would appropriate to add to chrono.

@Kijewski
Copy link
Contributor Author

Kijewski commented Sep 7, 2022

If we're going to do this, why not take the 15-bit niche from secs instead of 1 bit from frac?

In my current implementation, if the last bit of the struct is set, then the whole thing it not a NaiveTime. So sec can be every value anyway.

NaiveTime is used as the last member in NaiveDateTime, and DateTime, so it better to have the one bit at the very end of the structs, then somewhere in the middle. Otherwise you make the usable byte sequence a little bit smaller.

`NaiveTime` contains two integer fields, both with a restricted range:

- `secs`, the second of the day, is less than 86,400.
- `frac` is used for two things. To store the nanosecond of the second,
  and a leap second, so its highest valus is 1,999,999,999.

So, in order to allow niche optimization, one or both values could be
implemented in a type that does not allow higher values. Because `frac`
is stored later in the struct, it has the better option to yield
optimized code, e.g. when used with `Result<NaiveTime, E>`. The compiler
has a longer consecutive sequence of bytes to store non-NaiveTime its
data.
@djc
Copy link
Member

djc commented Sep 7, 2022

My system is far from crappy, but it takes me 17 seconds to compile ux in release mode. I don't think this additional compile time would appropriate to add to chrono.

Fair point!

If we're going to do this, why not take the 15-bit niche from secs instead of 1 bit from frac?

In my current implementation, if the last bit of the struct is set, then the whole thing it not a NaiveTime. So sec can be every value anyway.

NaiveTime is used as the last member in NaiveDateTime, and DateTime, so it better to have the one bit at the very end of the structs, then somewhere in the middle. Otherwise you make the usable byte sequence a little bit smaller.

The compiler is allowed to reorder fields at will, right? Why wouldn't it do so here? And if not, we can trivially reorder secs and frac within NaiveTime.

@Kijewski
Copy link
Contributor Author

Kijewski commented Sep 7, 2022

The compiler is allowed to reorder fields at will, right? Why wouldn't it do so here? And if not, we can trivially reorder secs and frac within NaiveTime.

I don't think about that. I don't know if the compiler reorders fields to make use of niches. I only know that it will reorder structs to minimize padding because of alignment issues.

It would be actually kind of trivial to copy the file and have a U17 type, also. But that would probably be overkill. :)

@esheppa
Copy link
Collaborator

esheppa commented Sep 7, 2022

Sorry to jump in kind of late here, but could we use something like: nonmax to hold the nanos?

@Kijewski
Copy link
Contributor Author

Kijewski commented Sep 7, 2022

nonmax generates terrible byte code, because it has to negate the value on every access. Most likely twice: once for reading, once for writing. This is enough to make the optimizer not understand what is going on.

Otherwise, how often will you access the data? It is obviously great that nonmax does not need unsafe operations to access the data. Nonmax could be "good enough".

@esheppa
Copy link
Collaborator

esheppa commented Sep 7, 2022

Is it worth us benchmarking both methods so that we have some empirical evidence to inform the direction we choose?

@djc
Copy link
Member

djc commented Sep 7, 2022

In general, these questions haven't been answered yet:

How have you tested this? Do you have an actual use case where this optimization has a substantial impact?

@Kijewski
Copy link
Contributor Author

Kijewski commented Sep 7, 2022

Is it worth us benchmarking both methods so that we have some empirical evidence to inform the direction we choose?

Old values use nonmax, new values use U31:

bench_datetime_parse_from_rfc2822                                                                            
                        time:   [196.87 ns 197.23 ns 197.60 ns]
                        change: [+5.3629% +5.6315% +5.8895%] (p = 0.00 < 0.05)
                        Performance has regressed.

bench_datetime_parse_from_rfc3339                                                                            
                        time:   [163.58 ns 164.23 ns 164.98 ns]
                        change: [+6.6111% +7.8651% +9.1036%] (p = 0.00 < 0.05)
                        Performance has regressed.

bench_datetime_from_str time:   [245.06 ns 246.12 ns 247.37 ns]                                    
                        change: [-4.6067% -3.6377% -2.7998%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_datetime_to_rfc2822                                                                             
                        time:   [565.76 ns 568.01 ns 570.94 ns]
                        change: [-2.2735% -1.8146% -1.3809%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_ser_naivedatetime_writer                                                                            
                        time:   [285.23 ns 287.71 ns 290.49 ns]
                        change: [-4.5103% -3.4291% -2.3967%] (p = 0.00 < 0.05)
                        Performance has improved.

bench_ser_naivedatetime_string                                                                            
                        time:   [301.43 ns 302.13 ns 302.88 ns]
                        change: [-6.1510% -5.6113% -5.0359%] (p = 0.00 < 0.05)
                        Performance has improved.

The other benches were within noise threshold. So, bench_datetime_parse_from_rfc2822 & bench_datetime_parse_from_rfc3339 are significantly faster with nonmax, and bench_ser_naivedatetime_writer & bench_ser_naivedatetime_string are faster with U31. I ran both benchmarks twice, so there's a chance that this results are actually meaningful.

This is a strong indication to throw this PR away and go with nonmax instead. Otherwise, there seem to be no benchmarks for arithmetic operations or am I overlooking them?

@Kijewski
Copy link
Contributor Author

Kijewski commented Sep 7, 2022

How have you tested this?

Whether the optimization works at all you mean? fb79395

Do you have an actual use case where this optimization has a substantial impact?

Any API that returns a fallible DateTime, or stores an Option<DateTime>. In my case I have the latter a lot.

@esheppa
Copy link
Collaborator

esheppa commented Sep 7, 2022

Interestingly if this works nicely with nonmax, potentially it could be used to replace the i32 of the DateImpl as well, allowing Option<NaiveDate> to be the same size as NaiveDate

@Kijewski Kijewski closed this by deleting the head repository Sep 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants