-
Notifications
You must be signed in to change notification settings - Fork 553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add niche optimization for NaiveTime
#811
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this, interesting approach!
So, in order to allow niche optimization, one or both values could be implemented in a type that does not allow higher values. Because
frac
is stored later in the struct, it has the better option to yield optimized code, e.g. when used withResult<NaiveTime, E>
. The compiler has a longer consecutive sequence of bytes to store non-NaiveTime its data.
If we're going to do this, why not take the 15-bit niche from secs
instead of 1 bit from frac
? How have you tested this? Do you have an actual use case where this optimization has a substantial impact?
src/naive/time/u31.rs
Outdated
#[cfg(target_endian = "little")] | ||
#[derive(Copy, Clone)] | ||
struct Buf { | ||
align: [u32; 0], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this trick documented anywhere? I'd like this to be commented with a (more or less) authoritative citation on how this behaves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slices have the same layout as the section of the array they slice. -- Type layout # Slice layout
This is the same for 0 sized slices.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For information, you can also use #[repr(align(4))]
to do the same thing (from RFC 1358).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some platforms the alignment of u32 might be 16bits. msp430-none-elf is a 16bit, tier 3 platform.
src/naive/time/u31.rs
Outdated
|
||
/// A 31-bit unsigned integer | ||
#[allow(unreachable_pub)] // public through `rkyv::Archive<NaiveTime>` | ||
#[repr(transparent)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is necessary in order for the compiler to leverage the niche? What does it mean for semver compatibility going forward?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a public type, so the user should not need to care how U31 is implemented. Unless a user does very odd stuff like accessing an Option<NaiveTime>
in C, they should not be bothered by this change.
Actually, I think I don't need struct U31(Buf)
anyhow. I could just let U31 be the endianness-guarded struct directly.
@@ -551,6 +556,7 @@ impl NaiveTime { | |||
} else { | |||
frac = (i64::from(frac) + rhs.num_nanoseconds().unwrap()) as u32; | |||
debug_assert!(frac < 2_000_000_000); | |||
let frac = unsafe { U31::new_unchecked(frac) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like some comments here that clearly document how/where the invariant is upheld in release builds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assumed chrono is well enough tested, so I can assume the invariant to hold.
If I read the code correctly it is tested that frac < 1_000_000_000
and rhs.num_nanoseconds()
cannot return a value >= 1e9. Is the second assumption true? Then I this as a // SAFETY
comment.
Also, why not go with the ux crate or the arbitrary-int crate? (IMO the awesome version of this is a crate that we can use at test time to generate code into our |
Neither crate uses niche optimization. I have an open PR for ux to implement it there: rust-ux/uX#49. My system is far from crappy, but it takes me 17 seconds to compile ux in release mode. I don't think this additional compile time would appropriate to add to chrono. |
In my current implementation, if the last bit of the struct is set, then the whole thing it not a NaiveTime is used as the last member in NaiveDateTime, and DateTime, so it better to have the one bit at the very end of the structs, then somewhere in the middle. Otherwise you make the usable byte sequence a little bit smaller. |
`NaiveTime` contains two integer fields, both with a restricted range: - `secs`, the second of the day, is less than 86,400. - `frac` is used for two things. To store the nanosecond of the second, and a leap second, so its highest valus is 1,999,999,999. So, in order to allow niche optimization, one or both values could be implemented in a type that does not allow higher values. Because `frac` is stored later in the struct, it has the better option to yield optimized code, e.g. when used with `Result<NaiveTime, E>`. The compiler has a longer consecutive sequence of bytes to store non-NaiveTime its data.
Fair point!
The compiler is allowed to reorder fields at will, right? Why wouldn't it do so here? And if not, we can trivially reorder |
I don't think about that. I don't know if the compiler reorders fields to make use of niches. I only know that it will reorder structs to minimize padding because of alignment issues. It would be actually kind of trivial to copy the file and have a U17 type, also. But that would probably be overkill. :) |
Sorry to jump in kind of late here, but could we use something like: nonmax to hold the nanos? |
nonmax generates terrible byte code, because it has to negate the value on every access. Most likely twice: once for reading, once for writing. This is enough to make the optimizer not understand what is going on. Otherwise, how often will you access the data? It is obviously great that nonmax does not need unsafe operations to access the data. Nonmax could be "good enough". |
Is it worth us benchmarking both methods so that we have some empirical evidence to inform the direction we choose? |
In general, these questions haven't been answered yet:
|
Old values use nonmax, new values use U31:
The other benches were within noise threshold. So, This is a strong indication to throw this PR away and go with nonmax instead. Otherwise, there seem to be no benchmarks for arithmetic operations or am I overlooking them? |
Whether the optimization works at all you mean? fb79395
Any API that returns a fallible |
Interestingly if this works nicely with |
NaiveTime
contains two integer fields, both with a restricted range:secs
, the second of the day, is less than 86,400.frac
is used for two things. To store the nanosecond of the second,and a leap second, so its highest value is 1,999,999,999.
So, in order to allow niche optimization, one or both values could be
implemented in a type that does not allow higher values. Because
frac
is stored later in the struct, it has the better option to yield
optimized code, e.g. when used with
Result<NaiveTime, E>
. The compilerhas a longer consecutive sequence of bytes to store non-NaiveTime its
data.