Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a checksum field to DataResponseMetadata, and use it for timezone consistency #6046

Merged
merged 12 commits into from
Feb 3, 2025

Conversation

robertbastian
Copy link
Member

Based on #6043

@robertbastian robertbastian force-pushed the checksum branch 2 times, most recently from 009cf1a to 8de9ae0 Compare January 29, 2025 16:58
@robertbastian robertbastian marked this pull request as ready for review January 29, 2025 19:18
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good; let's get a design LGTM from someone else like @Manishearth since this is the first real use of DataResponseMetadata. Also have a few small questions

@@ -1997,22 +1997,6 @@ impl<FSet: DateTimeNamesMarker> RawDateTimeNames<FSet> {
Ok(())
}

fn load_mz_periods<P>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I prefer the previous style of separate functions for loading separate keys. One important decision here is that we use Default as the DataRequest, and I don't want to inline that assumption to a bunch of places.

Just keep this function and make it return the DataResponseMetadata

Comment on lines 176 to 177
// TODO how does this behave on 32-bit platforms?
sub_map.insert(CHECKSUM_KEY, checksum as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checksum should be a u64 not a usize right?

Copy link
Member Author

@robertbastian robertbastian Jan 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I put a usize value into a ZeroAsciiTrie on a 64-bit system, and then access it on a 32-bit system, what will have happened to it? How many bits are stored on the wire?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses usize in the API because it doesn't really care what integer is stored; it stores a varint. Most use cases of ZeroTrie have involved storing indices into an array of data sitting on the side, so that's mainly what the API was optimized for.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a way to get 64 bits out regardless off the usize length then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding an API to ZeroTrie to get/set unsigned ints of specific sizes.

@robertbastian robertbastian requested a review from sffc January 29, 2025 19:50
@robertbastian
Copy link
Member Author

this is the first real use of DataResponseMetadata

DeserializingBufferProvider wouldn't work without it.

Comment on lines +1 to +2
time_zone/bcp47_to_iana@1, <singleton>, 7622B, 7600B, 35e1d2b320b41234
time_zone/iana_to_bcp47@3, <singleton>, 9543B, 9500B, 7218ae639c063764
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (optional): It's not clear whether these sizes include the checksum. I think they should, at least for postcard.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't, postcard doesn't track indexing overhead either, so I don't think we need to track checksizes

Comment on lines 176 to 177
// TODO how does this behave on 32-bit platforms?
sub_map.insert(CHECKSUM_KEY, checksum as usize);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding an API to ZeroTrie to get/set unsigned ints of specific sizes.


if let Some(checksum) = metadata.checksum {
write!(
&mut fs::File::create(format!("{}_checksum", path_buf.display()))?,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought/Suggestion: {}.checksum

} else if let Some(checksum) = metadata.checksum {
write!(
&mut fs::File::create(path_buf.join(".checksum"))?,
"{checksum}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: write the checksum as a hexadecimal literal.

}
.ok()
.and_then(|s| s.parse().ok());
if !marker.is_singleton {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question/Issue: Why did you move the is_singleton condition above the marker_attributes condition

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was a bug

@robertbastian robertbastian requested a review from sffc February 3, 2025 15:17
Manishearth
Manishearth previously approved these changes Feb 3, 2025
Copy link
Member

@Manishearth Manishearth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approach lgtm. midly worried about this having to be passed through everywhere now, but should be fine. consider adding a data key assoc const / metadata bit the way we have for singletons that signals that a key has a checksum

Manishearth
Manishearth previously approved these changes Feb 3, 2025
Manishearth
Manishearth previously approved these changes Feb 3, 2025
where
P: BoundDataProvider<M> + ?Sized,
Self: Sized,
{
let arg_variables = variables;
match &self.inner {
OptionalNames::SingleLength { variables, .. } if arg_variables == *variables => {
return Ok(Ok(()));
return Ok(Ok(Default::default()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oof, is this ok? Or should we retain the DataResponseMetadata?

sffc
sffc previously approved these changes Feb 3, 2025
@robertbastian robertbastian merged commit 08b682e into unicode-org:main Feb 3, 2025
28 checks passed
@robertbastian robertbastian deleted the checksum branch February 3, 2025 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants