Implement PackedPatternsV1 with packing and unpacking #5580

sffc · 2024-09-23T21:13:50Z

Part of #257

Replaces #5521

This reverts commit b2ea3e6.

This reverts commit 2f666d4.

sffc · 2024-09-24T18:27:15Z

Still needs databake and serde impls, but the packing part is ready for review/landing.

For human-readable Serde, I was thinking of serializing something resembling the builder; maybe I will just derive Serde on that and use it directly.

sffc · 2024-09-24T18:30:13Z

The individual commits are not that useful; most of the changes are in just the one new file.

Manishearth

reviewed builder, still reviewing others

Manishearth · 2024-09-24T19:33:19Z

components/datetime/src/provider/packed_pattern.rs

+///
+/// The LMS value determines which pattern index is used for the first column:
+///
+/// | LMS Value | Long Index | Medium Index | Short Index |


nit: let's add a column that contains the entries "LMS", "L, MS", "LM, S", and "L, M, S" since it's easier to read.

Done in 6b30bd8
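
For readers following along, the grouping column suggested above can be read as a small lookup. This is only a sketch: the order in which the four groupings map to LMS values 0 to 3 is an assumption made for illustration, and `LmsIndices` and `lms_indices` are hypothetical names that do not appear in the PR.

```rust
/// Pattern indices used for the long, medium, and short lengths.
/// Hypothetical helper type, not part of the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct LmsIndices {
    pub long: usize,
    pub medium: usize,
    pub short: usize,
}

/// Maps a 2-bit LMS value to the pattern index of each length.
///
/// ASSUMPTION: the value order (0 = "LMS", 1 = "L, MS", 2 = "LM, S",
/// 3 = "L, M, S") is illustrative and is not confirmed by the thread.
pub fn lms_indices(lms: u8) -> Option<LmsIndices> {
    match lms {
        // "LMS": all three lengths share one pattern.
        0 => Some(LmsIndices { long: 0, medium: 0, short: 0 }),
        // "L, MS": long is distinct; medium and short share a pattern.
        1 => Some(LmsIndices { long: 0, medium: 1, short: 1 }),
        // "LM, S": long and medium share a pattern; short is distinct.
        2 => Some(LmsIndices { long: 0, medium: 0, short: 1 }),
        // "L, M, S": three distinct patterns.
        3 => Some(LmsIndices { long: 0, medium: 1, short: 2 }),
        // Only two bits are available, so anything else is invalid.
        _ => None,
    }
}
```

Under that assumption, the number of distinct standard patterns is the short index plus one.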

Manishearth · 2024-09-24T19:34:39Z

components/datetime/src/provider/packed_pattern.rs

+/// | Sc            | S + 6             | 18-20           | Sa          |
+///
+/// As a result, if there are no variants, bits 2 and higher will be all zero,
+/// making the header int suitable for varint packing.


nit: varint packing, as used by postcard and other optimized binary serialization formats.

Done in 4e94347
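
To illustrate why an all-zero upper range matters, here is a minimal sketch of an unsigned LEB128-style varint encoder, the general scheme that formats such as postcard are understood to use. `encode_varint` is a hypothetical helper written for this note, not an API of postcard or of this crate.

```rust
/// Encodes `value` as an unsigned LEB128-style varint: 7 payload bits per
/// byte, with the high bit set on every byte except the last.
/// Hypothetical helper for illustration only.
pub fn encode_varint(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        // Take the low 7 bits as the payload of this byte.
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit stays clear.
            out.push(byte);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}
```

A header with only the low two bits in use fits in a single byte, whereas a header with a bit set in the 18-20 range needs three bytes.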

Manishearth · 2024-09-24T19:42:48Z

components/datetime/src/provider/packed_pattern.rs

+/// | Lb            | S + 1             | 3-5             | La          |
+/// | Mb            | S + 2             | 6-8             | Ma          |
+/// | Sb            | S + 3             | 9-11            | Sa          |
+/// | Lc            | S + 4             | 12-14           | La          |


question: so inheritance is always from a? I thought c inherits from b first?

we could mark Lc as inheriting from "Lb, then La" etc

I changed both variants to inherit from standard because it was easier to implement, and in practice, if both variant0 and variant1 are present, they will definitely be different from each other. The main inheritance that matters is variant0 from standard.

Okay, that's convincing.
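
A rough sketch of the inheritance rule as settled here, using the bit ranges quoted from the table (3-5, 6-8, and so on up to 18-20). It assumes the 3-bit mode, that a chunk value of zero means "inherit from the standard pattern of the same length", and it deliberately leaves a nonzero chunk uninterpreted because the thread does not spell out that mapping. `chunk_q0`, `resolve_variant`, and `VariantSource` are hypothetical names.

```rust
/// Where a variant pattern comes from. Hypothetical type for illustration.
#[derive(Debug, PartialEq, Eq)]
pub enum VariantSource {
    /// Chunk was zero: reuse the standard pattern at this index.
    InheritStandard(usize),
    /// Chunk was nonzero: the raw chunk value selects a stored pattern.
    /// How it maps to an index is not covered by this sketch.
    Explicit(u32),
}

/// Reads 3-bit chunk number `k` (1..=6) from the header.
/// Chunk `k` occupies bits `3k..=3k+2`, which matches the quoted ranges
/// (chunk 1 at bits 3-5, chunk 4 at bits 12-14, chunk 6 at bits 18-20).
pub fn chunk_q0(header: u32, k: u32) -> u32 {
    debug_assert!((1..=6).contains(&k));
    (header >> (3 * k)) & 0b111
}

/// Resolves a variant against the standard pattern of the same length.
/// ASSUMPTION: a zero chunk means "inherit"; both variants fall back to
/// the standard pattern directly, never to each other.
pub fn resolve_variant(header: u32, k: u32, standard_index: usize) -> VariantSource {
    match chunk_q0(header, k) {
        0 => VariantSource::InheritStandard(standard_index),
        c => VariantSource::Explicit(c),
    }
}
```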

Manishearth · 2024-09-24T19:43:56Z

components/datetime/src/provider/packed_pattern.rs

+/// The variants are currently used by year formatting:
+///
+/// - Standard: Year, which could be partial precision (2-digit Gregorain)
+/// - Variant 0: Full Year, which is always full precision


issue: Not a fan of calling them variant0 variant1 in code here; can we primarily use descriptive names and in the docs mention that this is also "variant0" and "variant1"?

Is your suggestion to call them full_year and with_era? Because I wanted these to be generalizable enough to other cases that might want to use this mechanism in the future. For instance, maybe some locale will want to use it for day periods or something ("6 in the evening", "7 in the evening", "8pm", "9pm").

Manishearth · 2024-09-24T19:45:56Z

components/datetime/src/provider/packed_pattern.rs

+        {
+            if let Some(pattern) = pattern {
+                if pattern != fallback {
+                    *chunk = match elements.iter().position(|p| p == pattern) {


suggestion: potentially use a BTreeMap<pattern, index> to speed this up?

No Ord impls available and I don't particularly want to add them; they would percolate everywhere (20+ inner types including all fields, etc). And we can't move the first 1-3 elements in the vec, anyway. So the linear search seems alright.

Okay for now. Worth checking how slow this codegen is.
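
For context, the linear-search approach needs only `PartialEq`, which is why it avoids the `Ord` requirement of a `BTreeMap`. A minimal sketch follows; `index_of_or_push` is a hypothetical helper, and appending on a miss is an assumption, since the quoted hunk is cut off before the `match` arms.

```rust
/// Returns the index of `item` in `elements`, appending it if absent.
/// Only `PartialEq` is required, so no `Ord` or `Hash` impls are needed.
/// Existing elements are never reordered, so earlier indices stay valid.
/// Hypothetical helper; the miss behavior is an assumption.
pub fn index_of_or_push<T: PartialEq + Clone>(elements: &mut Vec<T>, item: &T) -> usize {
    match elements.iter().position(|p| p == item) {
        // Already stored: reuse the existing index.
        Some(i) => i,
        // Not stored yet: append and return the new last index.
        None => {
            elements.push(item.clone());
            elements.len() - 1
        }
    }
}
```

Each lookup is O(n), so building a list of n distinct patterns is O(n²) overall, which is the cost being weighed in the comment above.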

Manishearth · 2024-09-24T19:47:20Z

components/datetime/src/provider/packed_pattern.rs

+
+        // Check to see if we need to switch to Q=1 mode
+        #[allow(clippy::unwrap_used)] // the array is nonempty
+        if chunks.iter().max().unwrap() >= &0x8 {


nit: I don't think we need to use hex here. Also minor style nit would be to have the * on the LHS

Stylistically, I prefer writing in hex when I'm emphasizing that I care more about the bit representation than the numeric value. "8" means "the number 8" and "0x8" means "an integer with the fourth bit on and all others off"

ah, I guess so.

Made it a constant in 6b30bd8
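
Roughly what the two nits amount to, as a sketch: a named hex constant for the bit-level threshold, and the dereference on the left-hand side of the comparison. The constant name and the six-element array are assumptions; the thread says only that a constant was introduced.

```rust
/// Smallest chunk value that no longer fits in 3 bits (bit 3 set).
/// Written in hex to stress the bit pattern rather than the number.
/// Hypothetical name, not the one used in the PR.
pub const CHUNK_OVERFLOW: u32 = 0x8;

/// Returns true if any chunk needs more than 3 bits, meaning the header
/// would have to switch to the wider Q=1 mode.
/// ASSUMPTION: six chunks, one per variant pattern in the quoted table.
pub fn needs_q1(chunks: &[u32; 6]) -> bool {
    // The array has a fixed nonzero length, so `max()` is always `Some`.
    let max = chunks.iter().max().expect("array is nonempty");
    // Dereference on the left instead of comparing against `&0x8`.
    *max >= CHUNK_OVERFLOW
}
```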

Manishearth · 2024-09-24T19:47:40Z

components/datetime/src/provider/packed_pattern.rs

+
+/// A builder for a [`PackedPatternsV1`].
+#[derive(Debug, Clone, PartialEq, Eq)]
+pub struct PackedPatternsBuilder<'a> {


nit: datagen-only?

So, in the next PR, I use this type for human-readable deserialization, not just datagen.

Manishearth · 2024-09-24T19:48:06Z

components/datetime/src/provider/packed_pattern.rs

+        }
+
+        // Check to see if we need to switch to Q=1 mode
+        #[allow(clippy::unwrap_used)] // the array is nonempty


suggestion: if we make this datagen-only, the comment could say "the array is nonempty and this is datagen-only"

Improved the comment in c01686e

I wish we would have array::max one of these days: rust-lang/rust#78504

I could also use one of the workarounds in that issue to avoid the unwrap
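
One unwrap-free option, shown only as a sketch and not necessarily one of the workarounds listed in that issue, is to fold with a starting value so that an empty input has a defined result. `max_chunk` is a hypothetical helper.

```rust
/// Returns the largest chunk value, or 0 for an empty slice.
/// Folding from 0 avoids the `Option` that `Iterator::max` returns,
/// so no `unwrap` (and no Clippy allow) is needed.
pub fn max_chunk(chunks: &[u32]) -> u32 {
    chunks.iter().copied().fold(0, u32::max)
}
```

Starting the fold at 0 is harmless here because chunk values are unsigned, so 0 can never exceed a real maximum.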

It is impossible for an IndexN array to need more than a length integer of size N anyway: the max index is always `>=` the length.

Part of #5523

Builds on #5593

We could in theory just have a `VZVFormatCombo<Index, Len>` type that allows free selection, however I'm trying to keep this minimal. Overall the main use case for that is picking things like "a small array of largely-sized elements" and we could just expose Index16Len8 for that. I can see that being useful for things like #5580, though it also feels like a data microoptimization.

The "total" lines in fingerprints.csv are interspersed in giant diffs, and this basically only gets a max of 2-6 byte wins per data, but the overall data size went down by 200KB. Not amazing, not terrible.

```console
[18:26:22] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}'
23501369
[18:26:08] मanishearth@manishearth-glaptop2 ~/dev/icu4x/provider/data ^_^
$ rg total | awk '{ gsub(/B,/, "", $3); s +=$3} END{print s}'
23391499
```

Draft up docs and structure for PackedSkeletonDataV2

8fe27b0

sffc requested review from zbraniecki and nordzilla as code owners September 23, 2024 21:13

sffc marked this pull request as draft September 23, 2024 21:13

sffc removed request for zbraniecki and nordzilla September 23, 2024 21:14

Manishearth previously approved these changes Sep 23, 2024

sffc added 3 commits September 23, 2024 14:16

typo

1a8b794

Add note about small numbers

7780ad4

fmt

c4ecf50

sffc dismissed Manishearth’s stale review via c4ecf50 September 23, 2024 21:19

sffc and others added 18 commits September 23, 2024 19:37

Add Hash impls (this commit can probably be reverted)

b2ea3e6

Add impls to plurals crate

9c3b12c

impl Hash for ZeroVec

2f666d4

Checkpoint

035f69f

Revert "Add Hash impls (this commit can probably be reverted)"

d507474

This reverts commit b2ea3e6.

Revert "impl Hash for ZeroVec"

cc95d08

This reverts commit 2f666d4.

No hash impls

5f575e7

Compiling builder code

c01e882

Another impl in plurals crate

7a0d922

fmt

28277ec

Initial tests

da2bf9c

Docs warnings

df65f3e

fmt

053351e

diplomat-coverage

bff7844

Make it infallible. Resolves Clippy

20ac388

Cleanup, tests

49ce950

Rename to PatternsPackedV1

cf6d717

PackedPatterns

8526094

sffc marked this pull request as ready for review September 24, 2024 18:29

Add comment about VZV length

5cbe18f

sffc changed the title ~~Draft up docs and structure for PackedSkeletonDataV2~~ Implement PackedPatternsV1 with packing and unpacking Sep 24, 2024

sffc requested a review from Manishearth September 24, 2024 18:39

sffc mentioned this pull request Sep 24, 2024

Impl Serde for PackedPatternsV1; export Serde impls for PluralElements #5592

Closed

Manishearth reviewed Sep 24, 2024

Manishearth previously approved these changes Sep 24, 2024

sffc added 3 commits September 24, 2024 16:22

Pull some constants into constants module; document LMS better

6b30bd8

Postcard note

4e94347

chunks comments

c01686e

sffc dismissed Manishearth’s stale review via c01686e September 24, 2024 23:34

sffc requested a review from Manishearth September 24, 2024 23:36

clippy

564669f

Manishearth approved these changes Sep 24, 2024

sffc merged commit 1141c8e into unicode-org:main Sep 25, 2024
28 checks passed

sffc deleted the packed3 branch September 25, 2024 00:25

Manishearth mentioned this pull request Sep 25, 2024

Use N-bit lengths for VZV IndexN format types #5594

Merged
