Table splitting for MarkBase #649

cmyr · 2023-10-05T23:22:41Z

This is an initial port of the table splitting impl for this subtable.

Unlike with PairPos, we don't have any real-world failure cases to test this on yet, but I think it's ready as a checkpoint.

This is based on #647, which should go in first. We also need to resolve the clippy & rustdoc complaints (#648).

anthrotype · 2023-10-06T09:55:45Z

we don't have any real-world failure cases to test this on yet

in the equivalent fonttools PR fonttools/fonttools#1297 from a few years back, some Noto fonts are mentioned. In any case, fontc probably will have to emit some MarkToBase lookups before we can test this with real fonts

dfrg

Apologies for letting this one sit so long. There's some realllly annoying inherent complexity here so I both compared to HB and traced the logic according to the spec and... this seems solid, nice work! A couple comments/questions inline but I didn't see anything blocking.

dfrg · 2023-11-02T18:58:46Z

write-fonts/src/graph/splitting/mark2base.rs

+    log::debug!(
+        "nothing to split, size '{}'",
+        accumulated + partial_coverage_size
+    );


Should this be inside the conditional below?

it's supposed to be a companion to the trace! above, which prints each split; this prints the size of the final subtable (it's sort of a fenceposts-vs-fences thing)
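To illustrate the fenceposts-vs-fences point, here is a minimal sketch, not the code in this PR. It assumes a simple greedy splitter over a list of item sizes; `MAX_SIZE`, `split_with_log`, and the message strings are hypothetical, and the log lines are collected into a `Vec` rather than sent through the `log` crate so that the example is self-contained. Each split emits one message inside the loop, and the last chunk, which never triggers a split, gets its own message after the loop.

```rust
use std::ops::Range;

/// Hypothetical size limit; not the limit used by the real splitter.
const MAX_SIZE: usize = 100;

/// Greedily group `sizes` into consecutive chunks that each fit in `MAX_SIZE`
/// (a single oversized item still gets a chunk of its own).
/// Returns the chunk ranges and the log lines that were emitted.
fn split_with_log(sizes: &[usize]) -> (Vec<Range<usize>>, Vec<String>) {
    let mut ranges = Vec::new();
    let mut log = Vec::new();
    let mut start = 0;
    let mut accumulated = 0;
    for (i, &size) in sizes.iter().enumerate() {
        if accumulated + size > MAX_SIZE && i > start {
            // one message per split: this is the "fencepost"
            log.push(format!("splitting at item {i}, size '{accumulated}'"));
            ranges.push(start..i);
            start = i;
            accumulated = 0;
        }
        accumulated += size;
    }
    // the final chunk never hits the branch above, so it is reported here
    log.push(format!("final chunk, size '{accumulated}'"));
    ranges.push(start..sizes.len());
    (ranges, log)
}
```

With five items of size 40, this produces two "splitting" messages and one "final chunk" message, so every resulting chunk has its size reported exactly once.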

dfrg · 2023-11-02T19:06:12Z

write-fonts/src/graph/splitting/mark2base.rs

+    // because offsets may be null, and there is no pattern, we visit each one
+    for base_record in base_array.base_records().iter() {
+        let base_record = base_record.unwrap();
+        for (mark_class, offset) in base_record.base_anchor_offsets().iter().enumerate() {
+            let has_offset = !offset.get().is_null();
+            let in_range = (start..end).contains(&mark_class);


I think this comment should be on the inner for loop.

I'm assuming this means that we iterate all and check the range rather than slicing base_anchor_offsets() on start..end because we need to track next_offset_idx? Could we break out of the loop when mark_class >= end? (not suggesting we do so, just checking my understanding of what's going on)

noted, reading this back I agree that there could be more explanation here. Basically: what is tricky here is that the anchor tables may or may not have subtables, and there is no pattern around how subtables are distributed across the set of anchor tables, so we need to visit each offset and see if it's null or not to figure out whether we are copying over a subtable.

The reason we can't break early is because even if we're out of range, if we saw an offset we need to increment next_offset_idx so that once we're back in range we are copying the correct subtable. 🤕

Ohh, yep, I see it now. The comment seems like it's in the proper place then... I was just missing that the offset list is associated with the base array and not the base record (obvious in hindsight). Thanks!
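For anyone reading this thread later, the `next_offset_idx` bookkeeping can be sketched in isolation. This is a simplified model and not the write-fonts types: it assumes each base record is just a list of booleans (one per mark class, `true` meaning the anchor offset is non-null), and that the base array's child offsets form one flat list in visiting order. `BaseRecord` and `offsets_to_copy` are hypothetical names.

```rust
/// Simplified stand-in for a base record: one entry per mark class,
/// `true` if that class has a non-null anchor offset.
type BaseRecord = Vec<bool>;

/// For the mark classes in `start..end`, return the indices into the base
/// array's flat list of child offsets that belong in the split-off subtable.
fn offsets_to_copy(records: &[BaseRecord], start: usize, end: usize) -> Vec<usize> {
    let mut next_offset_idx = 0;
    let mut keep = Vec::new();
    for record in records {
        for (mark_class, &has_offset) in record.iter().enumerate() {
            let in_range = (start..end).contains(&mark_class);
            if has_offset {
                if in_range {
                    keep.push(next_offset_idx);
                }
                // advance even when out of range: the flat list still contains
                // this offset, and later in-range entries are positioned after it
                next_offset_idx += 1;
            }
        }
    }
    keep
}
```

With two records that each have anchors for classes 0 and 1, asking for the range `0..1` yields indices `[0, 2]`. Breaking out of the inner loop once `mark_class >= end` would skip the increment for class 1 in the first record and yield `[0, 1]` instead, which points at the wrong subtable.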

dfrg · 2023-11-02T19:10:07Z

write-fonts/src/graph/splitting/mark2base.rs

+    objects
+        .iter()
+        .map(|id| {
+            if !visited.insert(*id) {
+                return 0;
+            }
+            // the size of the anchor table
+            let base_size = graph.objects[id].bytes.len();
+            // the size of any devices or variation indices.
+            let children_size = graph.objects[id]
+                .offsets
+                .iter()
+                .map(|id| {
+                    // the mark2pos subgraph is only ever two layers deep
+                    debug_assert!(graph.objects[&id.object].offsets.is_empty());
+                    visited
+                        .insert(id.object)
+                        .then(|| graph.objects[&id.object].bytes.len())
+                        .unwrap_or(0)
+                })
+                .sum::<usize>();
+            base_size + children_size
+        })
+        .sum()


No issues.. just wanted to say that I <3 this code.

cmyr force-pushed the split-mark-base branch from 5065e1a to 9d31d42 Compare October 6, 2023 16:10

cmyr added 4 commits October 6, 2023 12:11

[write-fonts] Preliminary impl of MarkBase splitting

d9aba86

This is a checkpoint; the code has not been tested.

[write-fonts] Test for splitting of mark & base arrays

2056260

Found & fixed one show-stopper in our base array splitting.

[write-fonts] More & better testing of MarkBase splitting

dc987b9

This adds a big smoke test that we are passing, and it does include checks that values are correct, so that's something?

[write-fonts] Full pack/split test for MarkBase

dfe7909

This caught one typo in our code that decides what tables we should attempt to split, but otherwise it seems like this worked first try, which is slightly scary

cmyr force-pushed the split-mark-base branch from 9d31d42 to dfe7909 Compare October 6, 2023 16:23

cmyr marked this pull request as ready for review October 6, 2023 16:24

dfrg approved these changes Nov 2, 2023

cmyr merged commit 0a66324 into main Nov 3, 2023

cmyr deleted the split-mark-base branch November 3, 2023 14:33

cmyr mentioned this pull request Dec 1, 2023

Support splitting of MarkToBase subtables #602

Closed
