Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Table splitting for MarkBase #649

Merged
merged 4 commits into from
Nov 3, 2023
Merged

Table splitting for MarkBase #649

merged 4 commits into from
Nov 3, 2023

Conversation

cmyr
Copy link
Member

@cmyr cmyr commented Oct 5, 2023

This is an initial port of the table splitting impl for this subtable.

Unlike with PairPos, we don't have any real-world failure cases to test this on yet, but I think it's ready as a checkpoint.

This is based on #647, which should go in first. We also need to resolve clippy & rustdoc complains (#648)

@anthrotype
Copy link
Member

we don't have any real-world failure cases to test this on yet

in the equivalent fonttools PR fonttools/fonttools#1297 from a few years back, some Noto fonts are mentioned. In any case, fontc probably will have to emit some MarkToBase lookups before we can test this with real fonts

@cmyr cmyr force-pushed the split-mark-base branch from 5065e1a to 9d31d42 Compare October 6, 2023 16:10
cmyr added 4 commits October 6, 2023 12:11
This is a checkpoint; the code has not been tested.
Found & fixed one show-stopper in our base array splitting.
This adds a big smoke test that we are passing, and it does include
checks that values are correct, so that's something?
This caught one typo in our code that decides what tables we should
attempt to split, but otherwise it seems like this worked first try,
which is slightly scary
@cmyr cmyr force-pushed the split-mark-base branch from 9d31d42 to dfe7909 Compare October 6, 2023 16:23
@cmyr cmyr marked this pull request as ready for review October 6, 2023 16:24
Copy link
Member

@dfrg dfrg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for letting this one sit so long. There's some realllly annoying inherent complexity here so I both compared to HB and traced the logic according to the spec and... this seems solid, nice work! A couple comments/questions inline but I didn't see anything blocking.

Comment on lines +64 to +67
log::debug!(
"nothing to split, size '{}'",
accumulated + partial_coverage_size
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be inside the conditional below?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's supposed to be a companion to the trace! above, which prints each split; this prints the size of the final subtable (it's sort of a fenceposts-vs-fences thing)

Comment on lines +177 to +182
// because offsets may be null, and there is no pattern, we visit each one
for base_record in base_array.base_records().iter() {
let base_record = base_record.unwrap();
for (mark_class, offset) in base_record.base_anchor_offsets().iter().enumerate() {
let has_offset = !offset.get().is_null();
let in_range = (start..end).contains(&mark_class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment should be on the inner for loop.

I'm assuming this means that we iterate all and check the range rather than slicing base_anchor_offsets() on start..end because we need to track next_offset_idx? Could we break out of the loop when mark_class >= end? (not suggesting we do so, just checking my understanding of what's going on)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noted, reading this back I agree that there could be more explanation here. Basically: what is tricky here is that the anchor tables may or may not have subtables, and there is no pattern around how subtables are distributed across the set of anchor tables, so we need to visit each offset and see if it's null or not to figure out whether we are copying over a subtable.

The reason we can't break early is because even if we're out of range, if we saw an offset we need to increment next_offset_idx so that once we're back in range we are copying the correct subtable. 🤕

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh, yep, I see it now. The comment seems like it's in the proper place then... I was just missing that the offset list is associated with the base array and not the base record (obvious in hindsigh). Thanks!

Comment on lines +217 to +240
objects
.iter()
.map(|id| {
if !visited.insert(*id) {
return 0;
}
// the size of the anchor table
let base_size = graph.objects[id].bytes.len();
// the size of any devices or variation indices.
let children_size = graph.objects[id]
.offsets
.iter()
.map(|id| {
// the mark2pos subgraph is only ever two layers deep
debug_assert!(graph.objects[&id.object].offsets.is_empty());
visited
.insert(id.object)
.then(|| graph.objects[&id.object].bytes.len())
.unwrap_or(0)
})
.sum::<usize>();
base_size + children_size
})
.sum()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues.. just wanted to say that I <3 this code.

@cmyr cmyr merged commit 0a66324 into main Nov 3, 2023
@cmyr cmyr deleted the split-mark-base branch November 3, 2023 14:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants