You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
We currently detect script subsets such as Arabic by counting the number of glyphs in the cmap table and then seeing if greater than 50% of them are in a specific script subset .nam file. Instead, what if we simply checked if a font fulfills a gflanguage base charset e.g for Arabic, we'd need all the base characters in https://github.com/googlefonts/lang/blob/main/Lib/gflanguages/data/languages/ar_Arab.textproto#L44?
The text was updated successfully, but these errors were encountered:
That sounds a lot better, although we need to think it through:
A script like Arabic is used to write multiple languages - we don't have a concept of which is the "primary" language for the script at the moment, so do we require a superset of all of them?
Perhaps not, because a font which supports Arabic shouldn't be skipped just because it doesn't contain Uyghur letter ۇ, and a font can support Latin even if it doesn't have Ᵽ.
So are we back to looking at support for a certain percentage of covered characters within all languages of a script?
Or can we require the intersection of all of the bases for all languages for a script? Nice idea, but the intersection for Latin is the null set - there are no characters which are an exemplar for every Latin language!
I can see two approaches:
YOLO. If a font contains any character in a subset, it gets that subset. Anything less than that and you're removing glyphs from a font, and why do we want to do that? Or
Compute the supported languages first, and then declare support for all of their scripts.
Compute the supported languages first, and then declare support for all of their scripts.
I like this a lot. gflanguages also includes population count data. Perhaps instead of counting glyphs, counting how many people you're able to cover may be better.
We currently detect script subsets such as Arabic by counting the number of glyphs in the cmap table and then seeing if greater than 50% of them are in a specific script subset .nam file. Instead, what if we simply checked if a font fulfills a gflanguage base charset e.g for Arabic, we'd need all the base characters in https://github.com/googlefonts/lang/blob/main/Lib/gflanguages/data/languages/ar_Arab.textproto#L44?
The text was updated successfully, but these errors were encountered: