Initial README content #3
Comments
Thanks for this. I wrote something similar independently, and (finally) integrated it.
simoncozens added a commit that referenced this issue Mar 24, 2023
* Update checker.py
  Added mark2base test that uses the serialized buffer to see if a mark has a GPOS shift if placed after a target base mark.
* Use shaper to check whether glyphs exist, see #7
* Add youseedee to requirements
* Fix some lints
* Read your own config file, pylint
* More pylint fixes
* Pin protobuf dependency
* Further poetry dependency fixes
* Cache shaping
* Fix error message
* Implement an "unknown" state
* Implement the "report" option
* Speed up the mark checker
* Don't GSUB closure on pathological fonts
* Make pylint happier
* Make result status machine readable
* A new test for unencoded glyph variants. Fixes #8
* Use the language tag from the language we're checking
* Skip tests based on certain conditions (missing features), fixes #11
* Make linter happier
* Update orthographies check to include auxiliary chars
  There is probably a more elegant way to implement this but I have merged auxiliary characters into the bases for the orthographies check. For the purposes of language support testing base and auxiliary characters need to be included to ensure loan words, names and place names can all be typed for a given language.
* Improve error messages
* Add Neil's work
* Pylint stuff
* Update shaping_differs.py
  Fixed Type Error caused by trying to concat YAML to str
* Make non-verbose less verbose
* Transfer IP to Google
---------
Co-authored-by: Simon Cozens <[email protected]>
Co-authored-by: Dave Crossland <[email protected]>
Problem
When you choose a font to typeset some text, the very first question that interests you is: which fonts support the language(s) of my text? A font that doesn’t support the languages won’t be of any interest.
But what does it mean, exactly, that a font supports a given language? For Latin-script fonts, the task is reasonably easy and mostly amounts to one question: does the font have glyphs for all the Unicode codepoints used by the language? In reality, even this isn’t always trivial. To typeset text that is written in English, it’s not enough that the font has glyphs for the A-Z and a-z letters. It also needs digits and some punctuation. It probably also needs some accented letters, because you may want to write the names Chloë or Brontë, for example.
But it’s still a relatively easy task to check. The Unicode CLDR project collects “exemplar characters” in several categories. If you check that the font contains glyphs for all these characters, you can say, “OK, this font supports this language”. The Rosetta Type Hyperglot project contains similar information, with some annotations.
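The coverage check described above can be sketched in a few lines of Python. This is only an illustration, not Shaperglot's or Hyperglot's actual code: the function name `missing_codepoints` and the toy character map are made up for this example. With a real font, the set of covered codepoints could be taken from the keys of fontTools' `TTFont(path).getBestCmap()`.

```python
from typing import Iterable, List


def missing_codepoints(exemplars: str, cmap: Iterable[int]) -> List[str]:
    """Return the exemplar characters that the character map does not cover."""
    covered = set(cmap)
    return sorted({ch for ch in exemplars if ord(ch) not in covered})


# Toy stand-in for a font's character map (an assumption for this sketch):
# it covers the basic A-Z and a-z letters only.
toy_cmap = {ord(c) for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"}

print(missing_codepoints("Brontë", toy_cmap))  # ['ë']
```

An empty result means every exemplar character has a glyph, which is the "easy" notion of language support for scripts that need no shaping.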
Rationale behind Shaperglot
But this approach does not work for scripts that need “shaping”, a process that maps the input Unicode codepoints of the text into a series of glyphs in a way which is not a 1:1 correspondence. For scripts like Arabic or Devanagari, it’s not enough to check if the font has default glyphs for all Unicode codepoints from some set. You also need to check if the font has some rules (features) that perform the shaping so that the final rendered text is orthographically correct.
Shaperglot lets you check Unicode coverage, but it also allows other tests. In particular, the idea is that you shape some input with a given feature applied and check whether the output changed. The fact that a change happened indicates that there is some support for a language beyond just the Unicode codepoint coverage.
For example, if I put in the default `i` and apply the `locl` feature with the script tag `latn` and the language tag `TRK`, and I see that the output glyph (or series of glyphs) is different from the input, I can say with higher certainty “this font supports Turkish”.
Shaperglot will not (yet ;) ) use computer vision to judge the quality of the change, but it’s based on a very reasonable assumption: if I put in some letter and ask HarfBuzz to apply a certain feature, and the result is the same as the input, then the feature is not meaningfully implemented, hence there is a problem.
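The Turkish `locl` check described above boils down to shaping the same text twice and comparing the results. The sketch below is a hedged illustration rather than Shaperglot's implementation: the `Shaper` signature, the `toy_shape` stand-in and the glyph name `i.loclTRK` are all hypothetical. A real version would wrap a HarfBuzz binding behind the same callable.

```python
from typing import Callable, Dict, List

# Hypothetical shaper signature: (text, features, script, language) -> glyph names.
Shaper = Callable[[str, Dict[str, bool], str, str], List[str]]


def feature_changes_output(
    shape: Shaper, text: str, feature: str, script: str, language: str
) -> bool:
    """Shape the text with the feature off and on; report whether the glyphs differ."""
    without = shape(text, {feature: False}, script, language)
    with_feature = shape(text, {feature: True}, script, language)
    return without != with_feature


def toy_shape(text: str, features: Dict[str, bool], script: str, language: str) -> List[str]:
    """Toy shaper (assumption): localizes 'i' only for latn/TRK when locl is on."""
    localized = features.get("locl", False) and script == "latn" and language == "TRK"
    return ["i.loclTRK" if (ch == "i" and localized) else ch for ch in text]


print(feature_changes_output(toy_shape, "i", "locl", "latn", "TRK"))  # True
print(feature_changes_output(toy_shape, "i", "locl", "latn", "ENG"))  # False
```

Note that the comparison only detects that something changed; as the text says, it does not judge whether the change is the right one.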
The advantage of the Shaperglot approach is that the tests can be complex. Sometimes the meaningful change will come about only through a combination of certain features, not just one feature. Or there may be alternatives: some fonts may implement something via `liga`, while others may implement the same thing via `ccmp` or `calt`. So the test may ask for all 3 features to be applied and check if something changed.
Shaperglot has example implementations of tests for some languages, but needs more data.
In the future, additional, more sophisticated tests can be implemented. Test-driven development can help produce better fonts, and it can also help gather better information about language support.