[glyphsets] Language definition overhaul #109
It looks like Maltese, Latvian and Icelandic are under 5_000_000 speakers and should not be included. Either their inclusion needs to be hard-coded or they should be excluded or the requirements need to be adjusted.
For Lithuanian, it could be covered without the Lithuanian dictionary notation (which includes the soft-dotted i stuff) being covered, if the threshold were lowered to include those languages.
Not sure why Bavarian is commented out.
@moyogo I removed languages under 5M from Core, which also included Lithuanian (2.3M).
Since Maltese, Latvian and Icelandic are the "main/primary" languages of countries in Europe, I would include them in Core. I don't know if that means the threshold should be moved or if these languages should just be included. We could try to see what happens if we move the threshold to 2M and decide from there.
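To make the two options concrete (lowering the threshold versus hard-coding inclusions), here is a rough sketch. It is not the gfglyphsets code: `core_languages`, `POPULATION` and the speaker counts are hypothetical, and only the Lithuanian figure (2.3M) comes from this thread.

```python
# Placeholder speaker counts for illustration only. Only the Lithuanian
# figure is quoted in this thread; the others are NOT real data.
POPULATION = {
    "lt_Latn": 2_300_000,  # Lithuanian (figure from this thread)
    "lv_Latn": 1_500_000,  # Latvian (placeholder)
    "mt_Latn": 500_000,    # Maltese (placeholder)
    "is_Latn": 300_000,    # Icelandic (placeholder)
    "ca_Latn": 9_000_000,  # Catalan (placeholder)
}


def core_languages(population, threshold, always_include=()):
    """Return sorted language codes at or above `threshold`,
    plus any codes that are explicitly hard-coded for inclusion."""
    selected = {
        code for code, speakers in population.items() if speakers >= threshold
    }
    selected.update(always_include)
    return sorted(selected)


# Option A: lower the threshold from 5M to 2M.
lowered = core_languages(POPULATION, 2_000_000)

# Option B: keep 5M and hard-code the national languages.
hard_coded = core_languages(
    POPULATION, 5_000_000, always_include=("mt_Latn", "lv_Latn", "is_Latn")
)
```

With these placeholder numbers, a 2M threshold would pick up Lithuanian but still miss Maltese, Latvian and Icelandic, so an explicit include list may be needed either way.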
@yanone I’m not suggesting having language-specific YAML files. Since having multiple sources for the data is an issue, I’d suggest replacing the .stub.nam files with data in the same glyphset definition YAML files. Something like:
stub:
  - 0x0024 DOLLAR SIGN
  - etc.
language_codes:
  - ca_Latn # Catalan
  - cs_Latn # Czech
  - etc.
If the unencoded glyphs are in the glyphset definition file, then something like the following could be in there as well:
unencoded_glyphs:
  - periodcentered.loclCAT
  - periodcentered.loclCAT.case
  - etc.
If we do want the info in language data, it needs to be more flexible than glyph names.
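As a rough illustration of how a single definition of that shape could be consumed, the sketch below mirrors the proposed keys in a plain Python dict (standing in for the parsed YAML) and turns the stub entries into codepoints. The helper names `stub_codepoints` and `mismatched_names` are hypothetical and not part of gfglyphsets.

```python
import unicodedata

# Stand-in for the parsed YAML proposed above; the keys follow the proposal.
definition = {
    "stub": ["0x0024 DOLLAR SIGN"],
    "language_codes": ["ca_Latn", "cs_Latn"],
    "unencoded_glyphs": ["periodcentered.loclCAT", "periodcentered.loclCAT.case"],
}


def stub_codepoints(entries):
    """Parse '0xHHHH CHARACTER NAME' entries into integer codepoints."""
    return [int(entry.split(maxsplit=1)[0], 16) for entry in entries]


def mismatched_names(entries):
    """Return entries whose name does not match the Unicode name of the codepoint."""
    bad = []
    for entry in entries:
        hex_value, name = entry.split(maxsplit=1)
        if unicodedata.name(chr(int(hex_value, 16)), "") != name:
            bad.append(entry)
    return bad


codepoints = stub_codepoints(definition["stub"])
```

Keeping the name next to the codepoint lets a check like `mismatched_names` catch typos in either half of an entry.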
Bringing in @vv-monsalve here. One of the reasons why we thought of the approach of putting glyph names into Viviana (or Denis), could you please provide an example for such language-specific glyphs other than the European ones I've already singled out? This is for my own understanding. If it's indeed necessary to have unencoded glyph names in language definitions, I would ask to move forward with the proposal.
@yanone These depend on the glyphset, the design of the glyphs themselves and the scope or target of the font (for Latin):
There are more for other writing systems (like Cyrillic be-cy te-cy sha-cy pe-cy ge-cy gje-cy gebar-cy de-cy ka-cy zhe-cy fi-cy softsign-cy hardsign-cy gedescender-cy gestrokehook-cy tshe-cy etc.).
Currently, we are adding mainly the SSA languages. But if we eventually add e.g. an indigenous language like Piaroa, it would require glyphs like:
@yanone @vv-monsalve By "unencoded glyphs", did you mean multiple-to-one glyphs or language-specific alternate single glyphs? The multiple-to-one glyphs are already listed in the language data. For example, a multiple-to-one glyph like a_cedilla is found as "{a̧}" in the exemplar characters of the languages using it. An alternate single glyph is something like bstroke.alt (or, with a clearer name, bstroke.midoverlaystroke, or for Glyphsapp bstroke.EMPPLG0) for languages that would use a glyph distinct from the default (for example, when the default bstroke has its stroke through the ascender), depending on the glyphset and scope.
@yanone How did you get ʉ̃ in the InDesign document in the first place? If it was with the glyph palette and the glyph wasn’t named properly (uni0289_tildecomb) in the font used then InDesign may not be able to know what text string that glyph represents. I’d recommend using an input tool like https://r12a.github.io/uniview/. On macOS, you can use the Edit > Emoji & Symbols from the menu (but The better input method is to use a proper keyboard layout. For Piaroa, there is https://keyman.com/keyboards/pid_piaroa, the user can use
Yes, that’s basically what I’ve been doing for the GF Latin PriAfrican and GF Latin African PRs. I also have a script that lists the graphemes composed of sequences of characters like ʉ̃ or ä̧.
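For readers following along, listing such graphemes could look roughly like the sketch below. This is not the script mentioned above, only a guess at the idea: it assumes exemplar strings wrap multi-character graphemes in braces (as in "{a̧}") and keeps those that NFC normalization cannot compose into a single codepoint. The function name is hypothetical.

```python
import re
import unicodedata


def multi_codepoint_graphemes(exemplars):
    """Return braced sequences that stay longer than one codepoint after NFC."""
    found = []
    for sequence in re.findall(r"\{([^{}]+)\}", exemplars):
        if len(unicodedata.normalize("NFC", sequence)) > 1:
            found.append(sequence)
    return found


# a + combining cedilla and U+0289 + combining tilde have no precomposed
# form; e + combining acute composes to U+00E9 and is therefore skipped.
sample = "a b {a\u0327} {\u0289\u0303} {e\u0301}"
sequences = multi_codepoint_graphemes(sample)
```

Sequences that survive this filter are the ones a font has to handle through mark positioning or a dedicated multiple-to-one glyph, since no single encoded character exists for them.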
gflanguages uses IETF BCP47 language tags as identifiers. When you say these languages "aren’t included in Unicode", it’s confusing. Languages not supported by Unicode are the ones where no character exists at all for their orthographies.
It depends, as mentioned above. For Catalan periodcentered, the default periodcentered should be designed and kerned for Catalan in the first place (Catalan names are not just used in Catalan). The same goes for the glyphs listed in #109 (comment): variants are needed when the default glyphs are not appropriate for specific cases.
Okay, thank you. I gather that the separate definition of these examples in I've been wanting to rewrite this PR here anyway because I want to see the language list included in the actual Python module because the list is needed for the I'm gonna need a few more days for this. |
After we discovered that we actually don't need a lot of language-specific unencoded glyphs (that was a misunderstanding) I've completely closed the PR over at Instead I've moved the language definitions per glyphset into the Python package alongside the final I've removed the draft status of the PR and now officially ask for the final review. After this is done and a new package is published, I will continue to rewrite the |
…glefonts/glyphsets into language-definition-overhaul
This PR follows through with the overhaul of the character sets in gfglyphsets as outlined here.