Add blank control characters to all script files that need it #1

Open
davelab6 opened this issue Mar 18, 2021 · 3 comments

@davelab6
Member

@chrissimpkins noted that Noto Hebrew includes characters for

  • zero width nonjoiner (U+200C)
  • zero width joiner (U+200D)
  • LTR mark (U+200E)
  • RTL mark (U+200F)

Raph said "BiDi control characters (that includes 200E and 200F, along with 202A-202E and 2066-2069) are handled entirely in the text shaping and layout engine, and do not need cmap entries in the font."

@simoncozens said "you don't normally have explicit glyphs in the font for control characters, since they are handled higher up the text-processing stack and won't appear in runs to be shaped."

I propose in the next build of Noto Hebrew, we remove them.

@davelab6
Member Author

Behdad weighed in,

So, we don't show those characters. But in some systems they still affect font selection, i.e. they might break a shaping run if they are not in the cmap. I'd check at least the Android code and Firefox (ask Jonathan?). Chrome has no problem. Having them with an empty shape is a fine compromise IMO.

Raph confirmed,

Android is good here: https://android.googlesource.com/platform/frameworks/minikin/+/refs/heads/master/libs/minikin/FontCollection.cpp#302

That might serve as a useful reference for the future - all of the code points in that list are safe to leave out of a cmap with respect to breaking itemization on Android. I definitely agree with Behdad that including them as empty glyphs (zero advance) is the safest thing if we are worried about third party text layout.
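As a rough illustration of the "empty glyphs (zero advance)" compromise, the sketch below flags control characters that are mapped in a font but whose glyph is not blank. It is not taken from any Noto tooling: `GlyphInfo` and `non_blank_controls` are hypothetical names, and the font is modelled as a plain dictionary from code point to advance width and contour count rather than read from a real font file.

```python
from typing import Dict, Iterable, List, NamedTuple


class GlyphInfo(NamedTuple):
    """Hypothetical, simplified view of one mapped glyph."""
    advance: int   # horizontal advance in font units
    contours: int  # number of outline contours (0 means no outline)


def non_blank_controls(glyphs: Dict[int, GlyphInfo],
                       controls: Iterable[int]) -> List[int]:
    """Return the mapped control code points whose glyph is not blank.

    A glyph counts as blank here when it has zero advance and no contours.
    Code points that are not mapped at all are ignored by this check.
    """
    return sorted(
        cp for cp in controls
        if cp in glyphs
        and (glyphs[cp].advance != 0 or glyphs[cp].contours != 0)
    )


# Toy data: LRM is blank, RLM wrongly carries an advance width.
toy_font = {
    0x200E: GlyphInfo(advance=0, contours=0),
    0x200F: GlyphInfo(advance=500, contours=0),
    0x05D0: GlyphInfo(advance=600, contours=1),
}
flagged = non_blank_controls(toy_font, [0x200E, 0x200F, 0x202A])
print([hex(cp) for cp in flagged])
```

In this toy data only U+200F is flagged; U+202A is skipped because it has no mapping at all.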

So, I'll rename this issue to make sure we roll that out correctly across all the Noto fonts.

@marekjez86 noted,

LTR mark (U+200E), and RTL mark (U+200F):

In Noto, all CJK fonts, all LGC (Latin/Greek/Cyrillic) fonts, all Hebrew fonts and all Arabic fonts support them.

  • Arimo
  • Cousine
  • NotoKufiArabic
  • NotoNaskhArabic
  • NotoNaskhArabicUI
  • NotoNastaliqUrdu
  • NotoRashiHebrew
  • NotoSans
  • NotoSans-Italic
  • NotoSansArabic
  • NotoSansArabicUI
  • NotoSansDisplay
  • NotoSansDisplay-Italic
  • NotoSansHebrew
  • NotoSansMono
  • NotoSerif
  • NotoSerif-Italic
  • NotoSerifDisplay
  • NotoSerifDisplay-Italic
  • NotoSerifHebrew
  • Tinos
  • NotoSansCuneiform
  • NotoSansNKo
  • NotoSansPhagsPa
  • NotoSansSyriac
  • NotoSansThaana

zero width nonjoiner (U+200C), zero width joiner (U+200D):

I believe it is a requirement for ALL Noto fonts to support these (I believe noto_lint.py checks for it), but only 112 out of 200 (or so) fonts support them (all CJK, all LGC, all Hebrew, all Arabic, and all South and Southeast Asian fonts, among others).

So, we need Fontbakery Google Fonts (GF) profile checks for this that are script-aware, so that fonts for every script in GF handle these characters correctly; this issue can then track passing those checks across the Noto collection.
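A minimal sketch of what a script-aware coverage check could look like follows. The policy table is an assumption made up for illustration (the real per-script requirements would have to come from the Noto or Google Fonts specifications), `missing_controls` is a hypothetical helper and not an existing Fontbakery check, and the cmap is modelled as a plain set of code points.

```python
from typing import Iterable, List

JOINERS = {0x200C, 0x200D}            # ZWNJ, ZWJ
DIRECTIONAL_MARKS = {0x200E, 0x200F}  # LRM, RLM

# Assumed policy, for illustration only: joiners for every script,
# plus directional marks for the right-to-left scripts named above.
REQUIRED_BY_SCRIPT = {
    "Hebr": JOINERS | DIRECTIONAL_MARKS,
    "Arab": JOINERS | DIRECTIONAL_MARKS,
}
DEFAULT_REQUIRED = JOINERS


def missing_controls(script: str, cmap_codepoints: Iterable[int]) -> List[int]:
    """Return required control code points that are absent from the cmap."""
    required = REQUIRED_BY_SCRIPT.get(script, DEFAULT_REQUIRED)
    return sorted(required - set(cmap_codepoints))


# A Hebrew font that maps the joiners but not the directional marks.
hebrew_gaps = missing_controls("Hebr", {0x05D0, 0x200C, 0x200D})
print([f"U+{cp:04X}" for cp in hebrew_gaps])
```

With a real font, the set of mapped code points could be obtained with a font library instead of being written out by hand.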

@davelab6 changed the title from "Remove Noto Hebrew control characters" to "Add blank control characters to all script files that need it" on Mar 18, 2021
@chrissimpkins
Member

Full set of code points from the Android source that Raph linked:

0x00AD                            // SOFT HYPHEN
0x034F                            // COMBINING GRAPHEME JOINER
0x061C                            // ARABIC LETTER MARK
(0x200C <= c && c <= 0x200F)      // ZERO WIDTH NON-JOINER..RIGHT-TO-LEFT MARK
(0x202A <= c && c <= 0x202E)      // LEFT-TO-RIGHT EMBEDDING..RIGHT-TO-LEFT OVERRIDE
(0x2066 <= c && c <= 0x2069)      // LEFT-TO-RIGHT ISOLATE..POP DIRECTIONAL ISOLATE
0xFEFF                            // BYTE ORDER MARK
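For tooling that needs the same test outside Android, the list above can be restated as a small predicate. This is a Python paraphrase of the quoted ranges, not the Android source itself, and the function name `needs_no_font_support` is a hypothetical one chosen for this sketch.

```python
def needs_no_font_support(cp: int) -> bool:
    """Return True if cp is one of the code points listed above."""
    return (
        cp in (0x00AD, 0x034F, 0x061C, 0xFEFF)
        or 0x200C <= cp <= 0x200F   # ZWNJ, ZWJ, LRM, RLM
        or 0x202A <= cp <= 0x202E   # embeddings, PDF, overrides
        or 0x2066 <= cp <= 0x2069   # isolates and pop isolate
    )


# Probe the edges of two ranges: U+200B and U+206A fall just outside.
probes = [0x200B, 0x200C, 0x2069, 0x206A]
listed = [hex(cp) for cp in probes if needs_no_font_support(cp)]
print(listed)
```

The probe keeps only U+200C and U+2069.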

@marekjez86

ALL the characters/glyphs present in Noto were specified as REQUIRED for delivery before we would approve them. I will NOT delete anything unless I understand that this requirement no longer applies. In particular, I don't want to touch the Indic fonts (the rule here is "if you break it for any language in the Constitution of India, you will need to train a Google employee to deal with BIS to allow sales of Android phones [if there's a BIS issue :-)]").

@simoncozens transferred this issue from notofonts/noto-fonts on Jun 21, 2022