Soninke needs an eng, it’s right there in the sample text #200

Open, wants to merge 5 commits into main

simoncozens (Contributor)

  • Soninke needs an eng, it’s right there in the sample text
  • Yucateco needs a MODIFIER LETTER APOSTROPHE
  • Yapese also needs a modifier letter apostrophe
  • Remove apostrophe from Maithili sample

moyogo (Contributor) commented Dec 11, 2024

@simoncozens

  • Soninke needs an eng, it’s right there in the sample text

It needs Ŋ as well, since all capitals are listed. It also needs ñ (used in Senegal, Gambia, Mauritania) and ɲ (used in Mali).
See #201.
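The "capitals are listed, so Ŋ is needed too" reasoning can be checked mechanically. A minimal sketch, assuming the base characters are available as a plain Python set (a hypothetical representation, not the gflanguages schema); `missing_capitals` is an illustrative helper name:

```python
def missing_capitals(base: set[str]) -> list[str]:
    """Return uppercase forms that are absent from base for every lowercase
    letter in base. Hypothetical helper: assumes base is meant to list both
    cases, as the Soninke base set does."""
    missing = []
    for ch in sorted(base):
        if ch.islower():
            upper = ch.upper()
            # Skip letters whose uppercase is not a single character (e.g. ß -> "SS").
            if len(upper) == 1 and upper != ch and upper not in base:
                missing.append(upper)
    return missing


# Hypothetical base set with eng and ɲ but no Ŋ or Ɲ.
print(missing_capitals({"a", "A", "ñ", "Ñ", "ŋ", "ɲ"}))  # ['Ŋ', 'Ɲ']
```

Python's `str.upper()` follows the Unicode case mappings, so ɲ maps to Ɲ (U+019D) and ŋ to Ŋ (U+014A) without any special-casing.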

  • Remove apostrophe from Maithili sample

https://www.unicode.org/L2/L2008/08197--bodo-dogri-maithili.pdf

Bodo, Dogri and Maithili languages are also written in Devanagari script. These languages are using Latin Apostrophe (U+02BC) to denote special tone mark. This tone maker is known as [...] Bikari Kaamaa (बिकारी कामा) in Maithili.

If that is not true in general practice (for example, https://mai.wikipedia.org and https://www.ohchr.org/sites/default/files/UDHR/Documents/UDHR_Translations/mai.pdf don’t use it) and it is only true in some contexts (for example, https://udhr.audio/UDHR_Video.asp?lng=mai uses it), U+02BC should be moved to auxiliary.

simoncozens (Contributor, Author)

I went with your PR instead of mine for Soninke. I think we need more clarity on what we're doing with ’. On the one hand, it's just a punctuation mark, and you don't put . in base. On the other hand, we have fonts that will render the sample texts with tofu because they contain all the other characters but not ’; in that case, ’ is required to display the language correctly. On the gripping hand, if it's something like a glottal stop, then it's clearly a "letter".

Do we need to invent a required_punctuation category? If so, then we need to audit the existing punctuation category to see which entries are required and which are optional. 🙃
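One way to start such an audit is to list the characters a sample text uses that the base set does not cover, grouped by the major class of their Unicode general category. Punctuation that shows up this way would be a candidate for a "required" punctuation list. This is a minimal sketch under assumptions: `uncovered_by_class` is a hypothetical helper, and the base set is modeled as a plain set of characters rather than the real gflanguages data.

```python
import unicodedata


def uncovered_by_class(sample_text: str, base: set[str]) -> dict[str, list[str]]:
    """Group characters used in sample_text but absent from base by the
    major class of their Unicode general category ("L" letter,
    "P" punctuation, ...). Hypothetical helper, not a gflanguages API."""
    missing: dict[str, list[str]] = {}
    for ch in sorted(set(sample_text)):
        if ch.isspace() or ch in base:
            continue  # whitespace and covered characters need no review
        major = unicodedata.category(ch)[0]
        missing.setdefault(major, []).append(ch)
    return missing


# Hypothetical sample: eng and ’ are used but missing from the base set.
print(uncovered_by_class("ŋa’b.", {"a", "b"}))  # {'P': ['.', '’'], 'L': ['ŋ']}
```

Anything under "L" points to a gap in base; anything under "P" is where a font could still show tofu for the sample text even though it covers every base character.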

moyogo (Contributor) commented Dec 11, 2024

Yes, if it can be done, punctuation that is auxiliary should be moved to auxiliary.
Seeing how broken the snk_Latn data is (in SLDR and gflanguages), or even the mg_Latn data (in CLDR, SLDR, Hyperglot and gflanguages), more than just punctuation should be reviewed.

For ’ U+2019, it’s difficult to be consistent as it is often used as a letter instead of ʼ U+02BC for the glottal stop. I’ve tried to only include it when it is listed as a letter in the alphabet in references.
There’s also / U+002F, which some languages use as a letter and for which there’s no equivalent letter character (leaving aside the languages that use it as a fallback for ǀ U+01C0). The punctuation sign · U+00B7 also ends up in the base characters.
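The letter-versus-punctuation ambiguity is visible in the Unicode Character Database itself. A small sketch using Python's standard `unicodedata` module prints the name and general category of each character mentioned above; the `describe` helper and the "letter"/"non-letter" labels are illustrative choices, not part of any language-data tooling.

```python
import unicodedata

# Characters mentioned in this discussion.
CANDIDATES = ["\u2019", "\u02bc", "\u002f", "\u01c0", "\u00b7"]


def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point, Unicode name and general category of ch."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch)


for ch in CANDIDATES:
    code, name, cat = describe(ch)
    # General categories starting with "L" are letters (Lu, Ll, Lm, Lo, ...).
    kind = "letter" if cat.startswith("L") else "non-letter"
    print(f"{ch}  {code}  {name:<30}  {cat}  {kind}")
```

Only ʼ U+02BC (Lm) and ǀ U+01C0 (Lo) are letters by category; ’ U+2019 is Pf, while / U+002F and · U+00B7 are Po. So when a language treats ’, / or · as part of its alphabet, its data has to record a punctuation character as a letter.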
