[Language Request]: Bangla/Bengali #45

Closed
2 tasks done
rago666 opened this issue Jul 4, 2024 · 9 comments
Labels
new language Request for new text language

Comments

@rago666

rago666 commented Jul 4, 2024

English Name

Bangla/Bengali

Native Name

বাংলা

Orthography

The Bengali alphabet is derived from the Brahmi script and is closely related to the Devanagari alphabet. Bengali is the 7th most spoken language in the world, the official language of Bangladesh, and the 2nd most spoken language in India.

Basics

Bengali has 50 letters: 11 vowels (অ, আ, ই, ঈ, উ, ঊ, ঋ, এ, ঐ, ও, ঔ) and 39 consonants (ক, খ, গ, ঘ, ঙ,
চ, ছ, জ, ঝ, ঞ, ট, ঠ, ড, ঢ, ণ, ত, থ, দ, ধ, ন, প, ফ, ব, ভ, ম, য, র, ল, শ, ষ, স, হ, ড়, ঢ়, য়, ৎ, ং, ঃ, ঁ).
Vowels can be found at the beginning, in the middle, or at the end of a word. Example: (লি, আশ, স). The same applies to consonants. Example: কলম -> ক, ল, ম, each a consonant in a different position.

Diacritics

When we join a vowel with a consonant, we use the short form of that vowel (a vowel diacritic). These are called KAR (কার). Bengali has 10 vowel diacritics (া, ি, ী, ু, ূ, ে, ৈ, ো, ৌ, ৃ). They can be added after (সাপ), before (বিষ), below (কুটিল), or both before and after a consonant (পৌর).
There are also 7 consonant diacritics, called PHOLA (ফলা), that can join with a vowel or a consonant. We use hôsôntô (্) for this operation. Examples:
য-ফলা -> অ + ্ + য -> অ্য -> অ্যাপ্লিকেশন
ব-ফলা -> শ + ্ + ব -> শ্ব -> বিশ্বাস
ম-ফলা -> ন + ্ + ম -> ন্ম -> তন্ময়
ণ-ফলা -> হ + ্ + ণ -> হ্ণ -> অপরাহ্ণ
ন-ফলা -> ত + ্ + ন -> ত্ন -> রত্ন
রেফ -> র + ্ + শ -> র্শ -> বর্শ
র-ফলা -> ক + ্ + র -> ক্র -> ক্রম
ল-ফলা -> ল + ্ + ল -> ল্ল -> বল্লম
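
At the code point level, the phola construction above is plain concatenation: the base letter, then hôsôntô (U+09CD), then the joined letter. Here is a minimal Python sketch of that idea. The helper name `join_with_hasanta` is made up for illustration and is not part of Keypunch.

```python
HASANTA = "\u09cd"  # ্ (hôsôntô)


def join_with_hasanta(first: str, second: str) -> str:
    """Join two letters into a conjunct: first + hôsôntô + second."""
    return first + HASANTA + second


# ব-ফলা: শ + ্ + ব
bo_phola = join_with_hasanta("\u09b6", "\u09ac")
# র-ফলা: ক + ্ + র
ro_phola = join_with_hasanta("\u0995", "\u09b0")
# রেফ: র + ্ + শ (here র is the first letter, not the second)
ref = join_with_hasanta("\u09b0", "\u09b6")

print(bo_phola, ro_phola, ref)
# Show the stored code points of the ব-ফলা conjunct
print([f"U+{ord(ch):04X}" for ch in bo_phola])
```

The rendered shape is chosen by the font and the text shaping engine; the stored text is always the three code points in typing order.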

Consonant Conjuncts

A conjunct is a combination of two consonants. There are a lot of them. Consonant diacritics are also a form of conjunct, but vowel diacritics are not. We write them the same way we write consonant diacritics. Example:
ক্ক - ক + ্ + ক
ক্ট - ক + ্ + ট
ক্ষ - ক + ্ + ষ
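
One way to check a decomposition like the ones above is to split the stored string at each hôsôntô. This is only a sketch; `split_conjunct` is a hypothetical helper, and it assumes the input is a bare conjunct with no vowel signs attached.

```python
HASANTA = "\u09cd"  # ্ (hôsôntô)


def split_conjunct(conjunct: str) -> list[str]:
    """Return the consonants of a conjunct by splitting at each hôsôntô."""
    return conjunct.split(HASANTA)


kka = "\u0995\u09cd\u0995"   # ক্ক
kta = "\u0995\u09cd\u099f"   # ক্ট
kssa = "\u0995\u09cd\u09b7"  # ক্ষ

for conjunct in (kka, kta, kssa):
    print(conjunct, "=", " + ্ + ".join(split_conjunct(conjunct)))
```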

Punctuation Marks

Same as English. One exception is that we use DARI ( । ) instead of the full stop (.), with a space before and after it when the sentence is finished. Example:
রফিক মাছ ধরতে গিয়েছে ।

Writing

Bengali has no letter case, so there are no capital or small letters. On Linux I use the built-in Bangla (Probhat) layout for writing. Whatever the layout, the writing system is almost the same. Here are some basic rules:

  • While writing, vowel diacritics always come after the consonant. Example:
    ি + ব + ষ - িবষ ❌
    ব + ি + ষ - বিষ ✅
  • র ফলা (one of the consonant diacritics) can go before or after a consonant, and the word changes depending on its position. When it goes before the consonant it is called Ref (রেফ); when it goes after, it is called R-Phola (র-ফলা). Example:
    রেফ -> র + ্ + শ-> র্শ -> বর্শ
    র-ফলা -> ক + ্ + র -> ক্র -> ক্রম
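
The first rule above (a vowel diacritic is typed, and therefore stored, after its consonant even when it is drawn before it) can be checked mechanically. The sketch below is an illustration only: the function names are hypothetical, and the consonant test is a simple code point range that is assumed to be good enough for everyday words.

```python
# The ten vowel signs: া ি ী ু ূ ৃ ে ৈ ো ৌ
VOWEL_SIGNS = set("\u09be\u09bf\u09c0\u09c1\u09c2\u09c3\u09c7\u09c8\u09cb\u09cc")
NUKTA = "\u09bc"  # the dot used in two-code-point forms of ড়, ঢ়, য়


def is_consonant(ch: str) -> bool:
    """Rough test: the ক..হ range plus the precomposed ড়, ঢ়, য়."""
    return "\u0995" <= ch <= "\u09b9" or ch in ("\u09dc", "\u09dd", "\u09df")


def vowel_signs_follow_consonants(word: str) -> bool:
    """True if every vowel sign is stored directly after a consonant."""
    prev = ""
    for ch in word:
        if ch in VOWEL_SIGNS and not (is_consonant(prev) or prev == NUKTA):
            return False
        prev = ch
    return True


correct = "\u09ac\u09bf\u09b7"  # ব + ি + ষ
wrong = "\u09bf\u09ac\u09b7"    # ি + ব + ষ
print(vowel_signs_follow_consonants(correct))
print(vowel_signs_follow_consonants(wrong))
```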


Writing some Bangla using Probhat (QWERTY)

বাংলা আমার মাতৃভাষা । বৃহন্নলার পাঁচ ভাই ক্ষমতার লোভে মত্ত ।
baLla vmar maf<BaSa . b<hn/nlar pa>c BaI k/Smfar l]B[ mf/f .
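
The keystroke line above can be replayed with a small lookup table. The table below is a partial map inferred only from this sample, not the full Probhat layout, and `type_probhat` is a hypothetical helper; treat it as a sketch for checking the sample rather than a layout definition.

```python
# Partial key map, inferred from the sample sentence only (an assumption,
# not the official Probhat layout definition).
PROBHAT = {
    "a": "\u09be",  # া
    "b": "\u09ac",  # ব
    "B": "\u09ad",  # ভ
    "c": "\u099a",  # চ
    "f": "\u09a4",  # ত
    "h": "\u09b9",  # হ
    "I": "\u0987",  # ই
    "k": "\u0995",  # ক
    "l": "\u09b2",  # ল
    "L": "\u0982",  # ং
    "m": "\u09ae",  # ম
    "n": "\u09a8",  # ন
    "p": "\u09aa",  # প
    "r": "\u09b0",  # র
    "S": "\u09b7",  # ষ
    "v": "\u0986",  # আ
    "<": "\u09c3",  # ৃ
    ">": "\u0981",  # ঁ
    "/": "\u09cd",  # ্ (hôsôntô)
    "[": "\u09c7",  # ে
    "]": "\u09cb",  # ো
    ".": "\u0964",  # । (dari)
}


def type_probhat(keys: str) -> str:
    """Translate keystrokes to text; unmapped keys such as space pass through."""
    return "".join(PROBHAT.get(key, key) for key in keys)


print(type_probhat("baLla vmar maf<BaSa ."))
print(type_probhat("b<hn/nlar pa>c BaI k/Smfar l]B[ mf/f ."))
```

Because every key maps to exactly one code point, the output is in typing order, which is also the stored order.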

Implementation Assistance

  • I am proficient enough in this language to spot mistakes and unnatural words
  • I can assist with testing and reviewing the language implementation

Additional Information

No response

@rago666 rago666 added the new language Request for new text language label Jul 4, 2024
@bragefuglseth
Owner

bragefuglseth commented Jul 4, 2024

Hi, thanks a lot for the language request! That writing sample (বাংলা আমার মাতৃভাষা ।) was really helpful. I'm currently spinning up an initial implementation of Bangla text generation, but I'm a little confused about the space before the dari sign. When implementing text generation for Hindi (#6) and Nepali (#5), I never encountered this convention, and upon doing some further research, I've discovered that Microsoft's Bangla (India) Localization Style Guide doesn't recommend it either:

A punctuation mark (৷) indicating a full stop, placed at the end of declarative sentences and other statements thought to be complete. There is no space between the last letter and the period. Use one space between the period and the first letter of the next sentence.

If you think it makes sense for the extra space to be there for Bangla specifically, and not Hindi and Nepali, I'll gladly go ahead and set that up. However, for the sake of consistency across all Devanagari languages in Keypunch, I'm currently inclined to use the convention of no space between the dari and its preceding word for all three of them 🙂
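
For reference, the no-space convention from the style guide excerpt can be applied to existing text with two substitutions. This is a minimal Python sketch under the assumption that the dari is stored as U+0964; `tighten_dari` is a hypothetical name and not how Keypunch handles it.

```python
import re

DARI = "\u0964"  # ।


def tighten_dari(text: str) -> str:
    """Drop whitespace before a dari and keep one space before the next sentence."""
    # "word ।" -> "word।"
    text = re.sub(rf"\s+{DARI}", DARI, text)
    # "।next" or "।   next" -> "। next"; a dari at the very end is left alone
    text = re.sub(rf"{DARI}\s*(?=\S)", DARI + " ", text)
    return text


sample = "\u09ac\u09be\u0982\u09b2\u09be \u0964\u09ad\u09be\u0987 \u0964"
print(tighten_dari(sample))
```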

bragefuglseth added a commit that referenced this issue Jul 4, 2024
@rago666
Author

rago666 commented Jul 5, 2024

Thank you very much. You can ignore the extra spacing before (।).
I built the app from the repo using GNOME Builder and tested it for a few minutes. I found a few problems:

  1. য় (z) and ড় (R) do not work in simple and advanced mode.
  2. কিন্ত (kin/f) -> ন + ্ + ত -> ন্ত ❌ ; কিন্তু (kin/fu) -> ন + ্ + ত + ু -> ন্তু ✅
  3. My mistake for not mentioning it in the original issue: Bangla has its own numerals.
    (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) -> (০, ১, ২, ৩, ৪, ৫, ৬, ৭, ৮, ৯).
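
The digit correspondence in item 3 is a one-to-one mapping onto U+09E6 through U+09EF, so a translation table is enough to express it. A small sketch, with `to_bangla_digits` as a hypothetical helper name:

```python
# ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯ are the ten code points U+09E6..U+09EF
BANGLA_DIGITS = "".join(chr(0x09E6 + i) for i in range(10))
ASCII_TO_BANGLA = str.maketrans("0123456789", BANGLA_DIGITS)


def to_bangla_digits(text: str) -> str:
    """Replace ASCII digits with Bangla digits; other characters are unchanged."""
    return text.translate(ASCII_TO_BANGLA)


print(to_bangla_digits("2024"))
```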

@bragefuglseth
Owner

য় (z) and ড় (R) do not work in simple and advanced mode.

I've discovered that Monkeytype has the exact same issue, and it's related to character representation. In the word list, those letters are stored as two characters; a base shape and a modifier character for the dot. In modern Bangla text encoding, though, those letters can also be represented as a single character that has the dot included, and that's what people usually enter on keyboards. These two representation methods are completely different letters from the perspective of the computer.

Monkeytype apparently has to represent them the former way due to technical constraints, but since Keypunch uses GTK's native text machinery instead of rolling its own, I don't think we have the same issue. So a quick fix I'll try for now is to just replace the "outdated" letters with their modern counterparts.
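
To make the two representations concrete: য় can be stored either as য (U+09AF) followed by the dot (U+09BC), or as the single code point U+09DF, and a plain string comparison treats them as different. The Python sketch below shows the kind of replacement described above. The table and the name `to_single_code_point` are illustrative assumptions, not the actual Keypunch change.

```python
import unicodedata

NUKTA = "\u09bc"  # the combining dot

# Two-code-point spellings mapped to their single-code-point counterparts
PRECOMPOSED = {
    "\u09a1" + NUKTA: "\u09dc",  # ড + dot -> ড়
    "\u09a2" + NUKTA: "\u09dd",  # ঢ + dot -> ঢ়
    "\u09af" + NUKTA: "\u09df",  # য + dot -> য়
}


def to_single_code_point(text: str) -> str:
    """Replace base letter + dot pairs with the single-code-point letters."""
    for pair, single in PRECOMPOSED.items():
        text = text.replace(pair, single)
    return text


two_chars = "\u09af" + NUKTA
one_char = "\u09df"
print(two_chars == one_char)
print(to_single_code_point(two_chars) == one_char)
print([unicodedata.name(ch) for ch in two_chars])
```

Note that Unicode normalization does not do this job: these three letters are excluded from composition, so `unicodedata.normalize("NFC", ...)` returns the two-code-point form rather than the single one. An explicit table like the one above is needed when the keyboard produces the single code point.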

কিন্ত (kin/f) -> ন + ্ + ত -> ন্ত ❌ ; কিন্তু (kin/fu) -> ন + ্ + ত + ু -> ন্তু ✅

Both of those spellings exist in the word list. I assume that the first one should be removed? It would be good to open an issue against Monkeytype as well, then. That's where the original list is from.

bragefuglseth added a commit that referenced this issue Jul 5, 2024
@bragefuglseth
Owner

I haven't looked at the numbers yet, but the other mistakes should be fixed.

@rago666
Author

rago666 commented Jul 5, 2024

These 4 words have a problem: নিয়ে হয়ে দিয়ে হয়েছে. The correct spellings are given below:

নিয়ে ( niz[ );‌
হয়ে ( hz[ );
দিয়ে ( qiz[ );
হয়েছে ( hz[C[ )

bragefuglseth added a commit that referenced this issue Jul 8, 2024
bragefuglseth added a commit that referenced this issue Jul 8, 2024
@bragefuglseth
Owner

Could you give it a go again now? 🙂

@bragefuglseth
Owner

By the way, if you'd like to, you can provide a name (and optionally a website link or an email address), and I'll credit you in the Orthography section of the about window.

@rago666
Author

rago666 commented Jul 8, 2024

Everything works perfectly now! You can close this now.

I'll credit you in the Orthography section of the about window.

I would be honored if you would include my name, Arnob Goswami. Thank you for your consideration.

@bragefuglseth
Owner

I'm very glad to hear that! Thank you so much for your help.

bragefuglseth added a commit that referenced this issue Jul 8, 2024