Skip to content

Commit

Permalink
Cyrillic Documentation
Browse files Browse the repository at this point in the history
  • Loading branch information
StefanPeev committed Jan 8, 2024
1 parent 0c5bd96 commit 0e4061e
Showing 1 changed file with 20 additions and 28 deletions.
48 changes: 20 additions & 28 deletions documentation/Cyrillic/Cyrillic.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,39 +5,23 @@
As of Unicode version 15.1, Cyrillic script is encoded across several blocks:

| Cyrillic Unicode Blocks | Comments |
|-----|-----|
|+ **Cyrillic: [U+0400–U+04FF](https://www.unicode.org/charts/PDF/U0400.pdf),** 256 characters
+ **Cyrillic Supplement: [U+0500–U+052F](https://www.unicode.org/charts/PDF/U0500.pdf),** 48 characters
+ **Cyrillic Extended-A: [U+2DE0–U+2DFF](https://www.unicode.org/charts/PDF/U2DE0.pdf),** 32 characters
+ **Cyrillic Extended-B: [U+A640–U+A69F](https://www.unicode.org/charts/PDF/UA640.pdf),** 96 characters
+ **Cyrillic Extended-C: [U+1C80–U+1C8F](https://www.unicode.org/charts/PDF/U1C80.pdf),** 9 characters
+ **Cyrillic Extended-D: [U+1E030–U+1E08F](https://www.unicode.org/charts/PDF/U1E030.pdf),** 63 characters
+ **Phonetic Extensions: [U+1D2B, U+1D78](https://www.unicode.org/charts/PDF/U1D00.pdf),** 2 Cyrillic characters
+ **Combining Half Marks: [U+FE2E–U+FE2F](https://www.unicode.org/charts/PDF/UFE20.pdf),** 2 Cyrillic characters | The characters in the range U+0400–U+045F are basically the characters from ISO 8859-5 moved upward by 864 positions. The next characters in the Cyrillic block, range U+0460–U+0489, are historical letters, some of which are still used for Church Slavonic. The characters in the range U+048A–U+04FF and the complete Cyrillic Supplement block (U+0500-U+052F) are additional letters for various languages that are written with Cyrillic script. Two characters are in the Phonetic Extensions block: U+1D2B **** CYRILLIC LETTER SMALL CAPITAL EL from the Uralic Phonetic Alphabet and U+1D78 **** MODIFIER LETTER CYRILLIC EN for transcribing nasal vowels. |



Unicode includes few precomposed accented Cyrillic letters; the others can be combined by adding U+0301 ("combining acute accent") after the accented vowel (e.g., е́ у́ э́); see below.

The following two diacritical marks not specific to Cyrillic can be used with Cyrillic text:

U+0301 ◌́ COMBINING ACUTE ACCENT (= Cyrillic stress mark), in Combining Diacritical Marks block U+0300–U+036F. To input an accented letter with acute accent: for the letter R (for example), digit R0301 (without space between letter and number), then select 0301 only and press Alt + X = Ŕ.
U+20DD ◌⃝ COMBINING ENCLOSING CIRCLE (= Cyrillic ten thousands sign), in Combining Diacritical Marks for Symbols block U+20D0–U+20F0
In the table below, small letters are ordered according to their Unicode numbers; capital letters are placed immediately before the corresponding small letters. Standard Unicode names and canonical decompositions are included.

#### Sources:
Wikipedia. [Cyrillic script in Unicode](https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
Wikipedia. [List of Cyrillic letters](https://en.wikipedia.org/wiki/List_of_Cyrillic_letters)
Wikipedia. [Cyrillic script](https://en.wikipedia.org/wiki/Cyrillic_script)
Wikipedia. [Cyrillic alphabets](https://en.wikipedia.org/wiki/Cyrillic_alphabets)
Wikipedia. [Early Cyrillic alphabet](https://en.wikipedia.org/wiki/Early_Cyrillic_alphabet)
|:-----|:-----|
| **Cyrillic: [U+0400–U+04FF](https://www.unicode.org/charts/PDF/U0400.pdf),** 256 characters | The characters in the range U+0400–U+045F are basically the characters from ISO 8859-5 moved upward by 864 positions. The next characters in the Cyrillic block, range U+0460–U+0489, are historical letters, some of which are still used for Church Slavonic. The characters in the range U+048A–U+04FF and the complete Cyrillic Supplement block (U+0500-U+052F) are additional letters for various languages that are written with Cyrillic script. Two characters are in the Phonetic Extensions block: U+1D2B **** CYRILLIC LETTER SMALL CAPITAL EL from the Uralic Phonetic Alphabet and U+1D78 **** MODIFIER LETTER CYRILLIC EN for transcribing nasal vowels. |
| **Cyrillic Supplement: [U+0500–U+052F](https://www.unicode.org/charts/PDF/U0500.pdf),** 48 characters | Unicode includes few precomposed accented Cyrillic letters; the others can be combined by adding U+0301 ("combining acute accent") after the accented vowel (e.g., е́ у́ э́); see below. |
| **Cyrillic Extended-A: [U+2DE0–U+2DFF](https://www.unicode.org/charts/PDF/U2DE0.pdf),** 32 characters | The following two diacritical marks not specific to Cyrillic can be used with Cyrillic text: |
| **Cyrillic Extended-B: [U+A640–U+A69F](https://www.unicode.org/charts/PDF/UA640.pdf),** 96 characters | U+0301 ◌́ COMBINING ACUTE ACCENT (= Cyrillic stress mark), in Combining Diacritical Marks block U+0300–U+036F. To input an accented letter with acute accent: for the letter R (for example), digit R0301 (without space between letter and number), then select 0301 only and press Alt + X = Ŕ. |
| **Cyrillic Extended-C: [U+1C80–U+1C8F](https://www.unicode.org/charts/PDF/U1C80.pdf),** 9 characters | U+20DD ◌⃝ COMBINING ENCLOSING CIRCLE (= Cyrillic ten thousands sign), in Combining Diacritical Marks for Symbols block U+20D0–U+20F0 |
| **Cyrillic Extended-D: [U+1E030–U+1E08F](https://www.unicode.org/charts/PDF/U1E030.pdf),** 63 characters | In the table below, small letters are ordered according to their Unicode numbers; capital letters are placed immediately before the corresponding small letters. Standard Unicode names and canonical decompositions are included. |
| **Phonetic Extensions: [U+1D2B, U+1D78](https://www.unicode.org/charts/PDF/U1D00.pdf),** 2 Cyrillic characters | |
| **Combining Half Marks: [U+FE2E–U+FE2F](https://www.unicode.org/charts/PDF/UFE20.pdf),** 2 Cyrillic characters | |


## Table of content
+ <a id=tc_cyrch></a>[Cyrillic characters](#cyrch)
+ <a id=tc_cyrext></a>[Cyrillic extensions](#cyrext)
+ <a id=tc_extcyr></a>[Extended Cyrillic](#extcyr)

+ <a id=tc_src></a>[Sources](#src)


## <a id=cyrch></a>[Cyrillic characters](#tc_cyrch)
### Basic Cyrillic alphabet. Unicode range (0410 : 044F)
Expand Down Expand Up @@ -528,4 +512,12 @@ Wikipedia. [Early Cyrillic alphabet](https://en.wikipedia.org/wiki/Early_Cyrilli
| 1E06C | 𞁬 | MODIFIER LETTER CYRILLIC SMALL YERU WITH BACK YER |
| 1E06D | 𞁭 | MODIFIER LETTER CYRILLIC SMALL STRAIGHT U WITH STROKE |
| 1E08F | 𞂏 | COMBINING CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I |



#### <a id=tc_src></a>[Sources](#tc_src)
Wikipedia. [Cyrillic script in Unicode](https://en.wikipedia.org/wiki/Cyrillic_script_in_Unicode)
Wikipedia. [List of Cyrillic letters](https://en.wikipedia.org/wiki/List_of_Cyrillic_letters)
Wikipedia. [Cyrillic script](https://en.wikipedia.org/wiki/Cyrillic_script)
Wikipedia. [Cyrillic alphabets](https://en.wikipedia.org/wiki/Cyrillic_alphabets)
Wikipedia. [Early Cyrillic alphabet](https://en.wikipedia.org/wiki/Early_Cyrillic_alphabet)

0 comments on commit 0e4061e

Please sign in to comment.