Commit

Filip Ginter committed Dec 7, 2024
1 parent bee68b8 commit 322862d
Showing 2 changed files with 2 additions and 210 deletions.
208 changes: 0 additions & 208 deletions ab/template-index.html

This file was deleted.

4 changes: 2 additions & 2 deletions ka/index.html
@@ -76,7 +76,7 @@
root + 'lib/ext/jquery.address.min.js'
);
</script>
-<h1 id="ud-for-language-">UD for LANGUAGE <span class="flagspan"><img class="flag" src="../../flags/svg/AQ.svg" /></span></h1>
+<h1 id="ud-for-georgian-">UD for Georgian <span class="flagspan"><img class="flag" src="../../flags/svg/GE.svg" /></span></h1>

<p>This is a <strong>work-in-progress</strong> overview of the UD annotation for Georgian.</p>

@@ -85,7 +85,7 @@ <h2 id="tokenization-and-word-segmentation">Tokenization and Word Segmentation</
<ul>
<li>In Modern Georgian, words are delimited regularly by white spaces and punctuation marks. However, in Old Georgian, tokenization was an irregular process, as words were sometimes separated by white spaces and sometimes not. Additionally, depending on the century, words in Old Georgian could also be separated by paragraph separators (჻).</li>
<li>Punctuation symbols are not separated from the words; that holds even for hyphenated compounds such as siblings “და-ძმა” ‘sister and brother’ (one token) etc. However, the dash is separated from the surrounding characters. They can consist of a sequence of symbols, such as a question mark followed by an exclamation mark (?!), an exclamation mark followed by two full stops (!..) and ellipsis (…) and appear: a) in abbreviations (ა.შ. ‘etc.’, ე.ი. ‘i.e.’, etc.) and b) in numeric expressions (1.2, 0,5, etc.).</li>
-<li>Due to rich agglutinating type of morphology, clitics can be treated as multi-word tokens and segmented to individual syntactic words in the following cases:
+<li>Due to rich agglutinating type of morphology, clitics can be treated as multi-word tokens and segmented to individual syntactic words in the following cases:
a) auxiliary verbs (AUX) attached to the nominal paradigm, which add functional and grammatical meaning to the sentence, expressing tense, aspect, mood, etc.: სახლია = სახლი+ა ‘is a house’;
b) postpositions represented by a suffix attached to an inflected nominal (noun, adjective, numeral and pronoun): სახლში = სახლ+ში ‘in the house’;
c) the indirect speech particle represented by a suffix attached to an inflected nominal or verb: სახლიო = სახლი+ო ‘a house as smb. said’, წერსო = წერს+ო ‘he writes as smb. said’.</li>
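
Reviewer note: the segmentation rule in this hunk can be illustrated with a minimal sketch, assuming the ten-column CoNLL-U layout used by UD, where a multi-word token gets a range ID line followed by one line per syntactic word. The `SPLITS` table and the function names are hypothetical and cover only the four examples quoted in the text; this is not the treebank's actual tokenizer, and all annotation columns are left as `_`.

```python
# Hypothetical lookup table: only the four surface forms quoted in the
# guideline text, mapped to their syntactic words. A real segmenter would
# need morphological analysis; a bare suffix rule would over-segment.
SPLITS = {
    "სახლია": ["სახლი", "ა"],   # noun + auxiliary: 'is a house'
    "სახლში": ["სახლ", "ში"],   # noun + postposition: 'in the house'
    "სახლიო": ["სახლი", "ო"],   # noun + indirect speech particle
    "წერსო": ["წერს", "ო"],     # verb + indirect speech particle
}


def conllu_rows(tokens):
    """Return CoNLL-U-style rows (10 columns each) for surface tokens.

    Tokens found in SPLITS get a range line (e.g. 1-2) followed by one
    line per syntactic word; all other tokens are emitted unchanged.
    """
    rows = []
    next_id = 1
    for token in tokens:
        parts = SPLITS.get(token, [token])
        if len(parts) > 1:
            last_id = next_id + len(parts) - 1
            rows.append([f"{next_id}-{last_id}", token] + ["_"] * 8)
        for part in parts:
            rows.append([str(next_id), part] + ["_"] * 8)
            next_id += 1
    return rows


def to_conllu(tokens):
    """Join the rows into tab-separated lines."""
    return "\n".join("\t".join(row) for row in conllu_rows(tokens))


if __name__ == "__main__":
    # "და-ძმა" stays a single token, as the guideline says for hyphenated compounds.
    print(to_conllu(["სახლში", "და-ძმა"]))
```

An explicit table is used here because the clitics are single characters or short suffixes that also occur word-finally in unsegmented forms, so the cases a) to c) cannot be decided from the string alone.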
