Filip Ginter committed Oct 9, 2023
1 parent 1a340dc commit 08819c4
Showing 1 changed file with 142 additions and 27 deletions.
169 changes: 142 additions & 27 deletions xcl/template-index.html
@@ -80,57 +80,172 @@ <h1 id="ud-for-classical-armenian-">UD for Classical Armenian <span class="flags

<h2 id="tokenization-and-word-segmentation">Tokenization and Word Segmentation</h2>

<p>*</p>
<ul>
<li>Words are generally delimited by whitespace or punctuation. No token in the UD Classical Armenian treebank contains whitespace.</li>
<li>Most punctuation marks are written attached to the preceding word but are tokenized as separate tokens.</li>
<li>Words containing “infixed” punctuation (e.g. question, exclamation, emphasis and abbreviation marks), such as <em>զիա՞րդ</em> = <em>զիարդ</em>/<em>ziard</em> + ՞ “why?”, are treated as multiword tokens and segmented into individual syntactic words.</li>
<li>According to typographical rules, the following words are attached to a neighbouring word:
<ul>
<li>proclitic prepositions <em>յ</em>=/<em>y</em>=, <em>ց</em>=/<em>cʽ</em>= and <em>զ</em>=/<em>z</em>=</li>
<li>a proclitic determinative particle <em>զ</em>=/<em>z</em>=</li>
<li>a proclitic negation particle <em>չ</em>=/<em>čʽ</em>=</li>
<li>enclitic determinative particles =<em>ս</em>/=<em>s</em>, =<em>դ</em>/=<em>d</em>, =<em>ն</em>/=<em>n</em></li>
</ul>
</li>
</ul>
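<p>As an illustration of the rule for infixed punctuation, the following minimal Python sketch splits a surface token into its syntactic words and lays them out as CoNLL-U rows with a multiword-token range line. The function names <code class="language-plaintext highlighter-rouge">split_infixed</code> and <code class="language-plaintext highlighter-rouge">conllu_rows</code> are hypothetical, the set of marks is an assumption based on the Unicode Armenian block, and the order "word, then mark" follows the example <em>զիարդ</em> + ՞ above. It does not handle the proclitics and enclitics listed above.</p>

```python
# Armenian marks that may be written inside a word (assumed set, by code point).
INFIXED_MARKS = {
    "\u055B",  # emphasis mark
    "\u055C",  # exclamation mark
    "\u055E",  # question mark
    "\u055F",  # abbreviation mark
}


def split_infixed(surface):
    """Return the syntactic words of one surface token: the bare word, then each mark."""
    word = "".join(ch for ch in surface if ch not in INFIXED_MARKS)
    marks = [ch for ch in surface if ch in INFIXED_MARKS]
    if not word or not marks:
        # Nothing to segment: a plain word, or a token made only of marks.
        return [surface]
    return [word] + marks


def conllu_rows(surface, first_id=1):
    """Build 10-column CoNLL-U rows; unfilled columns are left as underscores."""
    words = split_infixed(surface)
    if len(words) == 1:
        return ["\t".join([str(first_id), surface] + ["_"] * 8)]
    last_id = first_id + len(words) - 1
    # Range line for the multiword token, followed by one line per syntactic word.
    rows = ["\t".join([f"{first_id}-{last_id}", surface] + ["_"] * 8)]
    for offset, form in enumerate(words):
        rows.append("\t".join([str(first_id + offset), form] + ["_"] * 8))
    return rows


if __name__ == "__main__":
    for row in conllu_rows("\u0566\u056B\u0561\u055E\u0580\u0564"):  # զիա՞րդ
        print(row)
```

<p>Only the ID and FORM columns are filled in here; lemmas, tags and relations would come from the annotation itself.</p>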

<hr />
<p><strong>Instruction</strong>: Describe the general rules for delimiting words (for example, based on whitespace and punctuation) and exceptions to these rules. Specify whether words with spaces and/or multiword tokens occur. Include links to further language-specific documentation if available.</p>
<h2 id="sentence-splitting">Sentence splitting</h2>

<hr />
<ul>
<li>A full sentence is usually concluded by the punctuation sign <em>verjaket</em> [ ։ ] corresponding to the English period. In case of longer sentences, the editor of a digital text may decide to split a sentence after the punctuation signs <em>mijaket</em> [ . ], <em>boot</em> [ ՝ ] or <em>storaket</em> [ , ], corresponding to the English colon, semicolon, and comma, respectively.</li>
</ul>
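<p>The splitting policy described above can be sketched as follows. This is a minimal illustration, not the procedure used to build the treebank: the function name <code class="language-plaintext highlighter-rouge">split_sentences</code> and its <code class="language-plaintext highlighter-rouge">extra_marks</code> parameter are hypothetical, and the <em>verjaket</em> is assumed to be encoded as U+0589. Any additional marks an editor chooses to split on are passed in explicitly as single characters.</p>

```python
VERJAKET = "\u0589"  # Armenian full stop (assumed encoding of the verjaket)


def split_sentences(text, extra_marks=()):
    """Split text after each verjaket and after any optional single-character marks."""
    boundaries = {VERJAKET, *extra_marks}
    sentences, current = [], []
    for ch in text:
        current.append(ch)
        if ch in boundaries:
            sentence = "".join(current).strip()
            if sentence:
                sentences.append(sentence)
            current = []
    # Keep trailing text that has no closing mark.
    tail = "".join(current).strip()
    if tail:
        sentences.append(tail)
    return sentences
```

<p>By default only the <em>verjaket</em> ends a sentence; passing, for example, <code class="language-plaintext highlighter-rouge">extra_marks=("\u055D",)</code> would also split after that mark in long sentences.</p>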

<h2 id="morphology">Morphology</h2>

<h3 id="tags">Tags</h3>

<p>*</p>
<p>This is an overview only.</p>

<hr />
<p><strong>Instruction</strong>: Specify any unused tags. Explain what words are tagged as PART. Describe how the AUX-VERB and DET-PRON distinctions are drawn, and specify whether there are (de)verbal forms tagged as ADJ, ADV or NOUN. Include links to language-specific tag definitions if any.</p>
<ul>
<li>Classical Armenian currently uses 16 UPOS tags; the tag <a href="">SYM</a> does not occur in the UD_Classical_Armenian-CAVaL treebank.</li>
<li>The complete list of Classical Armenian words, which must be tagged <a href="">PART</a> in UD, has to be worked out. At present, the tag is used restrictively and is applied to four lexemes:
<ul>
<li>contrasting particle: <em>իսկ</em>/<em>isk</em></li>
<li>dubitation particle: <em>գուցէ</em>/<em>gowcʽē</em></li>
<li>negation particles: <em>ոչ</em>/<em>očʽ</em> (with its proclitic variant <em>չ</em>=/<em>čʽ</em>=) and <em>մի</em>/<em>mi</em></li>
</ul>
</li>
<li>The tag <a href="">DET</a> is used for articles, the determinate direct object proclitic <em>զ</em>=/<em>z</em>= (traditionally called <em>nota accusativi</em>), and adjectival pronouns with a determiner function. Pronominal quantifiers (which the traditional grammar includes in pronouns) are <a href="">DET</a> as well. The tag <a href="">PRON</a> is reserved for pronouns occurring as the head of a noun phrase. When the proclitic <em>զ</em>=/<em>z</em>= is used with cases other than the accusative, it does not have a clear determiner function and is tagged <a href="">ADP</a> with the <a href="">case</a> relation.</li>
<li>The Classical Armenian auxiliaries (tagged <a href="">AUX</a>) include: <em>եմ</em>/<em>em</em> (“to be”), its perfective counterpart <em>լինիմ</em>/<em>linim</em> (“to become”), <em>չիք</em>/<em>čʽikʽ</em> (“there is no”), and <em>տամ</em> (“to give”).
The auxiliaries <em>եմ</em> and <em>լինիմ</em> are used in the following constructions:
<ul>
<li>The copula with non-verbal predicates, including predicates of location.</li>
<li>Periphrastic past tenses (present form of <em>եմ</em> + past participle, imperfect form of <em>եմ</em> + past participle, aorist form of <em>լինիմ</em> + past participle of the main verb).</li>
<li>Periphrastic future/subjunctive tenses (present subjunctive form of <em>եմ</em> + past participle, present subjunctive form of <em>լինիմ</em> + past participle, aorist subjunctive form of <em>լինիմ</em> + past participle of the main verb).
The auxiliary <em>չիք</em> is used as a negated copula.
The auxiliary <em>տամ</em> is used to form the periphrastic causative:</li>
<li>Periphrastic causative (any form of <em>տամ</em>, including periphrastic forms, + infinitive of the main verb).</li>
</ul>
</li>
<li>Besides <em>եմ, լինիմ</em> and <em>տամ</em>, the verbs <em>կամ</em> (“to stand, exist”) and <em>ունիմ</em> (“to have”) occasionally function as auxiliaries.</li>
</ul>

<hr />
<h3 id="nominal-features">Nominal Features</h3>

<h3 id="features">Features</h3>
<ul>
<li><a href="">Number</a> has two values: <code class="language-plaintext highlighter-rouge">Sing</code> and <code class="language-plaintext highlighter-rouge">Plur</code>. The following parts of speech inflect for number: <a href="">NOUN</a>, <a href="">PROPN</a>, <a href="">PRON</a>, as well as the finite forms of <a href="">VERB</a> and <a href="">AUX</a>.
<ul>
<li>Classical Armenian has numerous <em>pluralia tantum</em> nouns, the plural form of which expresses a single entity or abstract notion, cf. <em>ապարանք</em>/<em>aparankʽ</em> “palace”, <em>երեսք</em>/<em>ereskʽ</em> “face”, <em>բարիք</em>/<em>barikʽ</em> “goodness”, etc.</li>
</ul>
</li>
<li><a href="">Case</a> has seven values: <code class="language-plaintext highlighter-rouge">Nom</code>, <code class="language-plaintext highlighter-rouge">Acc</code>, <code class="language-plaintext highlighter-rouge">Gen</code>, <code class="language-plaintext highlighter-rouge">Dat</code>, <code class="language-plaintext highlighter-rouge">Abl</code>, <code class="language-plaintext highlighter-rouge">Ins</code>, <code class="language-plaintext highlighter-rouge">Loc</code>. It occurs with <a href="">NOUN</a>, <a href="">PROPN</a>, <a href="">NUM</a>, <a href="">PRON</a>, <a href="">DET</a>, <a href="">ADJ</a>, as well as with participles and verbal nouns, tagged <a href="">VERB</a> or <a href="">AUX</a>.</li>
<li><a href="">NumType</a> is used with numerals (<a href="">NUM</a>) and adjectives (<a href="">ADJ</a>)</li>
<li><a href="">Animacy</a> can be lexically expressed in <a href="">PRON</a>, while <a href="">Definite</a> can be lexically expressed in <a href="">PRON</a> and <a href="">DET</a>.</li>
</ul>

<p>*</p>
<h3 id="pronouns-determiners-quantifiers">Pronouns, Determiners, Quantifiers</h3>

<hr />
<p><strong>Instruction</strong>: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.</p>
<ul>
<li><a href="">PronType</a> is used with pronouns (<a href="">PRON</a>), determiners (<a href="">DET</a>), adverbs (<a href="">ADV</a>) and deictic interjections (<a href="">INTJ</a>).</li>
<li><a href="">Poss</a> marks possessive personal determiners (e.g. <em>իմ</em>/<em>im</em> “my”, <em>իւր</em>/<em>iwr</em> “his/her own”).</li>
<li><a href="">Reflex</a> marks reflexive pronoun <em>իւր</em>/<em>iwr</em> (gen.sg.) “of him/her-self” and determiner <em>իւր</em>/<em>iwr</em> (nom.sg.), <em>իւրոյ</em>/<em>iwroy</em> “his/her own”.</li>
<li><a href="">Person</a> is lexically expressed in personal pronouns (<a href="">PRON</a>). Only the first and second person pronouns are marked with the values <code class="language-plaintext highlighter-rouge">1</code> and <code class="language-plaintext highlighter-rouge">2</code>, respectively. The third person pronoun <em>նա</em>/<em>na</em> “(s)he, it” coincides with the demonstrative <em>նա</em>/<em>na</em> “that” and is left unmarked. The same applies to the possessive determiners.</li>
</ul>

<hr />
<h3 id="verbal-features">Verbal Features</h3>

<h2 id="syntax">Syntax</h2>
<ul>
<li><a href="">VerbForm</a> distinguishes five main (de)verbal forms. Although the verbal noun functions as a nominal and the past participle can be used adjectivally, they are consistently tagged <a href="">VERB</a> or <a href="">AUX</a>.
<ul>
<li>Finite verb <code class="language-plaintext highlighter-rouge">Fin</code>, tagged <a href="">VERB</a> or <a href="">AUX</a>.</li>
<li>Infinitive <code class="language-plaintext highlighter-rouge">Inf</code>, tagged <a href="">VERB</a> or <a href="">AUX</a>.</li>
<li>Converb <code class="language-plaintext highlighter-rouge">Conv</code>, tagged <a href="">VERB</a> or <a href="">AUX</a>.</li>
<li>Participle <code class="language-plaintext highlighter-rouge">Part</code>, tagged <a href="">VERB</a> or <a href="">AUX</a>.</li>
<li>Verbal noun <code class="language-plaintext highlighter-rouge">Vnoun</code>, tagged <a href="">VERB</a> or <a href="">AUX</a>.</li>
</ul>
</li>
<li><a href="">Person</a> has three values, which mark the person of the verb’s subject on verbs. Classical Armenian is a pro-drop language and a personal pronoun as subject is often omitted.</li>
<li><a href="">Aspect</a> has two values, <code class="language-plaintext highlighter-rouge">Imp</code> and <code class="language-plaintext highlighter-rouge">Perf</code>. The aspect is defined in purely morphological terms based on the type of the verb stem, from which a verb form is derived. The aspectual semantics expressed by either of the two types of forms may not match the formal aspect.</li>
<li>Finite verbs always have one of three values of <a href="">Mood</a>: <code class="language-plaintext highlighter-rouge">Ind</code>, <code class="language-plaintext highlighter-rouge">Sub</code>, or <code class="language-plaintext highlighter-rouge">Imp</code>.</li>
<li>In the indicative mood, verbs always have one of the two values of <a href="">Tense</a>: <code class="language-plaintext highlighter-rouge">Pres</code> or <code class="language-plaintext highlighter-rouge">Past</code>, which, in combination with the aforementioned aspectual values, define the three synthetic tenses, the Present, the Aorist, and the Imperfect.</li>
<li><code class="language-plaintext highlighter-rouge">Sub</code> defines the Subjunctve mood, which is also used to express the Future and combines with the two aspectual values.</li>
<li><code class="language-plaintext highlighter-rouge">Imp</code> defines the imperative, derived from a perfective stem, and the prohibitive, derived from an imperfective stem and obligatorily combined with a prohibitive particle <em>մի</em>/<em>mi</em>.</li>
<li><a href="">Voice</a> has two values, <code class="language-plaintext highlighter-rouge">Act</code> and <code class="language-plaintext highlighter-rouge">Pass</code>. It characterises the oppositional inflectional voice, which is expressed only in part of the verbal paradigm. Some forms, such as the present indicative forms of the a-conjugation (<em>գնամ</em>/<em>gnam</em> “I go”) and the first plural form of the aorist indicative (<em>լուաք</em>/<em>luakʽ</em> “we heard”), are underspecified for voice. The <code class="language-plaintext highlighter-rouge">Pass</code> value defines to a wide range of valency-decreasing alternations including the passive, middle, reflexive, etc. The morphological causative is a derivational category; derived causatives can be marked by the inflectional voice as <code class="language-plaintext highlighter-rouge">Act</code> or <code class="language-plaintext highlighter-rouge">Pass</code>, which makes the voice a layered feature in Classical Armenian. The causative layer is identified as <a href="">Voice[caus]</a> and invariably takes the value <code class="language-plaintext highlighter-rouge">Cau</code>.</li>
<li><a href="">Polarity</a> feature with its <code class="language-plaintext highlighter-rouge">Polarity=Neg</code> value applies primarily to verbs (<a href="">VERB</a>, <a href="">AUX</a>) that can be negated using <em>ոչ</em>/<em>očʽ</em> (with its proclitic variant <em>չ</em>=/<em>čʽ</em>=) or a prohibitive particle <em>մի</em>/<em>mi</em>. The particle <em>ոչ</em> can also modify pronouns.</li>
</ul>
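<p>The interplay of <code class="language-plaintext highlighter-rouge">Mood</code>, <code class="language-plaintext highlighter-rouge">Tense</code> and <code class="language-plaintext highlighter-rouge">Aspect</code> described above can be sketched as a lookup from a CoNLL-U FEATS string to a traditional form name. The helper names are hypothetical. The pairing of feature values with the Present, Imperfect and Aorist is an assumption based on the stem types (imperfective stem for the Present and Imperfect, perfective stem for the Aorist), as is the choice to ignore <code class="language-plaintext highlighter-rouge">Tense</code> outside the indicative. Note that <code class="language-plaintext highlighter-rouge">Imp</code> is both a mood value (imperative) and an aspect value (imperfective).</p>

```python
def parse_feats(feats):
    """Parse a CoNLL-U FEATS string such as 'Aspect=Perf|Mood=Ind' into a dict."""
    if feats in ("", "_"):
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


# Keys are (Mood, Tense, Aspect); Tense is None outside the indicative (assumption).
FORM_NAMES = {
    ("Ind", "Pres", "Imp"): "present",
    ("Ind", "Past", "Imp"): "imperfect",
    ("Ind", "Past", "Perf"): "aorist",
    ("Sub", None, "Imp"): "present subjunctive",
    ("Sub", None, "Perf"): "aorist subjunctive",
    ("Imp", None, "Perf"): "imperative",
    ("Imp", None, "Imp"): "prohibitive",
}


def name_finite_form(feats):
    """Return a traditional name for a finite form, or None if the combination is not listed."""
    f = parse_feats(feats)
    mood = f.get("Mood")
    tense = f.get("Tense") if mood == "Ind" else None
    return FORM_NAMES.get((mood, tense, f.get("Aspect")))
```

<p>For instance, <code class="language-plaintext highlighter-rouge">name_finite_form("Aspect=Perf|Mood=Ind|Tense=Past")</code> returns <code class="language-plaintext highlighter-rouge">"aorist"</code> under these assumptions, while non-finite forms, which carry no <code class="language-plaintext highlighter-rouge">Mood</code>, yield <code class="language-plaintext highlighter-rouge">None</code>.</p>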

<p>*</p>
<h3 id="other-features">Other Features</h3>

<hr />
<p><strong>Instruction</strong>: Give criteria for identifying core arguments (subjects and objects), and describe the range of copula constructions in nonverbal clauses. List all subtype relations used. Include links to language-specific relations definitions if any.</p>
<ul>
<li>The following universal features are not used in Classical Armenian: <a href="">Clusivity</a>, <a href="">Evident</a>, <a href="">Gender</a>, <a href="">NounClass</a>, <a href="">Polite</a>.</li>
</ul>

<hr />
<h2 id="syntax">Syntax</h2>

<h2 id="treebanks">Treebanks</h2>
<p>This is an overview only.</p>

<p>There are <a href="../treebanks/xcl-comparison.html">N</a> Classical Armenian UD treebanks:</p>
<h3 id="core-arguments-oblique-arguments-and-adjuncts">Core Arguments, Oblique Arguments and Adjuncts</h3>

<ul>
<li><a href="../treebanks/xcl_a/index.html">Classical Armenian-A</a></li>
<li><a href="../treebanks/xcl_b/index.html">Classical Armenian-B</a></li>
<li>Nominal subject (<code class="language-plaintext highlighter-rouge">nsubj</code>) is a noun phrase (possibly headed by a deverbal nominal) typically in the nominative case, without preposition.
<ul>
<li>In the periphrastic past tenses, the subject of transitive verbs is typically coded by the genitive case.</li>
<li>Clausal subjects (<code class="language-plaintext highlighter-rouge">csubj</code>) are typically expressed by finite clauses, and clauses headed by infinitives or nonverbal predicates.</li>
</ul>
</li>
<li>Objects (<code class="language-plaintext highlighter-rouge">obj</code>) are noun phrases in the accusative, which can take the proclitic determinate object marker <em>զ</em>=/<em>z</em>=.</li>
<li>Secondary objects (<code class="language-plaintext highlighter-rouge">iobj</code>) are expressed by bare noun phrases in the dative.</li>
<li>All other arguments and adjuncts are oblique <code class="language-plaintext highlighter-rouge">obl</code>. Arguments in the accusative that express spatial or temporal meanings are tagged as <code class="language-plaintext highlighter-rouge">obl</code> as well.</li>
<li>The infinitive complement is typically labeled <a href=""><code class="language-plaintext highlighter-rouge">xcomp</code></a>.</li>
<li>In passive clauses:
<ul>
<li>the subject is labeled either <a href=""><code class="language-plaintext highlighter-rouge">nsubj:pass</code></a> or <a href=""><code class="language-plaintext highlighter-rouge">csubj:pass</code></a>.</li>
<li>if the agent is present, it is expressed by an adpositional ablative noun phrase and is labeled <a href=""><code class="language-plaintext highlighter-rouge">obl:agent</code></a>.</li>
</ul>
</li>
<li>In causative clauses (both bare and periphrastic causative):
<ul>
<li>the subject is labeled with <a href=""><code class="language-plaintext highlighter-rouge">nsubj:caus</code></a>.</li>
<li>the auxiliary verb in the periphrastic causative is labeled <a href=""><code class="language-plaintext highlighter-rouge">aux:caus</code></a>.</li>
</ul>
</li>
</ul>

<hr />
<p><strong>Instruction</strong>: Treebank-specific pages are generated automatically from the README file in the treebank repository and
from the data in the latest release. Link to the respective <code class="language-plaintext highlighter-rouge">*-index.html</code> page in the <code class="language-plaintext highlighter-rouge">treebanks</code> folder, using the language code
and the treebank code in the file name.</p>
<h3 id="non-verbal-clauses">Non-verbal Clauses</h3>
<ul>
<li>The copula is used in the following non-verbal clauses:
<ul>
<li>equational</li>
<li>attributional</li>
<li>locative</li>
<li>possessive</li>
<li>benefactive</li>
<li>existential</li>
</ul>
</li>
</ul>

<hr />
<h2 id="relations-overview">Relations Overview</h2>
<ul>
<li>The following relation subtypes are used in Classical Armenian:
<ul>
<li><a href=""><code class="language-plaintext highlighter-rouge">nsubj:pass</code></a> for nominal subjects of passive verbs</li>
<li><a href=""><code class="language-plaintext highlighter-rouge">nsubj:caus</code></a> for nominal subjects of causative verbs</li>
<li><a href=""><code class="language-plaintext highlighter-rouge">csubj:pass</code></a> for clausal subjects of passive verbs</li>
<li><a href=""><code class="language-plaintext highlighter-rouge">obl:agent</code></a> for agents of passive verbs</li>
<li><a href=""><code class="language-plaintext highlighter-rouge">aux:caus</code></a> for auxiliaries of periphrastic causatives</li>
<li><a href=""><code class="language-plaintext highlighter-rouge">acl:relcl</code></a> for relative clauses</li>
</ul>
</li>
</ul>
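<p>A list like the one above lends itself to a simple consistency check. The sketch below flags any subtyped relation label that is not among the six listed subtypes; the names <code class="language-plaintext highlighter-rouge">is_known_deprel</code> and <code class="language-plaintext highlighter-rouge">unknown_subtypes</code> are hypothetical, and the check deliberately does not validate the universal (unsubtyped) relations themselves.</p>

```python
# Relation subtypes listed for Classical Armenian in this overview.
SUBTYPES = {
    "nsubj:pass",
    "nsubj:caus",
    "csubj:pass",
    "obl:agent",
    "aux:caus",
    "acl:relcl",
}


def is_known_deprel(deprel):
    """Accept any label without a subtype, and only the listed labels with one."""
    return ":" not in deprel or deprel in SUBTYPES


def unknown_subtypes(deprels):
    """Return the sorted set of subtyped labels that are not in the list."""
    return sorted({d for d in deprels if not is_known_deprel(d)})
```

<p>Given the DEPREL column of a file as a list of strings, <code class="language-plaintext highlighter-rouge">unknown_subtypes</code> returns the labels that would need to be either corrected or added to the documentation.</p>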

<h2 id="treebanks">Treebanks</h2>

<p>There is one Classical Armenian UD treebank:</p>

<ul>
<li><a href="../treebanks/xcl/index.html">UD_Classical_Armenian-CAVaL</a></li>
</ul>

</div>
