Tokenization and Word Segmentation
-*
+
+ - Words are generally delimited by whitespace or punctuation. No token in the UD Classical Armenian treebank contains whitespace.
+ - Most punctuation marks are written attached to the preceding word but are tokenized as separate tokens.
+ - Words containing “infixed” punctuation (e.g. question, exclamation, emphasis and abbreviation marks), such as զիա՞րդ = զիարդ/ziard + ՞ “why?”, are treated as multiword tokens and segmented into individual syntactic words.
+ - According to typographical rules, the following words are attached to a neighbouring word:
+
+ - proclitic prepositions յ=/y=, ց=/cʽ= and զ=/z=
+ - a proclitic determinative particle զ=/z=
+ - a proclitic negation particle չ=/čʽ=
+ - enclitic determinative particles =ս/=s, =դ/=d, =ն/=n
+
+
+
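The treatment of infixed punctuation described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the treebank: the function name `split_infixed_marks` is hypothetical, and the inventory of marks is an assumption taken from the Armenian block of Unicode (emphasis, exclamation, question and abbreviation marks).

```python
# Hypothetical helper, not part of any UD tool. The mark inventory is an
# assumption based on the Armenian block of Unicode.
INFIXED_MARKS = {
    "\u055B",  # ՛ emphasis mark
    "\u055C",  # ՜ exclamation mark
    "\u055E",  # ՞ question mark
    "\u055F",  # ՟ abbreviation mark
}


def split_infixed_marks(token: str) -> list[str]:
    """Return the word without its infixed marks, followed by each mark found.

    A token without any infixed mark is returned unchanged as a one-item list.
    """
    letters = "".join(ch for ch in token if ch not in INFIXED_MARKS)
    marks = [ch for ch in token if ch in INFIXED_MARKS]
    return [letters, *marks] if marks else [token]


# The multiword token from the text: զիա՞րդ = զիարդ + ՞
print(split_infixed_marks("\u0566\u056B\u0561\u055E\u0580\u0564"))
```

The surface form would remain the multiword token, while the returned items correspond to its syntactic words.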
-
-Instruction: Describe the general rules for delimiting words (for example, based on whitespace and punctuation) and exceptions to these rules. Specify whether words with spaces and/or multiword tokens occur. Include links to further language-specific documentation if available.
+Sentence splitting
-
+
+ - A full sentence is usually concluded by the punctuation sign verjaket [ ։ ], corresponding to the English period. In the case of longer sentences, the editor of a digital text may decide to split a sentence after the punctuation signs mijaket [ . ], boot [ ՝ ] or storaket [ , ], corresponding to the English colon, semicolon, and comma, respectively.
+
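As a rough illustration of the default rule above, the following sketch splits running text after each verjaket (U+0589). The function name `split_sentences` is hypothetical. The optional editorial splits after mijaket, boot or storaket are deliberately left out, since they depend on the editor's judgement.

```python
import re

VERJAKET = "\u0589"  # ։ Armenian full stop


def split_sentences(text: str) -> list[str]:
    """Split text after each verjaket, keeping the mark with its sentence."""
    # Each chunk is a run of non-verjaket characters plus an optional verjaket.
    chunks = re.findall(f"[^{VERJAKET}]+{VERJAKET}?", text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Placeholder words stand in for real text.
print(split_sentences(f"first sentence{VERJAKET} second sentence{VERJAKET}"))
```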
Morphology
-*
+This is an overview only.
-
-Instruction: Specify any unused tags. Explain what words are tagged as PART. Describe how the AUX-VERB and DET-PRON distinctions are drawn, and specify whether there are (de)verbal forms tagged as ADJ, ADV or NOUN. Include links to language-specific tag definitions if any.
+
+ - Classical Armenian currently uses 16 UPOS tags; the tag SYM does not occur in the UD_Classical_Armenian-CAVaL treebank.
+ - The complete list of Classical Armenian words that must be tagged PART in UD has yet to be worked out. At present, the tag is used restrictively and is applied to four lexemes:
+
+ - contrasting particle: իսկ/isk
+ - dubitation particle: գուցէ/gowcʽē
+ - negation particles: ոչ/očʽ (with its proclitic variant չ=/čʽ=) and մի/mi
+
+
+ - The tag DET is used for articles, the determinate direct object proclitic զ=/z= (traditionally called nota accusativi), and adjectival pronouns with a determiner function. Pronominal quantifiers (which traditional grammar counts among the pronouns) are DET as well. The tag PRON is reserved for pronouns occurring as the head of a noun phrase. When the proclitic զ=/z= is used with cases other than the accusative, it does not have a clear determiner function and is tagged ADP with the case relation.
+ - The Classical Armenian auxiliaries (tagged AUX) include: եմ/em (“to be”), its perfective counterpart լինիմ/linim (“to become”), չիք/čʽikʽ (“there is no”), and տամ (“to give”).
+The auxiliaries եմ and լինիմ are used in the following constructions:
+
+ - The copula with non-verbal predicates, including predicates of location.
+ - Periphrastic past tenses (present form of եմ + past participle, imperfect form of եմ + past participle, aorist form of լինիմ + past participle of the main verb).
+ - Periphrastic future/subjunctive tenses (present subjunctive form of եմ + past participle, present subjunctive form of լինիմ + past participle, aorist subjunctive form of լինիմ + past participle of the main verb).
+The auxiliary չիք is used as a negated copula.
+The auxiliary տամ is used to form the periphrastic causative:
+ - Periphrastic causative (any form of տամ, including periphrastic forms, + infinitive of the main verb).
+
+
+ - Besides եմ, լինիմ and տամ, the verbs կամ (“to stand, exist”) and ունիմ (“to have”) occasionally function as auxiliaries.
+
-
+Nominal Features
-Features
+
+ - Number has two values:
Sing
and Plur
. The following parts of speech inflect for number: NOUN, PROPN, PRON, as well as the finite forms of VERB and AUX.
+
+ - Classical Armenian has numerous pluralia tantum nouns, whose plural form expresses a single entity or abstract notion, cf. ապարանք/aparankʽ “palace”, երեսք/ereskʽ “face”, բարիք/barikʽ “goodness”, etc.
+
+
+ - Case has seven values:
Nom
, Acc
, Gen
, Dat
, Abl
, Ins
, Loc
. It occurs with NOUN, PROPN, NUM, PRON, DET, ADJ, as well as with participles and verbal nouns, tagged VERB or AUX.
+ - NumType is used with numerals (NUM) and adjectives (ADJ).
+ - Animacy can be lexically expressed in PRON, while Definite can be lexically expressed in PRON and DET.
+
-*
+Pronouns, Determiners, Quantifiers
-
-Instruction: Describe inherent and inflectional features for major word classes (at least NOUN and VERB). Describe other noteworthy features. Include links to language-specific feature definitions if any.
+
+ - PronType is used with pronouns (PRON), determiners (DET), adverbs (ADV) and deictic interjections (INTJ).
+ - Poss marks possessive personal determiners (e.g. իմ/im “my”, իւր/iwr “his/her own”).
+ - Reflex marks reflexive pronoun իւր/iwr (gen.sg.) “of him/her-self” and determiner իւր/iwr (nom.sg.), իւրոյ/iwroy “his/her own”.
+ - Person is lexically expressed in personal pronouns (PRON). Only the first and second person pronouns are marked with the values
1
and 2
, respectively. The third person pronoun նա/na “(s)he, it” coincides with the demonstrative նա/na “that” and is left unmarked. The same applies to the possessive determiners.
+
-
+Verbal Features
-Syntax
+
+ - VerbForm distinguishes five main (de)verbal forms. Although the verbal noun functions as a nominal and the past participle can be used adjectivally, they are consistently tagged VERB or AUX.
+
+ - Finite verb
Fin
, tagged VERB or AUX.
+ - Infinitive
Inf
, tagged VERB or AUX.
+ - Converb
Conv
, tagged VERB or AUX.
+ - Participle
Part
, tagged VERB or AUX.
+ - Verbal noun
Vnoun
, tagged VERB or AUX.
+
+
+ - Person has three values, which mark the person of the verb’s subject. Classical Armenian is a pro-drop language, and a subject personal pronoun is often omitted.
+ - Aspect has two values,
Imp
and Perf
. The aspect is defined in purely morphological terms based on the type of the verb stem, from which a verb form is derived. The aspectual semantics expressed by either of the two types of forms may not match the formal aspect.
+ - Finite verbs always have one of three values of Mood:
Ind
, Sub
, or Imp
.
+ - In the indicative mood, verbs always have one of the two values of Tense:
Pres
or Past
, which, in combination with the aforementioned aspectual values, define the three synthetic tenses, the Present, the Aorist, and the Imperfect.
+ Sub
defines the Subjunctive mood, which is also used to express the Future and combines with the two aspectual values.
+ Imp
defines the imperative, derived from a perfective stem, and the prohibitive, derived from an imperfective stem and obligatorily combined with a prohibitive particle մի/mi.
+ - Voice has two values,
Act
and Pass
. It characterises the oppositional inflectional voice, which is expressed only in part of the verbal paradigm. Some forms, such as the present indicative forms of the a-conjugation (գնամ/gnam “I go”) and the first plural form of the aorist indicative (լուաք/luakʽ “we heard”), are underspecified for voice. The Pass
value covers a wide range of valency-decreasing alternations, including the passive, middle, reflexive, etc. The morphological causative is a derivational category; derived causatives can be marked by the inflectional voice as Act
or Pass
, which makes the voice a layered feature in Classical Armenian. The causative layer is identified as Voice[caus] and invariably takes the value Cau
.
+ - The Polarity feature with its
Polarity=Neg
value applies primarily to verbs (VERB, AUX) that can be negated using ոչ/očʽ (with its proclitic variant չ=/čʽ=) or a prohibitive particle մի/mi. The particle ոչ can also modify pronouns.
+
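The interplay of Mood, Tense and Aspect described above can be illustrated with a small lookup. This is a sketch under stated assumptions, not a definition from the treebank. The text does not spell out which Tense and Aspect pair yields which synthetic tense, so the indicative mapping below (imperfective stem for Present and Imperfect, perfective stem for Aorist) is an assumption. The function name `describe_finite_form` is hypothetical.

```python
# Assumed mapping (not spelled out in the text): the imperfective stem yields
# the Present and the Imperfect, the perfective stem yields the Aorist.
SYNTHETIC_INDICATIVE = {
    ("Pres", "Imp"): "Present",
    ("Past", "Imp"): "Imperfect",
    ("Past", "Perf"): "Aorist",
}

# Subjunctive forms combine with both aspectual values.
SUBJUNCTIVE = {"Imp": "Present subjunctive", "Perf": "Aorist subjunctive"}

# Imperative: perfective stem; prohibitive: imperfective stem (used with մի/mi).
IMPERATIVE = {"Perf": "Imperative", "Imp": "Prohibitive"}


def describe_finite_form(feats: dict[str, str]) -> str:
    """Name the traditional category of a finite form from its UD features."""
    mood = feats.get("Mood")
    aspect = feats.get("Aspect")
    if mood == "Ind":
        return SYNTHETIC_INDICATIVE.get((feats.get("Tense"), aspect), "unknown")
    if mood == "Sub":
        return SUBJUNCTIVE.get(aspect, "unknown")
    if mood == "Imp":
        return IMPERATIVE.get(aspect, "unknown")
    return "unknown"


print(describe_finite_form({"Mood": "Ind", "Tense": "Past", "Aspect": "Perf"}))
```

Note that the value `Imp` is used by both Aspect (imperfective) and Mood (imperative), so the two features have to be kept apart by name, as the dictionary keys do here.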
-*
+Other Features
-
-Instruction: Give criteria for identifying core arguments (subjects and objects), and describe the range of copula constructions in nonverbal clauses. List all subtype relations used. Include links to language-specific relations definitions if any.
+
-
+Syntax
-Treebanks
+This is an overview only.
-There are N Classical Armenian UD treebanks:
+Core Arguments, Oblique Arguments and Adjuncts
- - Classical Armenian-A
- - Classical Armenian-B
+ - Nominal subject (
nsubj
) is a noun phrase (possibly headed by a deverbal nominal) typically in the nominative case, without preposition.
+
+ - In the periphrastic past tenses, the subject of transitive verbs is typically coded by the genitive case.
+ - Clausal subjects (
csubj
) are typically expressed by finite clauses, and clauses headed by infinitives or nonverbal predicates.
+
+
+ - Objects (
obj
) are noun phrases in the accusative, which can take the proclitic determinate object marker զ=/z=.
+ - Secondary objects (
iobj
) are expressed by bare noun phrases in the dative.
+ - All other arguments and adjuncts are oblique
obl
. Arguments in the accusative that express spatial or temporal meanings are tagged as obl
as well.
+ - The infinitive complement is typically labeled
xcomp
.
+ - In passive clauses:
+
+ - the subject is labeled either
nsubj:pass
or csubj:pass
.
+ - if the agent is present, it is expressed by an adpositional ablative noun phrase and is labeled
obl:agent
.
+
+
+ - In causative clauses (both bare and periphrastic causative):
+
+ - the subject is labeled with
nsubj:caus
.
+ - the auxiliary verb in the periphrastic causative is labeled
aux:caus
.
+
+
-
-Instruction: Treebank-specific pages are generated automatically from the README file in the treebank repository and
-from the data in the latest release. Link to the respective *-index.html
page in the treebanks
folder, using the language code
-and the treebank code in the file name.
+Non-verbal Clauses
+
+ - The copula is used in the following non-verbal clauses:
+
+ - equational
+ - attributional
+ - locative
+ - possessive
+ - benefactive
+ - existential
+
+
+
-
+Relations Overview
+
+ - The following relation subtypes are used in Classical Armenian:
+
+ nsubj:pass
for nominal subjects of passive verbs
+ nsubj:caus
for nominal subjects of causative verbs
+ csubj:pass
for clausal subjects of passive verbs
+ obl:agent
for agents of passive verbs
+ aux:caus
for auxiliaries of periphrastic causatives
+ acl:relcl
for relative clauses
+
+
+
+
+Treebanks
+
+There is one Classical Armenian UD treebank:
+
+