UMR assumes that events are linked to frame files (also called rolesets; together they can be viewed as a valency dictionary) that describe the participants of the event and their semantic roles. The default source of English frame files is PropBank (version 3.4). Other languages may use a similar resource if one exists, or build a lexicon on the fly during UMR annotation.
The same applies to non-eventive concepts related to an event; see below.
There is a minor terminological glitch: while all processes are events (even if nominalized, cf. event nominals), states and entities may or may not be events depending on how they are used in the sentence.
It would not make sense to identify a state with an entry in
a valency lexicon when it is used as an event (that is, in predication) and
to link it to a different entry in another lexicon when it is used in
modification or reference.
We will thus assume that all processes and (some) states have entries in a valency lexicon, i.e., their frames are available.
Entities can be in a separate lexicon. This also relates to anchoring of concepts:
entities are primarily anchored in Wikipedia (Wikidata), while states and
processes would ideally be anchored in the frame file (valency lexicon).
In addition, all non-eventive concepts related to an event (e.g., agentive nouns like teacher) are also linked to the respective frames via so-called reification: teacher is treated as the ARG0-of the teaching event and is thus linked to the frame of the verb teach in the lexicon.
In the long run, we want to use SynSemClass to anchor processes and states in a cross-linguistically applicable manner. It currently contains only samples of verbs from a few languages, but it can be extended. At present it is not easy to identify a class for a verb (the interface lists the verb that was selected in each language as the label for the class, but it does not list the other verbs that have a similar meaning and belong to the same class). A better search tool is now available (it contains SynSemClass version 5.0 as stored in the Lindat repository). For the latest version of the data (under development), see http://ufallab.ms.mff.cuni.cz/~fucikova/public_html/SSC_classmembers/.
In the meantime, for Czech (and especially for data from PDT) we can use the PDT-Vallex, searchable here or in Teitok here. It contains verbs (both active and stative) but only a small number of other parts of speech denoting processes or states. We have conversion files (MUST BE REVISED) that map the PDT-Vallex frames (column B) to strings that can be used as eventive concepts in UMR (column A). The concept strings are lemmas of the verbs (infinitives), always followed by a hyphen and a numeric index. This seems to be required for eventive concepts in UMR (although the guidelines do not say explicitly that it is needed). The examples in the guidelines use two-digit indexes (-01 for most predicates) but we use three-digit indexes because some Czech verbs have more than 99 frames. For the time being, we will use these strings as concepts. When the usage of SynSemClass is clarified in the future, it will be possible to automatically map them to SynSemClass. We will create concepts for states and processes that are not in PDT-Vallex (e.g. states expressed as adjectives) and take note of them so they can be later added to the lexicon.
Note that some words will be mapped to concepts that are not their lemmas. Participial adjectives will typically be mapped to verbal concepts. Verbal nouns will be mapped to the corresponding verbal concepts. This also holds for some deverbal nouns that denote states or processes and are not derived using the standard -ní/-tí suffixes, such as dřímota “slumber”, objev “discovery”, ochrana “protection” etc.
- dodělávající “finishing” → dodělávat-001
- dodělavší “having finished” → dodělat-001
- dodělaný “finished” → dodělat-001
- dušení “choking” → dusit-se-001
- dřímota “slumber” → dřímat-002
- válka “war” → válčit-003
- jídlo “food” → jíst-001
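The concept-string convention described above (verbal lemma, hyphen, three-digit frame index) can be sketched in a few lines of Python. The helper names `make_concept` and `split_concept` are hypothetical and do not belong to any UMR tool. The sketch assumes that the index is always the part after the last hyphen, so that reflexive lemmas such as dusit-se keep their internal hyphen.

```python
import re

# Assumption: a concept string is "<lemma>-<NNN>" with a three-digit index.
CONCEPT_RE = re.compile(r"^(?P<lemma>.+)-(?P<index>\d{3})$")


def make_concept(lemma: str, index: int) -> str:
    """Build a concept string such as 'dodělat-001' from a lemma and a frame index."""
    if not 1 <= index <= 999:
        raise ValueError(f"frame index out of range: {index}")
    return f"{lemma}-{index:03d}"


def split_concept(concept: str) -> tuple[str, int]:
    """Split a concept string back into (lemma, index)."""
    match = CONCEPT_RE.match(concept)
    if match is None:
        raise ValueError(f"not a three-digit eventive concept: {concept!r}")
    return match.group("lemma"), int(match.group("index"))


# Some word forms from the list above, mapped to (verbal lemma, frame index).
WORD_TO_FRAME = {
    "dodělávající": ("dodělávat", 1),
    "dodělaný": ("dodělat", 1),
    "dušení": ("dusit-se", 1),
    "dřímota": ("dřímat", 2),
}

for word, (lemma, index) in WORD_TO_FRAME.items():
    print(word, "->", make_concept(lemma, index))
```

Two-digit PropBank-style strings such as receive-01 are rejected by `split_concept` on purpose, which makes it easy to spot English rolesets that slipped into Czech data.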
Argument roles is another name used for some of the relations under eventive concepts. UMR inherits them from AMR, which in turn follows OntoNotes (PropBank) conventions. There are six argument roles: :ARG0, :ARG1, :ARG2, :ARG3, :ARG4, :ARG5. Their exact meaning depends on each verb (see the AMR final list of frames), but there are still some general tendencies in the correspondence between role number and semantic role. Hence not all frames will start with :ARG0 and use the subsequent numbers in order.
- :ARG0 ... typically agent, experiencer. It often corresponds to ACT in PDT.
- :ARG1 ... typically patient, theme. It often corresponds to PAT in PDT.
- :ARG2 ... typically recipient (besides :ARG2, it could also use the relation :beneficiary). It often corresponds to ADDR in PDT.
An example of a verb-specific (frame-specific) definition of roles:
- receive-01 ([cs] získat-001)
ARG0: receiver; ARG1: thing gotten; ARG2: received from; ARG3: price, in exchange for; ARG4: attribute of ARG1
e.g., (The company).ARG0 hadn't yet received (any documents).ARG1 (from OSHA).ARG2 (regarding the penalty or fine).ARG4.
For some Czech verbs, the arguments have already been mapped onto ARGx roles, either within the SynSemClass project or within CzEngVallex. The mapping can be found in the conversion files, column C (via CzEngVallex) and D (via CzEngVallex).
For verbs without a frame-specific mapping, the default conversion table will be used.
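The fallback from a frame-specific mapping to the default table can be sketched as follows. This is only an illustration under stated assumptions: the default table here holds just the three tendencies named above (ACT, PAT, ADDR), the real conversion table may contain more, and the frame-specific entry for získat-001 is hypothetical rather than copied from the conversion files. The function name `map_functor` is also made up for this sketch.

```python
# Assumption: the default table holds only the three tendencies named in the text.
DEFAULT_ROLE_MAP = {"ACT": "ARG0", "PAT": "ARG1", "ADDR": "ARG2"}

# Hypothetical frame-specific entry, for illustration only;
# it is NOT taken from the actual conversion files.
FRAME_ROLE_MAP = {
    "získat-001": {"ACT": "ARG0", "PAT": "ARG1", "ORIG": "ARG2"},
}


def map_functor(concept, functor):
    """Return the ARGx role for a PDT functor, or None if no mapping is known."""
    frame_map = FRAME_ROLE_MAP.get(concept)
    if frame_map is not None:
        # A frame-specific mapping, once present, is treated as complete.
        return frame_map.get(functor)
    return DEFAULT_ROLE_MAP.get(functor)


print(map_functor("získat-001", "ORIG"))
print(map_functor("dodělat-001", "ACT"))
```

The sketch deliberately does not mix the two tables for one frame: if a frame has its own mapping, a functor missing from it yields `None` instead of silently falling back to the default, so that gaps stay visible to the annotator.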
UMR proposes seven abstract concept predicates (plus two subclasses) for situations where states or entities are predicated (i.e., they are events), and, as the guidelines say, “there is no overt predicate-word”. The guidelines do not say what qualifies as an overt predicate word. The term non-verbal clauses seems to suggest that predicate words should be verbs. But the guidelines can hardly require this, because verb is a part-of-speech category and, as stated at the beginning of part 3, “event identification is not based on parts of speech or word classes, since these vary greatly across languages.” Indeed, languages such as Chinese practically do not distinguish state verbs from adjectives.
Therefore, we should not take the word non-verbal too strictly. If we can create frame files for all processes and states (including states expressed primarily by adjectives), we can treat all these events as “verbal”.
On the other hand, entities are prototypically not used in predication. We will not have frames for them, and they will be listed in a lexicon other than the valency lexicon. Nominal predicates where the noun denotes an entity may be treated as “non-verbal”.
The 7 (or 9) abstract predicates for non-verbal clauses are listed in Tables 3 and 4. See also Lists for UMR tools, sheet Abstract Rolesets (under non-prototypical pred rolesets):
- [cs] Vltava je řeka. “Vltava is a river.” ... predicational have-role-91
- [cs] Tato řeka je Vltava. “This river is Vltava.” ... equational identity-91
Strictly speaking, such sentences are not completely non-verbal in Czech or English because they have a verbal copula. In Czech, the copula být corresponds to frame být-007 (v-w243f80_ZU substituted with v-w243f187_MM). But in Polish the copula is not verbal, and in Russian there is no copula in the present tense at all.
- [pl] Wełtawa to rzeka. “Vltava is a river.”
- [pl] Ta rzeka to Wełtawa. “This river is Vltava.”
- [ru] Влтава — река. “Vltava is a river.”
- [ru] Эта река — Влтава. “This river is Vltava.”
To ensure cross-linguistically more consistent treatment of such sentences, the abstract predicates have-role-91 and identity-91, respectively, are used in all languages, regardless of whether a copula is used. The guidelines show this from part 1 onward in examples like (2) Pope is the American businessman who... (identity-91), or 3-1-2 (1) Edmond Pope is an American businessman. (have-role-91). The distinction between identity and having a role is another advantage of the abstract predicates: the copula in Czech and English is the same in both situations. It could be distinguished by different frames, but for example the Czech valency lexicon (PDT-Vallex) does not distinguish them and uses být-007 for both of them.
(h/ have-role-91
    :ARG1 (r/ river
        :wiki "Q131574"
        :name (n/ name :op1 "Vltava"))
    :ARG3 (ř/ řeka
        :wiki "Q4022"))

(h/ have-role-91
    :ARG1 (r/ river
        :wiki "Q131574"
        :name (n/ name :op1 "Wełtawa"))
    :ARG3 (r2/ rzeka
        :wiki "Q4022"))

(h/ have-role-91
    :ARG1 (r/ river
        :wiki "Q131574"
        :name (n/ name :op1 "Влтава"))
    :ARG3 (р/ река
        :wiki "Q4022"))
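Since the three graphs above differ only in the name string and in the language-specific role concept, the pattern can be generated from a small template. The following is a minimal sketch, not an existing UMR tool: the function names `pick_variable` and `have_role_graph` are hypothetical, and the variable-naming rule (first letter of the concept, numbered on clashes) is an assumption that happens to reproduce the variables used above.

```python
def pick_variable(concept, used):
    """Pick a fresh variable: first letter of the concept, numbered on clashes."""
    base = concept[0].lower()
    var, n = base, 1
    while var in used:
        n += 1
        var = f"{base}{n}"
    used.add(var)
    return var


def have_role_graph(name, entity_concept, entity_wiki, role_concept, role_wiki):
    """Render a have-role-91 graph for a sentence of the type 'X is a Y'."""
    used = {"h", "n"}  # reserved for the have-role-91 node and the name node
    entity_var = pick_variable(entity_concept, used)
    role_var = pick_variable(role_concept, used)
    return "\n".join([
        "(h / have-role-91",
        f"    :ARG1 ({entity_var} / {entity_concept}",
        f'        :wiki "{entity_wiki}"',
        f'        :name (n / name :op1 "{name}"))',
        f"    :ARG3 ({role_var} / {role_concept}",
        f'        :wiki "{role_wiki}"))',
    ])


for name, role in [("Vltava", "řeka"), ("Wełtawa", "rzeka"), ("Влтава", "река")]:
    print(have_role_graph(name, "river", "Q131574", role, "Q4022"))
    print()
```

For Polish, the role concept rzeka clashes with the variable of river and therefore receives r2, as in the hand-written graph.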
UMR works with a list of so-called abstract predicates (each of which has its own semantic roles). These predicates are used in annotation to ensure cross-linguistically more consistent treatment of specific constructions.
Although these abstract predicates are not systematically listed in the Guidelines, we can work with the Lists for UMR tools, sheet Abstract Rolesets.
There are 4 types of abstract rolesets specified in the above lists, serving for:
- non-prototypical pred rolesets (= rolesets for non-verbal clauses), see above;
- implicit roles for specific syntactic constructions;
- reification, and
- discourse relations.
Examples:
- [en] "It was like mud running down the mountain and it covered the village in seconds," she said, quoting survivors. (english_umr-0001.txt)
  ... with the resemble-91 relation with 2 roles, ARG1 (for copy, here "thing" as an abstract concept) and ARG2 (original, here mud).
- [en] Military helicopters were able to reach the area despite heavy clouds but the flights ceased after nightfall because the aircraft did not have night-flying capabilities. (english_umr-0001.txt)
  ... where the abstract predicate weather-91 is used to annotate heavy clouds.
- [en] The more I read your stuff, the more I am convinced that you have a black heart.
  ... where the abstract predicate correlate-91 is used to annotate the "the X-er, the Y-er" construction and have-degree-91 to annotate the comparative construction.
(c2 / correlate-91
    :aspect Habitual
    :ARG1 (m / more
        :frequency-of (r / read-01
            :ARG0 (p / person
                :ref-person 1st
                :ref-number Singular)
            :ARG1 (s2 / stuff
                :poss (p2 / person
                    :ref-person 2nd
                    :ref-number Singular))))
    :ARG2 (m2 / more
        :ARG3-of (h3 / have-degree-91
            :ARG1 0
            :ARG2 (c / convince-01
                :ARG1 p
                :ARG2 (h / have-03
                    :ARG0 p2
                    :ARG1 (h2 / heart
                        :ARG1-of (b / black-06)))))))
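Deeply nested graphs like the one above are easy to break when edited by hand. The following sketch shows one possible sanity check: it verifies that parentheses are balanced and that no variable is defined twice. It is an assumption-laden helper (the name `check_graph` is hypothetical), it works on the raw text rather than on a parsed graph, and the test graph is a simplified fragment of the example, not the full annotation.

```python
import re
from collections import Counter


def check_graph(text):
    """Return a list of problems found in a PENMAN-style graph string."""
    problems = []
    # Ignore parentheses that occur inside string literals such as names.
    bare = re.sub(r'"[^"]*"', '""', text)
    depth = 0
    for ch in bare:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                problems.append("closing parenthesis without an opening one")
                break
    if depth > 0:
        problems.append(f"{depth} unclosed parenthesis(es)")
    # A variable is defined wherever "(var /" occurs.
    counts = Counter(re.findall(r"\(\s*([^\s/()]+)\s*/", bare))
    for var, n in counts.items():
        if n > 1:
            problems.append(f"variable {var} is defined {n} times")
    return problems


# Simplified fragment of the graph above (not the full annotation).
GRAPH = """(c / convince-01
    :ARG1 (p / person)
    :ARG2 (h / have-03
        :ARG0 (p2 / person)
        :ARG1 (h2 / heart
            :ARG1-of (b / black-06))))"""

print(check_graph(GRAPH))
print(check_graph(GRAPH.replace("(h2 / heart", "(h / heart")))
```

The check does not validate roles or concepts; it only catches the two structural slips that most often make a graph unreadable for downstream tools.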
Further, abstract predicates are used for so-called reification, i.e., converting a role into a concept. For example, the relation :cause might be replaced by cause-01; instead of x :cause y, we have x :ARG1-of (c / cause-01 :ARG0 y) (AMR guidelines).
The AMR Guidelines provide the following examples:
- [en] The torpedo struck, causing the ship to be damaged. / The torpedo struck, causing damage to the ship. / The torpedo struck, damaging the ship.
(s / strike-01
    :ARG0 (t / torpedo)
    :cause-of (d / damage-01
        :ARG1 (s2 / ship)))
- [en] The girl left because the boy arrived.
AMR without reification:
(l / leave
    :ARG0 (g / girl)
    :cause (a / arrive
        :ARG0 (b / boy)))

AMR with reification:
(l / leave
    :ARG0 (g / girl)
    :ARG1-of (c / cause-01
        :ARG0 (a / arrive
            :ARG0 (b / boy))))
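The rewrite from x :cause y to x :ARG1-of (c / cause-01 :ARG0 y) can be sketched on a graph stored as a list of (source, role, target) triples. This is a minimal illustration under stated assumptions: the table covers only the :cause relation shown above, the triple representation and the names `REIFICATIONS`, `fresh_variable` and `reify` are hypothetical, and fresh variables are formed from the first letter of the reifying concept.

```python
# Assumption: only the :cause -> cause-01 reification from the example is covered.
# Each entry: relation -> (concept, role pointing into the concept, role pointing out of it).
REIFICATIONS = {":cause": ("cause-01", ":ARG1-of", ":ARG0")}


def fresh_variable(base, used):
    """Return base, base2, base3, ... whichever is not used yet."""
    var, n = base, 1
    while var in used:
        n += 1
        var = f"{base}{n}"
    used.add(var)
    return var


def reify(triples, role):
    """Replace every `x role y` edge by a reified concept node."""
    concept, in_role, out_role = REIFICATIONS[role]
    used = {source for source, _, _ in triples}
    result = []
    for source, edge, target in triples:
        if edge != role:
            result.append((source, edge, target))
            continue
        var = fresh_variable(concept[0], used)
        result.append((source, in_role, var))
        result.append((var, ":instance", concept))
        result.append((var, out_role, target))
    return result


# "The girl left because the boy arrived." without reification.
GIRL_LEFT = [
    ("l", ":instance", "leave"),
    ("l", ":ARG0", "g"),
    ("g", ":instance", "girl"),
    ("l", ":cause", "a"),
    ("a", ":instance", "arrive"),
    ("a", ":ARG0", "b"),
    ("b", ":instance", "boy"),
]

for triple in reify(GIRL_LEFT, ":cause"):
    print(triple)
```

Working on triples keeps the sketch independent of any particular PENMAN parser; the reified node c can then carry further roles of its own, which is the main reason for reifying a relation.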
These rolesets are used when multiple events are expressed in a complex sentence (they can also be combined with reification).