diff --git a/doc/entities.md b/doc/entities.md
index 3f5405d..eb3da3e 100644
--- a/doc/entities.md
+++ b/doc/entities.md
@@ -1,129 +1,123 @@
# Entities
-**Entities** constitute one of the three main types of concepts in UMR,
-alongside **states** and **processes.** They typically correspond to physical
-objects, but there are also abstract entities such as _soul._
-
-A _mention_ of an entity in text can be **specific** or **generic.** A
-specific mention refers to a concrete, unique instance of the entity. Not
-necessarily by its name (meaning that not every specific entity is a named
-entity), but there is one specific instance that the speaker has in mind.
-
-In (1a), there are three mentions of the same specific entity, the unique
-institution in Prague, called “National Museum”. It is first referred to by
-its name, then by the personal pronoun _mu_, and finally by the common noun
-_muzeum_. In contrast, the common noun _muzea_ (plural of _muzeum_) in (1b)
-is a generic entity: It refers to various institutions that all belong to the
-category of museums.
+**Entities** constitute one of the three main types of concepts in UMR,
+alongside **states** and **processes.** They typically correspond to physical
+objects, but there are also abstract entities such as _soul._
+
+A _mention_ of an entity in text can be **specific** or **generic.** A
+specific mention refers to a concrete, unique instance of the entity, though not
+necessarily by its name (not every specific entity is a named entity); what
+matters is that there is one specific instance that the speaker has in mind.
+
+In (1a), there are three mentions of the same specific entity, the unique
+institution in Prague, called “National Museum”. It is first referred to by
+its name, then by the personal pronoun _mu_, and finally by the common noun
+_muzeum_. In contrast, the common noun _muzea_ (plural of _muzeum_) in (1b)
+is a generic entity: It refers to various institutions that all belong to the
+category of museums.
* (1a) [cs] _Národní muzeum v Praze získá nový bezpečnostní systém, který mu dodá firma CESS. Muzeum za něj zaplatí necelé 2 milióny korun._
“The National Museum in Prague will get a new security system, which will be supplied by CESS. The museum will pay almost 2 million crowns for it.”
* (1b) [cs] _V každé zemi podléhají muzea jiné legislativě._
“In each country, museums are subject to different legislation.”
-If a proper name is used, it typically refers to a specific entity, but as we
-see in (1a), specific entities can be referenced by other means, too. Even if
-the name were not present in the sentence, the context would tell us that we
-are talking about one specific museum, which probably has a name, and perhaps
-the context would be specific enough to allow us to identify the entity and
-its name in the real world. However, that is not a necessary condition for a
-specific entity. In (2a), _staršího muže_ “an elderly man” refers to a person
-whom we do not know and who may not even exist in the real world (the text
-may be a work of fiction). The man may not be mentioned again and we may not
-learn anything else about him, yet in this local context he is a specific
-entity and not a generic one.
+If a proper name is used, it typically refers to a specific entity, but as we
+see in (1a), specific entities can be referenced by other means, too. Even if
+the name were not present in the sentence, the context would tell us that we
+are talking about one specific museum, which probably has a name, and perhaps
+the context would be specific enough to allow us to identify the entity and
+its name in the real world. However, that is not a necessary condition for a
+specific entity. In (2), _staršího muže_ “an elderly man” refers to a person
+whom we do not know and who may not even exist in the real world (the text
+may be a work of fiction). The man may not be mentioned again and we may not
+learn anything else about him, yet in this local context he is a specific
+entity and not a generic one.
* (2) [cs] _Když opouštěl budovu, zahlédl staršího muže, jenž nesl v náručí žlutou krabici._
“As he was leaving the building, he saw an elderly man carrying a yellow box in his arms.”
-A specific entity is a **named entity** if it is referenced by its **name.**
-Name is a word or a sequence of words whose purpose is to label a specific
-instance, not to describe a category of entities by their properties or
-relations to other entities. Thus _muzeum_ is not a name because it can be
-used to refer to any institution that meets certain parameters. The phrase
-_Národní muzeum_ is a name because it was specifically designed to label one
-particular museum.
-
-> **proper name** ?= named entity (as "New Zealand" and "the United States of America")
-BUT "Zealand" or "America" are not proper names - why not?
-(confusing as Zealand is a Danish island and America may refer to the continent)
-
-> **proper noun** ?= common noun (as "museum")
-
-The name does not have to be unique: An important museum
-in another country may also be called _Národní muzeum_, just like there are
-multiple people called _John Smith_. People may have to add more information
-if misinterpretation is possible, but the intended purpose of a name is to
-give the entity a reasonably locally unique identifier, and the purpose is
-what matters.
-
-Furthermore, it is not necessary that a specific entity has only one name.
-For example, _Spojené státy americké_, _Spojené státy_, _USA_ and _Amerika_
-are all names and all refer to the same country. One can even encounter just
-_Státy_ used as a name and referring to the USA. When used in a Czech
-sentence in this manner, it cannot refer to, e.g., the United States of
-Mexico. The same word, _státy_ “states” (not capitalized, unless
-sentence-initial), can be used as a common noun (hence not a name), referring
-to a group of entities (states or countries) that may be specific or generic.
-On the other hand, depending on context, _Amerika_ may refer to a continent
-rather than to a country (North America, South America), or it may refer to a
-quarry southwest of Prague.
-
-Proper names are thus designed to label specific instances, while common
-nouns are meant to describe broader categories (types). The borderline may be
-occasionally blurry when a common noun is repurposed as a name (as we have
-seen with _Státy_ above) but it is much less likely that a proper name will
-be used for a generic entity. We can certainly define a category of all
-people named _Václav_, as in (3), but that does not convert the name into a
-common noun – all these people first got that name with the hope that it will
-make them identifiable and distinguishable from other people, and only later
-the speaker artificially grouped them, using their name as the property
-defining the group.
+A specific entity is a **named entity** if it is referenced by its **name.**
+A name is a word or a sequence of words whose purpose is to label a specific
+instance, not to describe a category of entities by their properties or
+relations to other entities. Thus _muzeum_ is not a name because it can be
+used to refer to any institution that meets certain parameters. The phrase
+_Národní muzeum_ is a name because it was specifically designed to label one
+particular museum.
+
+The name does not have to be unique: An important museum
+in another country may also be called _Národní muzeum_, just like there are
+multiple people called _John Smith_. People may have to add more information
+if misinterpretation is possible, but the intended purpose of a name is to
+give the entity a reasonably locally unique identifier, and the purpose is
+what matters.
+
+Furthermore, it is not necessary that a specific entity has only one name.
+For example, _Spojené státy americké_, _Spojené státy_, _USA_ and _Amerika_
+are all names and all refer to the same country. One can even encounter just
+_Státy_ used as a name and referring to the USA. When used in a Czech
+sentence in this manner, it cannot refer to, e.g., the United States of
+Mexico. The same word, _státy_ “states” (not capitalized, unless
+sentence-initial), can be used as a common noun (hence not a name), referring
+to a group of entities (states or countries) that may be specific or generic.
+On the other hand, depending on context, _Amerika_ may refer to a continent
+rather than to a country (North America, South America), or it may refer to a
+quarry southwest of Prague.
+
+Proper names are thus designed to label specific instances, while common
+nouns are meant to describe broader categories (types). The borderline may be
+occasionally blurry when a common noun is repurposed as a name (as we have
+seen with _Státy_ above) but it is much less likely that a proper name will
+be used for a generic entity. We can certainly define a category of all
+people named _Václav_, as in (3), but that does not convert the name into a
+common noun – all these people first got that name in the hope that it would
+make them identifiable and distinguishable from other people, and only later
+the speaker artificially grouped them, using their name as the property
+defining the group.
* (3) [cs] _Všichni Václavové by měli znát své slavné jmenovce._
“All Václavs should know their famous namesakes.”
-While the use of _Václavové_ in (3) is unusual, there are proper names that
-denote a type rather than an instance. A primary example is product names, as
-in (4) (the specification of product category is enclosed in parentheses in
-the example because it is optional):
+While the use of _Václavové_ in (3) is unusual, there are proper names that
+denote a type rather than an instance. A primary example is product names, as
+in (4) (the specification of product category is enclosed in parentheses in
+the example because it is optional):
* (4) [cs] _Používám (prací prostředek) Persil._
“I use Persil (detergent).”
-Clearly, _Persil_ is a proper name rather than a common noun, as it was
-invented specifically to distinguish this detergent from other detergents; it
-is not a common noun that we expect to find in dictionaries. However, the
-name denotes a type of product, not one particular instance. There are
-millions of packages of Persil, and they all share this name. And while the
-name could be used when referring to a specific package, in (4) it actually
-refers to a generic entity. We will use the term **categorial proper names /
-categorial named entities** with names that denote types (categories) rather
-than instances.
+Clearly, _Persil_ is a proper name rather than a common noun, as it was
+invented specifically to distinguish this detergent from other detergents; it
+is not a common noun that we expect to find in dictionaries. However, the
+name denotes a type of product, not one particular instance. There are
+millions of packages of Persil, and they all share this name. And while the
+name could be used when referring to a specific package, in (4) it actually
+refers to a generic entity. We will use the term **categorial proper names /
+categorial named entities** for names that denote types (categories) rather
+than instances.
## Representation of entities in UMR
-An entity that is referred to by a **common noun** is represented by a
-regular concept (node), typically with the lemma of the noun as the label of
-the concept (but occasionally the label may be a multi-word string). This is
-done no matter if the entity is specific or generic.
-
-An entity that is referred to by a **name** is represented by an abstract
-concept corresponding to the semantic class of the entity (e.g., “person” or
-“organization”; see below for the taxonomy of semantic classes). The name of
-the entity is in a separate node, which has the abstract concept “name” and
-is attached to the class concept via the relation `:name`. Individual
-orthographic words of the name are listed in the name concept each in its own
-attribute, the attributes are named `:opX` where X is the ordinal number of
-the word within the name. The words are not always exact copies from the
-sentence, as the name is converted to its canonical form. Note however that
-this does not mean that all words in the name are replaced by their lemmas;
-some will be lemmatized, others will stay in a frozen inflected form.
-
-An entity that is referred to by a **pronoun** is represented by an abstract
-concept corresponding to the semantic class of the entity. Unlike named
-entities, there is no child node with the “name” concept.
+An entity that is referred to by a **common noun** is represented by a
+regular concept (node), typically with the lemma of the noun as the label of
+the concept (but occasionally the label may be a multi-word string). This is
+done no matter if the entity is specific or generic.
+
+An entity that is referred to by a **name** is represented by an abstract
+concept corresponding to the semantic class of the entity (e.g., “person” or
+“organization”; see below for the taxonomy of semantic classes). The name of
+the entity is in a separate node, which has the abstract concept “name” and
+is attached to the class concept via the relation `:name`. Individual
+orthographic words of the name are listed in the name concept each in its own
+attribute; the attributes are named `:opX`, where X is the ordinal number of
+the word within the name. The words are not always exact copies from the
+sentence, as the name is converted to its canonical form. Note however that
+this does not mean that all words in the name are replaced by their lemmas;
+some will be lemmatized, others will stay in a frozen inflected form.
+
+An entity that is referred to by a **pronoun** is represented by an abstract
+concept corresponding to the semantic class of the entity. Unlike named
+entities, there is no child node with the “name” concept.
Common noun _muzeum_ “museum”:
```
@@ -152,13 +146,13 @@ Named entity _Národní muzeum_ “National Museum”:
:op6 "tělovýchovy"))
```
-Note that the canonical form of the multi-word name of the ministry in (5) is
-composed of the canonical form of the head (_Ministerstvu_ was converted to
-nominative singular, but its capitalization was retained) and the inflected
-forms of the dependent words; the comma is also a separate `:opX` attribute.
+Note that the canonical form of the multi-word name of the ministry in (5) is
+composed of the canonical form of the head (_Ministerstvu_ was converted to
+nominative singular, but its capitalization was retained) and the inflected
+forms of the dependent words; the comma is also a separate `:opX` attribute.
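The construction of a name node described above can be sketched in Python. This is only an illustration, not part of any UMR tooling: the function names `name_node` and `named_entity` are hypothetical, and the caller is assumed to supply the words already in their canonical form (lemmatized head, frozen inflected dependents, punctuation as separate items).

```python
def name_node(variable, canonical_words):
    """Render a name node with one :opX attribute per orthographic word.

    X is the 1-based position of the word within the canonical name.
    """
    ops = [f':op{i} "{word}"' for i, word in enumerate(canonical_words, start=1)]
    return f"({variable}/ name " + " ".join(ops) + ")"


def named_entity(variable, entity_class, name_variable, canonical_words):
    """Attach the name node to the class concept via the :name relation."""
    return f"({variable}/ {entity_class} :name {name_node(name_variable, canonical_words)})"


# The words are passed in canonical form; this sketch does not lemmatize.
print(named_entity("o", "organization", "n", ["Národní", "muzeum"]))
```

A comma inside a name would simply be passed as one more item of the word list, so it receives its own `:opX` attribute.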
-With categorial named entities, the `:name` relation can occur even with a
-generic entity:
+With categorial named entities, the `:name` relation can occur even with a
+generic entity:
```
(p/ product
@@ -169,30 +163,30 @@ generic entity:
## Anchoring entities in ontologies
-UMR defines the (optional) `:wiki` attribute, which can be used to link a
-concept to a corresponding article in Wikipedia. The examples in the UMR
-guidelines currently show names of English Wikipedia articles in these
-attributes; however, a more robust and thus preferred solution is to use
-Wikidata identifiers. They are not bound to a particular language mutation of
-Wikipedia (all Wikipedias that have an article about the concept are linked
-from the Wikidata page) and they should be more stable (e.g. when one of the
-Wikipedias decides that a different title should be used for the article and
-the old title should become a redirect). Obtaining Wikidata identifiers is
-easy: Let's assume we want to anchor the Czech entity _Národní muzeum_ and we
-find its article in the Czech Wikipedia at
-[https://cs.wikipedia.org/wiki/N%C3%A1rodn%C3%AD_muzeum](https://cs.wikipedia.org/wiki/N%C3%A1rodn%C3%AD_muzeum).
-In the menu on the right-hand side we see a link labeled “Položka Wikidat”
-and leading to
-[https://www.wikidata.org/wiki/Q188112](https://www.wikidata.org/wiki/Q188112).
-
-Although the attribute is optional in UMR, in our data we should strive to
-provide it for every mention of a specific entity that has a Wikidata entry.
-(In practice, we could use the coreference annotation in UMR to automatically
-propagate the anchor from one mention to all other mentions. Note however
-that it would be a mistake to say that we only fill the attribute manually
-for named entities. It can happen that a specific entity is never mentioned
-by its name in a document, yet the context doubtlessly points to a known
-entity described in Wikipedia.)
+UMR defines the (optional) `:wiki` attribute, which can be used to link a
+concept to a corresponding article in Wikipedia. The examples in the UMR
+guidelines currently show names of English Wikipedia articles in these
+attributes; however, a more robust and thus preferred solution is to use
+Wikidata identifiers. They are not bound to a particular language edition of
+Wikipedia (all Wikipedias that have an article about the concept are linked
+from the Wikidata page) and they should be more stable (e.g. when one of the
+Wikipedias decides that a different title should be used for the article and
+the old title should become a redirect). Obtaining Wikidata identifiers is
+easy: Let's assume we want to anchor the Czech entity _Národní muzeum_ and we
+find its article in the Czech Wikipedia at
+[https://cs.wikipedia.org/wiki/N%C3%A1rodn%C3%AD_muzeum](https://cs.wikipedia.org/wiki/N%C3%A1rodn%C3%AD_muzeum).
+In the menu on the right-hand side we see a link labeled “Položka Wikidat”
+and leading to
+[https://www.wikidata.org/wiki/Q188112](https://www.wikidata.org/wiki/Q188112).
+
+Although the attribute is optional in UMR, in our data we should strive to
+provide it for every mention of a specific entity that has a Wikidata entry.
+(In practice, we could use the coreference annotation in UMR to automatically
+propagate the anchor from one mention to all other mentions. Note however
+that it would be a mistake to say that we only fill the attribute manually
+for named entities. It can happen that a specific entity is never mentioned
+by its name in a document, yet the context doubtlessly points to a known
+entity described in Wikipedia.)
```
(o/ organization
@@ -202,44 +196,44 @@ entity described in Wikipedia.)
:op2 "muzeum"))
```
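The automatic propagation of anchors through coreference mentioned above could look roughly like the following Python sketch. The data layout (a dictionary of mentions keyed by made-up identifiers, and coreference clusters as lists of those identifiers) is an assumption for illustration only; it is not a format defined by UMR.

```python
def propagate_wiki(mentions, clusters):
    """Copy a manually assigned anchor to every coreferent mention.

    mentions: dict mapping a mention id to its attribute dict
    clusters: iterable of lists of mention ids that corefer
    """
    for cluster in clusters:
        anchors = {mentions[m]["wiki"] for m in cluster if "wiki" in mentions[m]}
        if len(anchors) != 1:
            # No anchor or conflicting anchors: leave the cluster for manual review.
            continue
        (anchor,) = anchors
        for m in cluster:
            mentions[m]["wiki"] = anchor
    return mentions


# Hypothetical mention ids for the three mentions of the museum in (1a).
mentions = {
    "s1-name": {"wiki": "Q188112"},  # Národní muzeum, anchored manually
    "s1-pron": {},                   # pronoun "mu"
    "s2-noun": {},                   # common noun "Muzeum"
    "s1-firm": {},                   # firma CESS, no anchor assigned
}
propagate_wiki(mentions, [["s1-name", "s1-pron", "s2-noun"], ["s1-firm"]])
```

Clusters with no anchor, or with conflicting anchors, are left untouched here so that an annotator can resolve them by hand.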
-If a specific entity has no Wikidata presence, we have to register it in a
-local ontology that becomes part of the annotation, and provide a local
-identifier instead. Note that the entries in the local ontology are not
-always local to just one document. They are still part of the same universe
-that is partially described in Wikipedia. Consider, for example, a news
-article reporting that _A man (80) was killed this morning in a traffic
-accident._ There could be several other documents reporting on the same
-event, and if it can be established that they are indeed talking about the
-same accident, then all the mentions of the nameless man should be anchored
-to the same entry in the ontology. TODO: Implement a
-prototype of the local ontology and specify how it should be linked from the
-annotation. We should probably use a different attribute, e.g.,
-`:lwiki`.
+If a specific entity has no Wikidata presence, we have to register it in a
+local ontology that becomes part of the annotation, and provide a local
+identifier instead. Note that the entries in the local ontology are not
+always local to just one document. They are still part of the same universe
+that is partially described in Wikipedia. Consider, for example, a news
+article reporting that _A man (80) was killed this morning in a traffic
+accident._ There could be several other documents reporting on the same
+event, and if it can be established that they are indeed talking about the
+same accident, then all the mentions of the nameless man should be anchored
+to the same entry in the ontology. TODO: Implement a
+prototype of the local ontology and specify how it should be linked from the
+annotation. We should probably use a different attribute, e.g.,
+`:lwiki`.
## Other attributes of entities
-Every entity concept should have the attribute `:ref-number`, with the value
-reflecting the grammatical number. UMR defines a number of possible values
-for the attribute, based on grammars of various languages. For Modern Czech
-data the value will probably (almost?) always be `Singular` or `Plural`. We
-will not use `Dual` just because we know that we are speaking about two
-people. The dual as a grammatical number has mostly vanished from Czech, and
-UMR has other means to annotate quantity (there is not a separate value of
-`:ref-number` for each integer number). A possible exception in Modern Czech
-is paired body parts _(nohy, ruce, oči, uši)_ because that is where
-grammatical dual still survives.
-
-Abstract entity concepts that correspond to personal pronouns (or their
-reflections in verbal morphology) will additionally have the attribute
-`:ref-person`. This attribute is not used with other entity mentions (for
-which it would be very unusual to interpret them as anything else than 3rd
-person).
-
-We now repeat example (1a) as (6) here and show the annotations of entities
-from the example, using all the rules specified so far. (We omit the monetary
-entity from the end because such types of entities have not been discussed
-yet.)
+Every entity concept should have the attribute `:ref-number`, with the value
+reflecting the grammatical number. UMR defines a number of possible values
+for the attribute, based on grammars of various languages. For Modern Czech
+data the value will probably (almost?) always be `Singular` or `Plural`. We
+will not use `Dual` just because we know that we are speaking about two
+people. The dual as a grammatical number has mostly vanished from Czech, and
+UMR has other means to annotate quantity (there is not a separate value of
+`:ref-number` for each integer number). A possible exception in Modern Czech
+is paired body parts _(nohy, ruce, oči, uši)_ because that is where
+grammatical dual still survives.
+
+Abstract entity concepts that correspond to personal pronouns (or their
+reflections in verbal morphology) will additionally have the attribute
+`:ref-person`. This attribute is not used with other entity mentions (for
+which it would be very unusual to interpret them as anything other than 3rd
+person).
+
+We now repeat example (1a) as (6) here and show the annotations of entities
+from the example, using all the rules specified so far. (We omit the monetary
+entity from the end because such types of entities have not been discussed
+yet.)
* (6) [cs] _Národní muzeum v Praze získá nový bezpečnostní systém, který mu dodá firma CESS. Muzeum za něj zaplatí necelé 2 milióny korun._
“The National Museum in Prague will get a new security system, which will be supplied by CESS. The museum will pay almost 2 million crowns for it.”
@@ -286,14 +280,14 @@ yet.)
## Taxonomy of entity types
-The [UMR
-guidelines](https://github.com/umr4nlp/umr-guidelines/blob/master/guidelines.md)
-give a taxonomy of entity classes, types and subtypes in [Section
-3-1-2](https://github.com/umr4nlp/umr-guidelines/blob/master/guidelines.md#part-3-1-2-named-entities).
-They can be used as abstract concepts for named entities and for entities
-represented by pronouns. As of now (June 2023), the taxonomy is reportedly
-under revision by the UMR team. At any rate, the current table has a number
-of issues. Below we examine some of the entity types and discuss their
+The [UMR
+guidelines](https://github.com/umr4nlp/umr-guidelines/blob/master/guidelines.md)
+give a taxonomy of entity classes, types and subtypes in [Section
+3-1-2](https://github.com/umr4nlp/umr-guidelines/blob/master/guidelines.md#part-3-1-2-named-entities).
+They can be used as abstract concepts for named entities and for entities
+represented by pronouns. As of now (June 2023), the taxonomy is reportedly
+under revision by the UMR team. At any rate, the current table has a number
+of issues. Below we examine some of the entity types and discuss their
utility.
Some other resources:
@@ -307,18 +301,18 @@ Some other resources:
### person
-A top-level class without subordinate types and subtypes. Besides humans, the
-class could serve as a natural fallback for human-like beings that do not
-have a class of their own: deities, dwarves, hobbits, elves etc. What about
-robots?
+A top-level class without subordinate types and subtypes. Besides humans, the
+class could serve as a natural fallback for human-like beings that do not
+have a class of their own: deities, dwarves, hobbits, elves etc. What about
+robots?
### animal
-A top-level class. As a named entity, it can be used to represent a pet that
-was given a name by its owner, or an animal character in a fairy tale,
-including fantastic beasts like dragons. The class is not suitable to
-represent _species_ of animals. (They have their own type in the taxonomy but
-it is problematic, see below.)
+A top-level class. As a named entity, it can be used to represent a pet that
+was given a name by its owner, or an animal character in a fairy tale,
+including fantastic beasts like dragons. The class is not suitable to
+represent _species_ of animals. (They have their own type in the taxonomy but
+it is problematic, see below.)
### plant
@@ -326,264 +320,264 @@ Analogous to animals but covering plants.
### thing
-Not listed in the current UMR table but used in their examples (e.g. sentence
-(3) in Part 1) and clearly needed at least for pronouns that refer neither to
-persons nor to animals or plants or other types specifically listed in the
-taxonomy.
+Not listed in the current UMR table but used in their examples (e.g. sentence
+(3) in Part 1) and clearly needed at least for pronouns that refer neither to
+persons nor to animals or plants or other types specifically listed in the
+taxonomy.
### geographic-entity
-A subset of what other named entity taxonomies often label as “location”.
-This subset contains only phenomena created by nature, not by mankind. The
-UMR guidelines currently provide 15 types belonging to this class, probably
-not exhaustive and to be extended in the future. The annotators should use
-the types as abstract concepts if they know them, otherwise they can fall
-back to the whole class.
+A subset of what other named entity taxonomies often label as “location”.
+This subset contains only phenomena created by nature, not by mankind. The
+UMR guidelines currently provide 15 types belonging to this class, probably
+not exhaustive and to be extended in the future. The annotators should use
+the types as abstract concepts if they know them, otherwise they can fall
+back to the whole class.
-The types are: `ocean`, `sea`, `lake`, `river`, `gulf`, `bay`, `strait`,
-`island`, `peninsula`, `mountain`, `volcano`, `valley`, `canyon`, `desert`,
-`forest`.
+The types are: `ocean`, `sea`, `lake`, `river`, `gulf`, `bay`, `strait`,
+`island`, `peninsula`, `mountain`, `volcano`, `valley`, `canyon`, `desert`,
+`forest`.
### celestial-body
-Like `geographic-entity` but on a cosmic scale. Currently four types: `moon`,
-`planet`, `star`, `constellation`. Again not exhaustive: What do we do with
-objects that are smaller than planets but are not moons?
+Like `geographic-entity` but on a cosmic scale. Currently four types: `moon`,
+`planet`, `star`, `constellation`. Again not exhaustive: What do we do with
+objects that are smaller than planets but are not moons?
-Note that for _Měsíc_ “Moon”, _Země_ “Earth” and _Slunce_ “Sun”, there is a
-blurred context-based borderline between a name of a celestial body and a
-common noun. But at least the Czech grammar puts the burden of decision on
-the shoulders of the author: if it is a name of a celestial body, it has to
-be written capitalized. So, unless it is the first word of a sentence, the
-annotator can take capitalization as the cue.
+Note that for _Měsíc_ “Moon”, _Země_ “Earth” and _Slunce_ “Sun”, there is a
+blurred context-based borderline between a name of a celestial body and a
+common noun. But at least Czech orthography puts the burden of decision on
+the shoulders of the author: if it is a name of a celestial body, it has to
+be capitalized. So, unless it is the first word of a sentence, the
+annotator can take capitalization as the cue.
### geo-political-entity
-A subset of what other named entity taxonomies often label as “location”.
-This subset contains only phenomena created by mankind, not by nature. The
-UMR guidelines currently provide 7 types belonging to this class, probably
-not exhaustive and to be extended in the future. The annotators should use
-the types as abstract concepts if they know them, otherwise they can fall
-back to the whole class.
+A subset of what other named entity taxonomies often label as “location”.
+This subset contains only phenomena created by mankind, not by nature. The
+UMR guidelines currently provide 7 types belonging to this class, probably
+not exhaustive and to be extended in the future. The annotators should use
+the types as abstract concepts if they know them, otherwise they can fall
+back to the whole class.
-The types are: `country`, `state`, `province`, `territory`, `county`, `city`,
-`city-district`.
+The types are: `country`, `state`, `province`, `territory`, `county`, `city`,
+`city-district`.
-The current selection is too much focused on North America. For example, the
-Czech sub-country administrative unit, _kraj_ “region”, is neither a state,
-nor a province, territory or county. Czech _okres_ could probably be
-annotated as `county`. See also the `region` class below.
+The current selection is too heavily focused on North America. For example, the
+Czech sub-country administrative unit, _kraj_ “region”, is neither a state,
+nor a province, territory or county. Czech _okres_ could probably be
+annotated as `county`. See also the `region` class below.
### region
-A class with three types: `world-region`, `country-region`, `local-region`.
-There is no definition and it is difficult to guess what the authors had in
-mind. But it is not in the `geo-political-entity` class, so it is probably
-not meant as an official administrative unit and it does not solve the
-problem with Czech _kraj_. Maybe it is meant for less official or formal
-regions such as _Valašsko_, _Morava_ or _Evropa_. Still, we need criteria to
-decide between the three types of regions.
+A class with three types: `world-region`, `country-region`, `local-region`.
+There is no definition and it is difficult to guess what the authors had in
+mind. But it is not in the `geo-political-entity` class, so it is probably
+not meant as an official administrative unit and it does not solve the
+problem with Czech _kraj_. Maybe it is meant for less official or formal
+regions such as _Valašsko_, _Morava_ or _Evropa_. Still, we need criteria to
+decide between the three types of regions.
### facility
-A class of man-made entities that have a fixed location but the name does not
-pertain just to the location but also to the building (or whatever other
-facility it is). In some cases, a facility is also an organization (example:
-_museum_), which is a different entity class, but they may share a name. Then
-the annotator will have to decide by context whether the utterance is more
-about the legal entity (organization), or the place or building (facility).
+A class of man-made entities that have a fixed location but the name does not
+pertain just to the location but also to the building (or whatever other
+facility it is). In some cases, a facility is also an organization (example:
+_museum_), which is a different entity class, but they may share a name. Then
+the annotator will have to decide by context whether the utterance is more
+about the legal entity (organization), or the place or building (facility).
-The guidelines currently list 19 types of facilities: `airport`, `station`,
-`port`, `tunnel`, `bridge`, `road`, `railway-line`, `canal`, `building`,
-`theater`, `museum`, `palace`, `hotel`, `worship-place`, `market`,
-`sports-facility`, `park`, `zoo`, `amusement-park`.
+The guidelines currently list 19 types of facilities: `airport`, `station`,
+`port`, `tunnel`, `bridge`, `road`, `railway-line`, `canal`, `building`,
+`theater`, `museum`, `palace`, `hotel`, `worship-place`, `market`,
+`sports-facility`, `park`, `zoo`, `amusement-park`.
### social-group
-A large class with 6 types: `family`, `clan`, `ethnic-group`,
-`regional-group`, `religious-group`, `organization`. The common
-characteristic of all six is that they denote groups of people. In the case
-of `organization`, it also has a common property, set of activities, and
-usually also some kind of legal existence; this may be also true to some
-extent about families and even clans, but typically not about the other types
-of groups. More importantly, an organization typically has a name that refers
-to the organization as a single entity _(IBM)_, while an ethnic group is
-often denoted by a plural form of a name that labels one member of the group
-_(Baskové,_ the plural of _Bask_ “(a) Basque (person)”). It is thus unclear
-whether and why these diverse types should be analyzed the same way. An
-ethnic group is more like a categorial named entity (see also `product`), it
-denotes people who share a language and/or some cultural and historical
-heritage. Similarly, religious groups is just a common label for people who
-share beliefs. Do we also want a “named entity” for races, or for
-tall/short/slim/fat people etc., or for people who share political views, or
-love for rock-and-roll, or anything else?
+A large class with 6 types: `family`, `clan`, `ethnic-group`,
+`regional-group`, `religious-group`, `organization`. The common
+characteristic of all six is that they denote groups of people. In the case
+of `organization`, the group also has common property, a set of activities,
+and usually some kind of legal existence; this may also be true to some
+extent of families and even clans, but typically not of the other types
+of groups. More importantly, an organization typically has a name that refers
+to the organization as a single entity _(IBM)_, while an ethnic group is
+often denoted by a plural form of a name that labels one member of the group
+_(Baskové,_ the plural of _Bask_ “(a) Basque (person)”). It is thus unclear
+whether and why these diverse types should be analyzed the same way. An
+ethnic group is more like a categorial named entity (see also `product`): it
+denotes people who share a language and/or some cultural and historical
+heritage. Similarly, a religious group is just a common label for people who
+share beliefs. Do we also want a “named entity” for races, or for
+tall/short/slim/fat people etc., or for people who share political views, or
+a love for rock-and-roll, or anything else?
#### organization
-A large type of the `social-group` class, see above for my doubts about its
-(dis)similarity to other types. There is much less doubt (than for the other
-types) that organizations are named entities, but the definition of the
-subtypes has to be clarified. The type has currently 11 subtypes:
-`international-organization`, `business`, `company`,
-`government-organization`, `political-organization`, `criminal-organization`,
-`armed-organization`, `academic-organization`, `association`,
-`sports-organization`, `religious-organization`.
+A large type of the `social-group` class, see above for my doubts about its
+(dis)similarity to other types. There is much less doubt (than for the other
+types) that organizations are named entities, but the definition of the
+subtypes has to be clarified. The type currently has 11 subtypes:
+`international-organization`, `business`, `company`,
+`government-organization`, `political-organization`, `criminal-organization`,
+`armed-organization`, `academic-organization`, `association`,
+`sports-organization`, `religious-organization`.
### nationality
-A separate top-level class. I do not understand why the authors did not make
-it a type of `social-group`, just like `ethnic-group` and `regional-group`,
-to which it is very close. For example, _Čech_ “Czech” can be, depending on
-context, any of the three: A member of the ethnic group (sharing the Czech
-language and traditions, including people who do not have Czech citizenship,
-as their ancestors left the country and settled abroad); a member of the
-nationality (having the citizenship of the Czech Republic, even if living
-abroad and/or having a mother tongue other than Czech); a member of a
-regional group, living or born in _Čechy_ “Bohemia” (as opposed to Moravia
-and Silesia, which are the other two parts of the country called _Česko_
-“Czechia”).
+A separate top-level class. I do not understand why the authors did not make
+it a type of `social-group`, just like `ethnic-group` and `regional-group`,
+to which it is very close. For example, _Čech_ “Czech” can be, depending on
+context, any of the three: A member of the ethnic group (sharing the Czech
+language and traditions, including people who do not have Czech citizenship,
+as their ancestors left the country and settled abroad); a member of the
+nationality (having the citizenship of the Czech Republic, even if living
+abroad and/or having a mother tongue other than Czech); a member of a
+regional group, living or born in _Čechy_ “Bohemia” (as opposed to Moravia
+and Silesia, which are the other two parts of the country called _Česko_
+“Czechia”).
### product
-This class is not listed in the current UMR guidelines, which seems to be a
-gap that has to be patched. The current taxonomy actually lists some types
-that could be classified as special cases of `product`, such as
-`aircraft-type` or `car-make`. But there are proper names for other products,
-like _Persil_ in example (4) above. Products are typical examples of what we
-call categorial named entity.
+This class is not listed in the current UMR guidelines, which seems to be a
+gap that has to be patched. The current taxonomy actually lists some types
+that could be classified as special cases of `product`, such as
+`aircraft-type` or `car-make`. But there are proper names for other products,
+like _Persil_ in example (4) above. Products are typical examples of what we
+call categorial named entities.
### vehicle
-This is a separate class with currently 5 types: `ship`, `aircraft`,
-`aircraft-type`, `spaceship`, `car-make`. Note that `aircraft-type` and
-`car-make` are categorial named entities that would be better described as
-special types of `product`. For `ship` and `spaceship` it is more typical
-that a name denotes a single instance (e.g., _Titanic_). Perhaps `aircraft`
-is also meant to denote an instance rather than a type. People may
-occasionally give a nickname to their car, then the name will also denote an
-instance and the entity type `car-make` will not be suitable for it.
+This is a separate class with currently 5 types: `ship`, `aircraft`,
+`aircraft-type`, `spaceship`, `car-make`. Note that `aircraft-type` and
+`car-make` are categorial named entities that would be better described as
+special types of `product`. For `ship` and `spaceship` it is more typical
+that a name denotes a single instance (e.g., _Titanic_). Perhaps `aircraft`
+is also meant to denote an instance rather than a type. People may
+occasionally give a nickname to their car; the name will then also denote an
+instance, and the entity type `car-make` will not be suitable for it.
### computer-program
-This is a separate class with no types. It could be regarded as a type of
-`product`.
+This is a separate class with no types. It could be regarded as a type of
+`product`.
### food-dish
-This is a separate class with no types. It seems to be a categorial named
-entity like `product` but it has an unsharp boundary between names and
-descriptions of dishes, so it is quite questionable whether, how, and where
-exactly the class concept should be used.
+This is a separate class with no types. It seems to be a categorial named
+entity like `product`, but the boundary between names and descriptions of
+dishes is fuzzy, so it is quite questionable whether, how, and where
+exactly the class concept should be used.
### cultural-artifact
-This is a separate class with currently 8 types: `work-of-art`, `picture`,
-`music`, `dance`, `show`, `broadcast-program`, `literature`, `publication`;
-the last one has subtypes `book`, `newspaper`, `magazine`, `journal`. Since
-there is no description, it is not clear what is the difference between
-`literature` and `publication` supposed to be. Also, there does not seem to
-be a category suitable for movies.
+This is a separate class with currently 8 types: `work-of-art`, `picture`,
+`music`, `dance`, `show`, `broadcast-program`, `literature`, `publication`;
+the last one has subtypes `book`, `newspaper`, `magazine`, `journal`. Since
+there is no description, it is not clear what the difference between
+`literature` and `publication` is supposed to be. Also, there does not seem
+to be a category suitable for movies.
-Some works of art could be seen as a categorial named entity similar to
-`product`: Typically there are many copies of a book, a movie, or a CD. But
-even here the prototypical reading is that the name refers to the single
-intangible work, not to one of its copies.
+Some works of art could be seen as a categorial named entity similar to
+`product`: Typically there are many copies of a book, a movie, or a CD. But
+even here the prototypical reading is that the name refers to the single
+intangible work, not to one of its copies.
### law
-This is a top-level class with two types: `court-decision`, `treaty`.
-Supposedly the class itself should be used for actual laws. There is a need
-for other types, such as a named dean's regulation at a university.
+This is a top-level class with two types: `court-decision`, `treaty`.
+Supposedly the class itself should be used for actual laws. There is a need
+for other types, such as a named dean's regulation at a university.
One could say that laws are close to publications, but they can hardly be categorized as cultural artifacts.
### language
-This is a top-level class without types. Supposedly there is no distinction
-between languages and dialects, i.e., names of dialects would also be labeled
-as `language`. Not sure about language groups and families. Note that names
-of languages are often (but not always) related to names of ethnic groups,
-nationalities, regions and countries.
+This is a top-level class without types. Supposedly there is no distinction
+between languages and dialects, i.e., names of dialects would also be labeled
+as `language`. I am not sure about language groups and families. Note that
+names of languages are often (but not always) related to names of ethnic
+groups, nationalities, regions and countries.
-It is not clear how this label is intended to be used. Should it apply only
-to the name of the language (noun), e.g., _angličtina_ “(the) English
-(language)”, or also to adverbs (_Mluví anglicky._ “He speaks English.”) and
-adjectives (_Procvičuje si anglická slovesa._ “She is practicing English
-verbs.”)
+It is not clear how this label is intended to be used. Should it apply only
+to the name of the language (noun), e.g., _angličtina_ “(the) English
+(language)”, or also to adverbs (_Mluví anglicky._ “He speaks English.”) and
+adjectives (_Procvičuje si anglická slovesa._ “She is practicing English
+verbs.”)?
-Do we also use this label for constructed languages _(esperanto)_? I think we
-do. Do we also use it for programming languages _(Pascal, C, Perl, Python)_?
-I am not sure. Maybe those fall under the class `computer-program`.
+Do we also use this label for constructed languages _(esperanto)_? I think we
+do. Do we also use it for programming languages _(Pascal, C, Perl, Python)_?
+I am not sure. Maybe those fall under the class `computer-program`.
### notational-system
-This is a top-level class with currently three types: `writing-script` (e.g.
-_dévanágarí_), `music-key`, `musical-note`. It denotes an abstract entity.
+This is a top-level class with currently three types: `writing-script` (e.g.
+_dévanágarí_), `music-key`, `musical-note`. It denotes an abstract entity.
### cultural-activity
-This is a top-level class without definition and without types. It would be
-useful to have an example. See also `event` below.
+This is a top-level class without definition and without types. It would be
+useful to have an example. See also `event` below.
### event
-This is a top-level class with currently 8 types: `incident`, `war`,
-`natural-disaster`, `earthquake`, `conference`, `game`, `festival`,
-`ceremony`. Besides the usual problem that types are not defined, it is not
-clear why `earthquake` shall be distinguished from other natural disasters.
-It is also unclear why `cultural-activity` is a class separate from `event`.
-
-Furthermore, note that this concept denotes events as entities, although
-events are typically processes (refer to the main distinction between
-entities, states and processes, shown in Table 1 in Section 3-1-1 of the UMR
-guidelines). Section 3-1-1 even uses the term “event” to refer to all
-processes in any packaging, plus entities and states when used in
-predication. Nevertheless, if an event has a name (such as _Druhá světová
-válka_ “World War II”), it is covered by this taxonomy. It would be helpful
-to have an annotated example here.
+This is a top-level class with currently 8 types: `incident`, `war`,
+`natural-disaster`, `earthquake`, `conference`, `game`, `festival`,
+`ceremony`. Besides the usual problem that types are not defined, it is not
+clear why `earthquake` should be distinguished from other natural disasters.
+It is also unclear why `cultural-activity` is a class separate from `event`.
+
+Furthermore, note that this concept denotes events as entities, although
+events are typically processes (refer to the main distinction between
+entities, states and processes, shown in Table 1 in Section 3-1-1 of the UMR
+guidelines). Section 3-1-1 even uses the term “event” to refer to all
+processes in any packaging, plus entities and states when used in
+predication. Nevertheless, if an event has a name (such as _Druhá světová
+válka_ “World War II”), it is covered by this taxonomy. It would be helpful
+to have an annotated example here.
### award
-Top-level class with no types. Supposedly, _Nobelova cena za fyziku_ “Nobel
-Prize for Physics” would be an example.
+Top-level class with no types. Supposedly, _Nobelova cena za fyziku_ “Nobel
+Prize for Physics” would be an example.
### biomedical-entity
-This is a top-level class with currently 18 types:
-`molecular-physical-entity`, `small-molecule`, `protein`, `protein-family`,
-`protein-segment`, `amino-acid`, `macro-molecular-complex`, `enzyme`,
-`nucleic-acid`, `pathway`, `gene`, `dna-sequence`, `cell`, `cell-line`,
-`species`, `taxon`, `disease`, `medical-condition`. They are obviously
-inspired by the bulk of work on biomedical processing and we would need more
-documentation to understand how the authors intended to use them.
-
-However, at least three types reach into layman's language: `species`,
-`taxon`, and `disease`. The closely related `species` and `taxon` would be
-categorial named entities (like `product`), where the name denotes a whole
-category (type) of entities rather than a single instance. That is, if they
-deserve to be treated as named entities in the first place. For example,
-_kočka_ “cat” is an animal with a particular set of characteristics, just
-like _dub_ “oak” is a particular type (hyponym) of tree, and _hrad_ “castle”
-is a particular type of building. But the first two words are biological
-genuses, hence `taxon`s, while _hrad_ has no special status in the UMR
-taxonomy. (In the Czech grammar, all three are common nouns.) There is no
-reason why _kočka_ and _dub_ should be named entities. And by extension,
-there is little reason why `species` should be named entities, for example
-_kočka domácí_ “cat (Felis catus)”, or _dub letní_ “pedunculate oak (Quercus
-robur)”, or why other taxons should, for example _šelmy_ “beasts of prey,
-Carnivora”, _savci_ “mammals”, or _živočichové_ “animals, Animalia”. It is
-true that some species have names that are less common than others and were
-invented by scholars who discovered and described the species, rather than
-being part of the language since ancient times. But it would be neither
-tractable nor helpful to attempt to distinguish them. Perhaps the only
-exception is the scientific names in Latin, provided that the language of the
-annotated text is not Latin.
-
-Similarly, diseases may have scientific names but many common diseases are
-just common nouns or expressions (_angína_ “tonsillitis”, _chřipka_ “flu”,
-_mor_ “plague”, _neštovice_ “chickenpox”) and it is not clear why they should
-be handled differently from other common nouns. Moreover, diseases are states
-rather than entities.
+This is a top-level class with currently 18 types:
+`molecular-physical-entity`, `small-molecule`, `protein`, `protein-family`,
+`protein-segment`, `amino-acid`, `macro-molecular-complex`, `enzyme`,
+`nucleic-acid`, `pathway`, `gene`, `dna-sequence`, `cell`, `cell-line`,
+`species`, `taxon`, `disease`, `medical-condition`. They are obviously
+inspired by the bulk of work on biomedical processing and we would need more
+documentation to understand how the authors intended to use them.
+
+However, at least three types reach into layman's language: `species`,
+`taxon`, and `disease`. The closely related `species` and `taxon` would be
+categorial named entities (like `product`), where the name denotes a whole
+category (type) of entities rather than a single instance. That is, if they
+deserve to be treated as named entities in the first place. For example,
+_kočka_ “cat” is an animal with a particular set of characteristics, just
+like _dub_ “oak” is a particular type (hyponym) of tree, and _hrad_ “castle”
+is a particular type of building. But the first two words are biological
+genera, hence `taxon`s, while _hrad_ has no special status in the UMR
+taxonomy. (In Czech grammar, all three are common nouns.) There is no
+reason why _kočka_ and _dub_ should be named entities. And by extension,
+there is little reason why `species` should be named entities, for example
+_kočka domácí_ “cat (Felis catus)”, or _dub letní_ “pedunculate oak (Quercus
+robur)”, or why other taxa should, for example _šelmy_ “beasts of prey,
+Carnivora”, _savci_ “mammals”, or _živočichové_ “animals, Animalia”. It is
+true that some species have names that are less common than others and were
+invented by scholars who discovered and described the species, rather than
+being part of the language since ancient times. But it would be neither
+tractable nor helpful to attempt to distinguish them. Perhaps the only
+exception is the scientific names in Latin, provided that the language of the
+annotated text is not Latin.
+
+Similarly, diseases may have scientific names but many common diseases are
+just common nouns or expressions (_angína_ “tonsillitis”, _chřipka_ “flu”,
+_mor_ “plague”, _neštovice_ “chickenpox”) and it is not clear why they should
+be handled differently from other common nouns. Moreover, diseases are states
+rather than entities.
### variable