Dictionaries Validation
This wiki page can be used for peer review of the dictionaries in the run-up to the hackathon.
A project pair (A and B) should review each other's dictionaries. Here are some criteria:
- do they use standard fields and names?
- do they have provenance (how they were created)?
- do they work in `ami search`?
Then check the content:
- are there entries which should be removed?
- are there mislabelled or mislinked entries (e.g. Wikidata links to scientific articles)?
- are there syntax or encoding problems?
- are there multilingual entries?
There are several tools that generate a schema from an example document. Here's one: https://www.freeformatter.com/xsd-generator.html. It takes a dictionary (we use `country.xml`) and analyzes which elements occur and in what context (e.g. which elements are children of which). Then it analyzes each element to see whether it has attributes and what their types are.
Here's the xsd-generator's first guess at a schema, in the "Russian Doll" style (https://www.oracle.com/technical-resources/articles/java/design-patterns.html). XML Schema (XSD) can be very confusing, so just take some of it on trust at this stage (I and others tried to get a simpler version in 2000 but were overruled). Luckily, dictionaries don't need anything complicated. The schema will evolve as we try to accommodate all the dictionaries.
```xml
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dictionary">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="desc"/>
        <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>
            </xs:sequence>
            <xs:attribute type="xs:string" name="_p297_country" use="optional"/>
            <xs:attribute type="xs:string" name="description" use="optional"/>
            <xs:attribute type="xs:string" name="name" use="optional"/>
            <xs:attribute type="xs:string" name="term" use="optional"/>
            <xs:attribute type="xs:anyURI" name="wikidataURL" use="optional"/>
            <xs:attribute type="xs:string" name="wikipediaURL" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute type="xs:string" name="title"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```
This is the schema generator's guess at what the author intended, but it needs editing. We will want to make some attributes required and look at the type of `wikipediaURL` (it was guessed as `xs:string`, whereas `wikidataURL` was guessed as `xs:anyURI`).
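For orientation, here is a small hypothetical document that the generated schema should accept. The values are illustrative (France, Wikidata item Q142) and are not copied from `country.xml`; the comment on `_p297_country` is an assumption that the attribute holds the value of Wikidata property P297 (ISO 3166-1 alpha-2 code).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical example; values are illustrative, not taken from country.xml -->
<dictionary title="country">
  <desc>Countries of the world (illustrative sample)</desc>
  <entry term="France" name="France" description="country in Western Europe"
         wikidataURL="https://www.wikidata.org/wiki/Q142"
         wikipediaURL="https://en.wikipedia.org/wiki/France"
         _p297_country="FR">
    <!-- assumption: _p297_country holds the ISO 3166-1 alpha-2 code (Wikidata P297) -->
    <synonym>French Republic</synonym>
  </entry>
</dictionary>
```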
- schema structure. The current example uses nested definitions, a style sometimes called "Russian Doll".
```xml
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
```
(Just copy this line accurately; we don't need to understand it. It binds the `xs:` prefix to the XML Schema namespace and sets how element and attribute names are qualified.)
- root element definition
```xml
<xs:element name="dictionary">
  ...
  <xs:attribute type="xs:string" name="title"/>
  ...
</xs:element>
```
This defines an element `dictionary` and declares an attribute `title` on it, so our documents will look something like:
```xml
<dictionary title="foobar">
  ...
</dictionary>
```
(The title can be anything at this stage; `xs:string` is the least constraining type.)
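Strictly, an `xs:attribute` declared without a `use` attribute is optional (the default is `use="optional"`), so the generated declaration allows a title but does not insist on one. If we want every dictionary to carry a title, one way is to add `use="required"`, as in this sketch of the root element:

```xml
<xs:element name="dictionary">
  <xs:complexType>
    <!-- xs:sequence of desc and entry elements, unchanged -->
    <xs:attribute type="xs:string" name="title" use="required"/>
  </xs:complexType>
</xs:element>
```

With this change a validator should reject a `<dictionary>` that has no `title` attribute, though `title=""` would still pass, since the empty string is a valid `xs:string`.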
- child elements
```xml
<xs:element name="dictionary">
  <xs:complexType>
```
The `dictionary` element can have several children, but they must appear in the given order:
```xml
    <xs:sequence>
      <xs:element type="xs:string" name="desc"/>
```
There must be exactly one `<desc>...</desc>` child element (we will revise this later), followed by
```xml
      <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
        <xs:complexType>
          ...
        </xs:complexType>
      </xs:element>
    </xs:sequence>
    <xs:attribute type="xs:string" name="title"/>
  </xs:complexType>
</xs:element>
```
any number (including zero) of `<entry>...</entry>` elements.
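To make the ordering concrete, here are two separate, hypothetical documents (the terms are placeholders). Against the generated schema the first should validate and the second should not, because `xs:sequence` fixes the order of `desc` and `entry`:

```xml
<!-- document 1, valid: one desc, then zero or more entry elements -->
<dictionary title="foobar">
  <desc>a test dictionary</desc>
  <entry term="foo"/>
  <entry term="bar"/>
</dictionary>
```

```xml
<!-- document 2, invalid: an entry appears before desc -->
<dictionary title="foobar">
  <entry term="foo"/>
  <desc>a test dictionary</desc>
</dictionary>
```

A document with a `desc` and no entries at all is also valid here, because the generated schema gives `entry` `minOccurs="0"`.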
- grandchild elements
```xml
<xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
  <xs:complexType>
    <xs:sequence>
      <xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>
    </xs:sequence>
    ...
```
Each `<entry>` element can contain any number of `<synonym>` elements.
- string content
```xml
<xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>
```
The `<synonym>` elements have no element children but can contain a text string.
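For example, an entry with two synonyms (placeholder values) looks like the first fragment below. Because `synonym` has the simple type `xs:string`, it may contain text but not child elements, so a validator should reject the second fragment:

```xml
<!-- valid: synonym elements contain only text -->
<entry term="foo">
  <synonym>foobar</synonym>
  <synonym>Foo</synonym>
</entry>

<!-- invalid: an element of type xs:string cannot contain a child element -->
<entry term="foo">
  <synonym><i>foobar</i></synonym>
</entry>
```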
- attributes
The `<entry>` element can have many attributes:
```xml
<xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
  <xs:complexType>
    ...
    <xs:attribute type="xs:string" name="_p297_country" use="optional"/>
    <xs:attribute type="xs:string" name="description" use="optional"/>
    <xs:attribute type="xs:string" name="name" use="optional"/>
    <xs:attribute type="xs:string" name="term" use="optional"/>
    <xs:attribute type="xs:anyURI" name="wikidataURL" use="optional"/>
    <xs:attribute type="xs:string" name="wikipediaURL" use="optional"/>
  </xs:complexType>
</xs:element>
```
The generator has typed most of these attributes as `xs:string` (`wikidataURL` is `xs:anyURI`) and guessed that all of them are optional. We'll now refine that...
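Because everything has been guessed as optional, the generated schema is very permissive. Both of these hypothetical entries should validate against it: the first has no attributes at all, and the second puts a value that is not a URL into `wikipediaURL`, which is only an `xs:string`:

```xml
<!-- valid under the generated schema, though it carries no information -->
<entry/>

<!-- also valid: attribute order does not matter, and wikipediaURL accepts any string -->
<entry wikipediaURL="not a URL" term="foo"/>
```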
We want some attributes to be mandatory, so here's the next version:
```xml
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dictionary">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="desc"/>
        <xs:element name="entry" maxOccurs="unbounded" minOccurs="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>
            </xs:sequence>
            <!-- this only applies to country so we'll make it optional -->
            <xs:attribute type="xs:string" name="_p297_country" use="optional"/>
            <!-- but these 3 are mandatory (without use="required" an attribute is optional) -->
            <xs:attribute type="xs:string" name="description" use="required"/>
            <xs:attribute type="xs:string" name="name" use="required"/>
            <xs:attribute type="xs:string" name="term" use="required"/>
            <!-- these two are optional (there may not be Wikipedia or Wikidata values) -->
            <xs:attribute type="xs:anyURI" name="wikidataURL" use="optional"/>
            <xs:attribute type="xs:string" name="wikipediaURL" use="optional"/>
            <!-- and we'll add these ones -->
            <xs:attribute type="xs:string" name="wikidataID" use="optional"/>
            <xs:attribute type="xs:string" name="wikipediaPage" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute type="xs:string" name="title"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```
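As a check, here is a hypothetical document that should satisfy the revised schema: it has at least one `entry`, and that entry carries `description`, `name` and `term`. The schema does not say what `wikidataID` and `wikipediaPage` should contain; the values below (a bare Wikidata Q-identifier and a Wikipedia page title) are assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- hypothetical example; wikidataID and wikipediaPage formats are assumed -->
<dictionary title="country">
  <desc>Countries of the world (illustrative sample)</desc>
  <entry term="France" name="France" description="country in Western Europe"
         wikidataURL="https://www.wikidata.org/wiki/Q142"
         wikidataID="Q142"
         wikipediaPage="France">
    <synonym>French Republic</synonym>
  </entry>
</dictionary>
```

Removing `term` from this entry should then make validation fail, provided the three mandatory attributes are declared with `use="required"`.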