
Dictionaries Validation

petermr edited this page Sep 14, 2020 · 23 revisions

validation

This wikipage can be used for peer-review of the dictionaries in the run-up to the hackathon.

general principles of validation

Each project pair (A and B) should review the other's dictionaries. Here are some criteria:

  • do they use standard fields and names?
  • do they have provenance (how they were created)?
  • do they work in ami search?

the content.

  • are there entries which should be removed?
  • are there mislabelled or mislinked entries (e.g. wikidata links that point to scientific articles)?
  • are there syntax or encoding problems?
  • are there multilingual entries?
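
As an illustration of the syntax and encoding criterion, here is a minimal sketch of an entry whose term contains a reserved character. The entry and its values are hypothetical, not taken from any of the dictionaries. The point is only that & must be written as &amp; in attribute values and text, and that the file should declare the encoding it actually uses.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical entry, for illustration only -->
<dictionary title="example">
  <desc>Escaping and encoding example</desc>
  <!-- Not well-formed: <entry term="Trinidad & Tobago"/> -->
  <!-- Well-formed: the ampersand is escaped -->
  <entry term="Trinidad &amp; Tobago" name="Trinidad and Tobago">
    <!-- Non-ASCII characters are fine when the file really is UTF-8 -->
    <synonym>Trinité-et-Tobago</synonym>
  </entry>
</dictionary>
```

A file containing the unescaped form is rejected by any XML parser before schema validation even starts.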

country

overview

purpose

scope (including limitations)

peer-review

reviewer A

disease

drugs

funders

viruses

test and trace

zoonoses

non-pharmaceutical interventions

XSD schema generation

online tools

There are several. Here's one: https://www.freeformatter.com/xsd-generator.html. It takes a dictionary (we use country.xml) and analyzes which elements occur and in what context (e.g. as children). It then analyzes each element to see whether it has attributes and what their types are.
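
To make the input concrete, here is a sketch of the kind of document the generator would be given. It uses only the element and attribute names that appear in the generated schema. The values (and the exact URL forms) are assumptions for illustration and are not copied from country.xml.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative input only: values are assumptions, not copied from country.xml -->
<dictionary title="country">
  <desc>Countries, with links to Wikidata and Wikipedia</desc>
  <entry term="France"
         name="France"
         description="country in Western Europe"
         _p297_country="FR"
         wikidataURL="https://www.wikidata.org/wiki/Q142"
         wikipediaURL="https://en.wikipedia.org/wiki/France">
    <synonym>French Republic</synonym>
  </entry>
</dictionary>
```

From a document like this the generator can only infer what it sees: which elements nest inside which, which attributes occur, and a guessed type for each.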

schema v0.1

Here's the xsd-generator's first guess at a "Russian Doll" schema (https://www.oracle.com/technical-resources/articles/java/design-patterns.html). XSD can be very confusing, so just take some of it for granted at this stage (I and others tried to get a simpler version in 2000 but were overruled). Luckily, for dictionaries we don't need anything complicated. This will evolve as we try to accommodate all dictionaries.

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dictionary">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="desc"/>
        <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>
            </xs:sequence>
            <xs:attribute type="xs:string" name="_p297_country" use="optional"/>
            <xs:attribute type="xs:string" name="description" use="optional"/>
            <xs:attribute type="xs:string" name="name" use="optional"/>
            <xs:attribute type="xs:string" name="term" use="optional"/>
            <xs:attribute type="xs:anyURI" name="wikidataURL" use="optional"/>
            <xs:attribute type="xs:string" name="wikipediaURL" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute type="xs:string" name="title"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

This is the schema generator's guess at what the author intended, but it needs editing. We will want to make some attributes required and reconsider the type of wikipediaURL (guessed as string, whereas wikidataURL was guessed as anyURI).
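
As a sketch of what such edits might look like, the fragment below shows two attribute declarations as they could appear inside the entry complexType. It assumes that term is one of the attributes to be made required and that wikipediaURL should be typed like wikidataURL. Neither choice is final.

```xml
<!-- Fragment only: these lines would replace the matching declarations
     inside the <xs:complexType> of <entry> -->

<!-- use="required" makes a validator reject any <entry> without a term -->
<xs:attribute type="xs:string" name="term" use="required"/>

<!-- assumption: treat wikipediaURL as a URI, consistent with wikidataURL -->
<xs:attribute type="xs:anyURI" name="wikipediaURL" use="optional"/>
```

Note that xs:anyURI is checked quite leniently by most validators, so it mainly documents intent and does not guarantee a working link.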

Interpretation

  • schema structure: the current example uses nested definitions, a style sometimes called "Russian Doll".
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">

(Just copy this accurately; we don't need to understand it. It sets the namespaces.)

  • root element definition
  <xs:element name="dictionary">
...

      <xs:attribute type="xs:string" name="title"/>
...
  </xs:element>

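This defines an element dictionary and allows it to have an attribute title. As generated, the attribute is optional, because an xs:attribute with no use defaults to use="optional"; making it mandatory would need use="required". Our documents should look something like: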

<dictionary title="foobar" >
 ...
</dictionary>

(The title can be anything at this stage; string is the least constraining type.)

  • child elements
  <xs:element name="dictionary">
    <xs:complexType>

The dictionary element can have many children, but they must appear in the order given inside xs:sequence:

      <xs:sequence>
        <xs:element type="xs:string" name="desc"/>

There must be exactly one <desc>...</desc> child element (we will revise this later), followed by

        <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
...
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute type="xs:string" name="title"/>
    </xs:complexType>
  </xs:element>

any number (zero or more) of <entry>...</entry> elements.
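
A minimal instance that satisfies this ordering under schema v0.1 is sketched below. The title, description and terms are placeholders.

```xml
<!-- Valid under v0.1: exactly one <desc> first, then zero or more <entry> elements -->
<dictionary title="foobar">
  <desc>a short description of the dictionary</desc>
  <entry term="first term"/>
  <entry term="second term"/>
</dictionary>
```

Because the children are declared in an xs:sequence, a document that puts <desc> after an <entry>, omits <desc>, or repeats it should be rejected by a validator.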

  • grandchild elements
        <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>
            </xs:sequence>
...

Each <entry> element can contain any number of <synonym> elements.

  • string content
              <xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>

The <synonym> elements have no element-children but can contain a text string.
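
For example, an entry with two synonyms might look like the sketch below. The term and synonyms are illustrative values only.

```xml
<!-- Fragment: one <entry> with text-only <synonym> children -->
<entry term="United Kingdom">
  <synonym>UK</synonym>
  <synonym>United Kingdom of Great Britain and Northern Ireland</synonym>
</entry>
```

Since synonym is declared with type xs:string, markup inside it (for instance a nested element around "UK") would not validate; only character data is allowed.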

  • attributes The <entry> element can have many attributes:
        <xs:element name="entry" maxOccurs="unbounded" minOccurs="0">
          <xs:complexType>
...
            <xs:attribute type="xs:string" name="_p297_country" use="optional"/>
            <xs:attribute type="xs:string" name="description" use="optional"/>
            <xs:attribute type="xs:string" name="name" use="optional"/>
            <xs:attribute type="xs:string" name="term" use="optional"/>
            <xs:attribute type="xs:anyURI" name="wikidataURL" use="optional"/>
            <xs:attribute type="xs:string" name="wikipediaURL" use="optional"/>
          </xs:complexType>
        </xs:element>

The generator has typed every attribute as string except wikidataURL (anyURI), and has guessed that all of them are optional. We'll now refine that...

schema v0.2

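We want some attributes to be mandatory, so here's the next version. It also requires at least one entry (minOccurs="1") and adds two optional attributes, wikidataID and wikipediaPage: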

<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="dictionary">
    <xs:complexType>
      <xs:sequence>
        <xs:element type="xs:string" name="desc"/>
        <xs:element name="entry" maxOccurs="unbounded" minOccurs="1">
          <xs:complexType>
            <xs:sequence>
              <xs:element type="xs:string" name="synonym" maxOccurs="unbounded" minOccurs="0"/>
            </xs:sequence>
            <!-- this only applies to country so we'll make it optional -->
            <xs:attribute type="xs:string" name="_p297_country" use="optional"/>
            <!-- but these 3 are mandatory (without use="required" they would default to optional) -->
            <xs:attribute type="xs:string" name="description" use="required"/>
            <xs:attribute type="xs:string" name="name" use="required"/>
            <xs:attribute type="xs:string" name="term" use="required"/>
            <!-- these two are optional (there may not be wikipedia or wikidata values) -->
            <xs:attribute type="xs:anyURI" name="wikidataURL" use="optional"/>
            <xs:attribute type="xs:string" name="wikipediaURL" use="optional"/>
            <!-- and we'll add these ones -->
            <xs:attribute type="xs:string" name="wikidataID" use="optional"/>
            <xs:attribute type="xs:string" name="wikipediaPage" use="optional"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute type="xs:string" name="title"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
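
As a quick check of v0.2, here is a sketch of a small document intended to validate against it. The values are illustrative assumptions. In particular, the formats shown for wikidataID (a Q-number) and wikipediaPage (a page title) are guesses, since the schema only constrains them to be strings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative instance for schema v0.2; attribute values are assumptions -->
<dictionary title="country">
  <desc>Countries, with links to Wikidata and Wikipedia</desc>
  <!-- description, name and term are the three attributes meant to be mandatory -->
  <entry term="France"
         name="France"
         description="country in Western Europe"
         _p297_country="FR"
         wikidataID="Q142"
         wikipediaPage="France">
    <synonym>French Republic</synonym>
  </entry>
</dictionary>
```

Removing the only entry, or dropping one of the three mandatory attributes once they are declared with use="required", is a simple way to confirm that the validator really enforces the new constraints.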