Creating a new TEI file

Andrew Morrison edited this page Mar 24, 2023 · 19 revisions

Schema declarations

All TEI files should start with the following three lines:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="https://raw.githubusercontent.com/msdesc/consolidated-tei-schema/master/msdesc.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="https://raw.githubusercontent.com/msdesc/consolidated-tei-schema/master/msdesc.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

These can be copied and pasted verbatim and will not need updating when the schema is updated.

In an XML-aware editor such as Oxygen, these declarations cause validation errors to be underlined in red as you work on the file.

Ensure all files are validated before committing them to this repository.

Manuscript identifiers

The fourth line is usually the opening tag of the root element, which should look like this:

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="manuscript_UNIQUENUMBER">

The UNIQUENUMBER must be replaced with a number which is unique across all the Fihrist collections. That is needed because the xml:id attribute is what gives manuscripts their persistent URLs on the new web site, instead of the transitory URLs the old web site used to generate.

Batches of manuscript IDs have been pre-allocated to member institutions and are kept in the identifiers folder. Follow the instructions in the readme in that folder.

Note, if you want to commit new files before they are ready to be published on the Fihrist web site, you can comment out the manuscript ID, which will prevent that record from being included when the Fihrist web site is next re-indexed. For example:

<TEI xmlns="http://www.tei-c.org/ns/1.0"><!-- xml:id="manuscript_123456"-->

Just remember to change it back when you do wish it to be published, e.g.:

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="manuscript_123456">

msItem identifiers

Each msItem element can optionally be given an xml:id attribute. By convention, the value is the filename (without the .xml) followed by -itemX (or -itemX-itemY for an msItem nested inside another msItem, -itemX-itemY-itemZ when triply nested, and so on). In multi-part manuscripts, the part number should also be included before the item numbers, as in the second example below.

Examples:

<msItem xml:id="MS_Marsh_538-item2">
<msItem xml:id="WMS_Arabic_420-part1-item1-item4">

The first example is the second msItem in MS_Marsh_538.xml. The second example is the fourth child msItem within the first msItem in the first msPart in WMS_Arabic_420.xml.
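
As a rough illustration of how these IDs follow the nesting, here is a sketch of the msItem hierarchy implied by the second example. It assumes that sibling items are numbered in document order starting from 1. The comments stand in for the actual descriptive content, so this is not a complete, valid record.

```xml
<!-- Inside the msContents of the first msPart in WMS_Arabic_420.xml -->
<msItem xml:id="WMS_Arabic_420-part1-item1">
    <!-- description of the first item goes here, followed by its nested items -->
    <msItem xml:id="WMS_Arabic_420-part1-item1-item1">
        <!-- ... -->
    </msItem>
    <!-- the second and third nested items are omitted here for brevity -->
    <msItem xml:id="WMS_Arabic_420-part1-item1-item4">
        <!-- ... -->
    </msItem>
</msItem>
```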

Work keys

The xml:id attribute of an msItem element is only for referencing the catalogued description of an instance of a work in one manuscript and its TEI record. The "work_UNIQUENUMBER" IDs that suffix the URLs of work-pages on the Fihrist web site identify works as intellectual entities. These are assigned to abstract works in the authority files, which only the central Fihrist editor should modify. When cataloguing the instance of a work in a manuscript, first check whether the same work in another manuscript is already in Fihrist, by searching on the web site. Try alternative titles or transliterations. If you find a match, copy its "work_UNIQUENUMBER" from the URL into the key attribute of the title child element(s) of the msItem in your TEI record (e.g. if cataloguing a copy of Gulistān, use key="work_20962").
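
A minimal sketch of a keyed title, using the Gulistān key mentioned above. The filename "MS_Example_1" is hypothetical and only there to show the two kinds of identifier side by side. A real msItem would normally contain more than a title.

```xml
<!-- "MS_Example_1" is a hypothetical filename used only for illustration -->
<msItem xml:id="MS_Example_1-item1">
    <!-- key identifies the abstract work; xml:id identifies this copy's description -->
    <title key="work_20962">Gulistān</title>
</msItem>
```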

If you cannot find a match, the work you are cataloguing is probably the first instance of it in any of the manuscripts in Fihrist. If so, create a blank key attribute (key="") in the title. After you have committed and pushed your record, a new entry will be generated for the work, which will have its own unique "work_UNIQUENUMBER" ID. The Fihrist editor will review it (to ensure it is indeed a unique work) and plug the new ID into the key attribute(s) in your record.

Not all works need keys. If, for example, you are cataloguing a book of one hundred short poems, you can choose to catalogue each poem as a child msItem, each with a title, but only add a key attribute to the title of the msItem for the whole poetry collection. In that case, only one authority entry will be created.
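
The poetry-collection case might look something like the sketch below. The filename "MS_Example_2" and the blank titles are placeholders. The blank key assumes the collection is new to Fihrist; if it already had a work page, its "work_UNIQUENUMBER" would go there instead.

```xml
<!-- Hypothetical record: the filename and titles are placeholders -->
<msItem xml:id="MS_Example_2-item1">
    <!-- Only the collection as a whole carries a key, so only one authority entry results -->
    <title key="">___________</title>
    <msItem xml:id="MS_Example_2-item1-item1">
        <!-- Individual poem: titled, but no key attribute -->
        <title>___________</title>
    </msItem>
    <msItem xml:id="MS_Example_2-item1-item2">
        <title>___________</title>
    </msItem>
    <!-- further poems follow the same pattern -->
</msItem>
```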

Person keys

These work in the same way as work keys, except that the key attribute goes on author or editor elements (or, in contexts other than the originators of works, persName elements). As well as searching the Fihrist web site for the person's name, search VIAF, trying alternative versions of the name and different transliterations. If you find the person in VIAF, put the VIAF ID into the key attribute, prefixed by "person_" (e.g. for the poet Saʻdī, use key="person_100206721").

If you cannot find a match, the person probably hasn't been mentioned in any of the manuscripts already in Fihrist. If so, create a blank key attribute (key="").

Not all persons need keys. If, for example, you are recording a given-name mentioned in a text, which is not and can never be further identified, you can choose not to add a key attribute to the persName. In that case, no authority entry will be created for that name.
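
The three cases described in this section can be sketched as separate fragments. They are not meant to sit next to each other in a real record; each would appear wherever the person is mentioned, and the blank names are placeholders.

```xml
<!-- Person found in VIAF: the VIAF ID prefixed by "person_" -->
<author key="person_100206721">Saʻdī</author>

<!-- Person not found by searching: blank key -->
<editor key="">___________</editor>

<!-- Name that cannot be identified further: no key, so no authority entry is created -->
<persName>___________</persName>
```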

Subject keys

These are similar to work and person keys, except you must use subjects in the Library of Congress Subject Headings classification. Search there and paste the LCSH ID into the key attribute of the term element, prefixed by "subject_" (e.g. for the topic of cheese-making, use key="subject_sh99005888"). Blank keys should not be used in term elements.
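
For instance, a term for the cheese-making topic might be written as below. The element text is an illustrative label rather than necessarily the authorised LCSH form, and where the term element sits within the record is not shown here.

```xml
<!-- LCSH ID prefixed by "subject_"; the label text is illustrative -->
<term key="subject_sh99005888">Cheese-making</term>
```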

Online digital facsimiles

If the manuscript being described has been wholly or partially digitized, and is available online, you can add the following inside the additional section, after the adminInfo:

<surrogates>
   <bibl type="digital-facsimile" subtype="____">
       <ref target="___________">
           <title>___________</title>
       </ref>
       <note>(___________)</note>
   </bibl>
</surrogates>

The subtype attribute should be either "full" or "partial". The target attribute should be the persistent URL of the digital surrogate. The title should be the name of the service hosting the images (e.g. "Digital Bodleian", "Manchester Digital Collections", etc.). The note should be either "full digital facsimile" or a description of the extent of a partial digitization (e.g. "miniature paintings only", "single sample image", etc.). If there are multiple digital surrogates at different URLs, add more bibl elements inside the same surrogates element.
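
A filled-in sketch with two surrogates in one surrogates element. The example.org URLs are placeholders, not real persistent URLs, and the pairing of hosting services is purely illustrative.

```xml
<surrogates>
   <!-- Complete digitization -->
   <bibl type="digital-facsimile" subtype="full">
       <ref target="https://example.org/objects/PLACEHOLDER-1">
           <title>Digital Bodleian</title>
       </ref>
       <note>(full digital facsimile)</note>
   </bibl>
   <!-- Partial digitization at a different URL -->
   <bibl type="digital-facsimile" subtype="partial">
       <ref target="https://example.org/objects/PLACEHOLDER-2">
           <title>Manchester Digital Collections</title>
       </ref>
       <note>(miniature paintings only)</note>
   </bibl>
</surrogates>
```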

Further information

The TEI schema has more detailed documentation, which is available here:

https://msdesc.github.io/consolidated-tei-schema/msdesc.html

Many of the code snippets there are examples from western medieval manuscripts, but the principles should be similar.