mets-mods2tei

Convert bibliographic metadata in METS/MODS format to TEI headers and optionally serialize linked ALTO-encoded OCR to TEI text.

Background

MODS is the de facto standard for encoding bibliographic metadata in libraries. It is usually embedded as a separate section in METS XML files. The physical and logical structure of a document is expressed in terms of structural maps (structMap elements).
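
To make this layout concrete, the following minimal sketch parses a hand-written METS fragment with Python's standard library. The sample document, its IDs, and the helper name `summarize_mets` are illustrative assumptions and not part of mets-mods2tei; only the METS and MODS namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for METS and MODS.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "mods": "http://www.loc.gov/mods/v3",
}

# Hypothetical, heavily reduced METS document (not taken from a real record).
SAMPLE_METS = """
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:mods="http://www.loc.gov/mods/v3">
  <mets:dmdSec ID="DMDLOG_0000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <mods:mods>
          <mods:titleInfo>
            <mods:title>Example title</mods:title>
          </mods:titleInfo>
        </mods:mods>
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <mets:structMap TYPE="LOGICAL">
    <mets:div ID="LOG_0000" TYPE="monograph" DMDID="DMDLOG_0000"/>
  </mets:structMap>
  <mets:structMap TYPE="PHYSICAL">
    <mets:div ID="PHYS_0000" TYPE="physSequence"/>
  </mets:structMap>
</mets:mets>
"""


def summarize_mets(xml_text):
    """Return the MODS title and the TYPE of every structMap element."""
    root = ET.fromstring(xml_text.strip())
    # The MODS record sits inside a descriptive metadata section (dmdSec).
    title = root.findtext(
        ".//mods:mods/mods:titleInfo/mods:title", namespaces=NS
    )
    # structMap elements are direct children of the METS root.
    struct_types = [sm.get("TYPE") for sm in root.findall("mets:structMap", NS)]
    return title, struct_types
```

Calling `summarize_mets(SAMPLE_METS)` yields the title together with the two structMap types, one for the logical and one for the physical structure.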

TEI is the de facto standard for representing digital text for research purposes. It usually includes detailed bibliographic metadata in its header.
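
As a rough illustration of what such a header holds, the sketch below builds a skeletal `teiHeader` with the standard library. The helper `build_tei_header` and the sample values are hypothetical; the result is neither what mm2tei produces nor a complete DTA base format header.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Serialize TEI elements with a default namespace instead of an ns0: prefix.
ET.register_namespace("", TEI_NS)


def _q(tag):
    """Qualify a tag name with the TEI namespace."""
    return "{%s}%s" % (TEI_NS, tag)


def build_tei_header(title, publisher):
    """Build a TEI root whose header carries a title and a publisher."""
    tei = ET.Element(_q("TEI"))
    header = ET.SubElement(tei, _q("teiHeader"))
    file_desc = ET.SubElement(header, _q("fileDesc"))

    # fileDesc consists of titleStmt, publicationStmt and sourceDesc.
    title_stmt = ET.SubElement(file_desc, _q("titleStmt"))
    ET.SubElement(title_stmt, _q("title")).text = title

    pub_stmt = ET.SubElement(file_desc, _q("publicationStmt"))
    ET.SubElement(pub_stmt, _q("publisher")).text = publisher

    source_desc = ET.SubElement(file_desc, _q("sourceDesc"))
    ET.SubElement(source_desc, _q("p")).text = "Converted from METS/MODS."
    return tei
```

The tree can be turned into text with `ET.tostring(build_tei_header("Example title", "Example publisher"), encoding="unicode")`.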

Since these standards allow considerable degrees of freedom, the conversion targets well-defined subsets. For MODS, this is the MODS Anwendungsprofil für digitalisierte Medien. For METS, it is the METS Anwendungsprofil für digitalisierte Medien 2.1. For the TEI header, the conversion is roughly based on the DTA base format.

mets-mods2tei is developed at the Saxon State and University Library in Dresden.

Installation

mets-mods2tei is implemented in Python 3. The following steps assume a working Python 3 installation (tested with versions 3.5, 3.6 and 3.7).

Clone the repository

First, clone the repository:

$ git clone https://github.com/wrznr/mets-mods2tei.git
$ cd mets-mods2tei

virtualenv

Using virtualenv is highly recommended, although not strictly necessary for installing mets-mods2tei. It may be installed via:

$ [sudo] pip install virtualenv

Create a virtual environment in a subdirectory of your choice (e.g. env) using

$ virtualenv -p python3 env

and activate it.

$ . env/bin/activate

Python requirements

mets-mods2tei can be installed via pip:

(env) $ pip install .

Testing

mets-mods2tei uses pytest-based testing.

Install the test requirements:

(env) $ pip install -r requirements-test.txt

Run the tests via:

(env) $ pytest

Code coverage

Determine code coverage by running

(env) $ make coverage

Invocation

Installing mets-mods2tei makes the command line tool mm2tei available:

(env) $ mm2tei --help
Usage: mm2tei [OPTIONS] METS

  METS: File containing or URL pointing to the METS/MODS XML to be converted

Options:
  -o, --ocr                       Serialize OCR into resulting TEI
  -l, --log-level [DEBUG|INFO|WARN|ERROR|OFF]
  --help                          Show this message and exit.

It reads METS XML from a file or URL argument and prints the resulting TEI, including the information extracted from the MODS part of the METS.

(env) $ mm2tei "https://digital.slub-dresden.de/oai/?verb=GetRecord&metadataPrefix=mets&identifier=oai:de:slub-dresden:db:id-453779263"
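
To call the tool from a Python script, one option is a thin wrapper around the command line. The sketch below uses only the options listed in the help output above; the function names `build_mm2tei_command` and `convert` are hypothetical, and it assumes that mm2tei is on the `PATH` and writes the TEI to standard output.

```python
import subprocess

# Log levels as listed in the mm2tei --help output.
LOG_LEVELS = ("DEBUG", "INFO", "WARN", "ERROR", "OFF")


def build_mm2tei_command(mets, ocr=False, log_level=None):
    """Assemble the argument list for one mm2tei call.

    mets: path or URL of the METS/MODS XML to convert.
    """
    cmd = ["mm2tei"]
    if ocr:
        cmd.append("--ocr")
    if log_level is not None:
        if log_level not in LOG_LEVELS:
            raise ValueError("unsupported log level: %r" % (log_level,))
        cmd.extend(["--log-level", log_level])
    cmd.append(mets)
    return cmd


def convert(mets, ocr=False, log_level=None):
    """Run mm2tei and return its standard output as text.

    Assumption: the resulting TEI is printed to stdout.
    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = subprocess.run(
        build_mm2tei_command(mets, ocr=ocr, log_level=log_level),
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.5+)
        check=True,
    )
    return result.stdout
```

For instance, `convert("mets.xml", ocr=True)` would run `mm2tei --ocr mets.xml`, where `mets.xml` stands for any local METS file. Passing the arguments as a list avoids shell quoting issues with URLs that contain `&`.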