
Qucosa METS Profile

Ralf Claussnitzer edited this page Sep 13, 2016 · 13 revisions

The Qucosa Content Model describes the Fedora datastreams for a valid Qucosa object. Besides the usual DC and RELS-EXT datastreams, a Qucosa object has a MODS datastream of media type application/mods+xml (a Library of Congress RFC), a SLUB-INFO datastream of type application/vnd.slub-info+xml (not formally defined), a QUCOSA-XML datastream of type application/xml (containing the original, pre-migration data), and any number of optional attachment datastreams (IDs starting with ATT-), each with the media type of its content.

METS Application Profile

METS defines standard elements, structures and relations between elements. While the usage of the elements themselves is mostly clearly defined and agreed on, the actual structure (which metadata sections? how many? which attributes?) strongly depends on the use case. METS is a bit like XML in this sense. To handle this flexibility in situations requiring interoperability, so-called METS Application Profiles are defined.

When ingest or update requests are submitted to the SWORD Service using application/vnd.qucosa.mets+xml, the Qucosa File Handler interprets the METS document and manages the deposit accordingly. For this to work, the submitted METS must have a certain structure (the Qucosa Profile).

Here is an overview of the structure:

<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:mext="http://slub-dresden.de/mets"
    xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version19/mets.v1-9.xsd">

    <!-- Record header -->
    <mets:metsHdr RECORDSTATE="ACTIVE"/>

    <!-- Descriptive metadata section -->
    <mets:dmdSec ID="DMD_000">
        <mets:mdWrap MDTYPE="MODS">
            <mets:xmlData>
                <!-- MODS description is embedded here -->
            </mets:xmlData>
        </mets:mdWrap>
    </mets:dmdSec>

    <!-- Technical metadata section -->
    <mets:amdSec ID="AMD_000">
        <mets:techMD ID="TECH_000">
            <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="SLUBINFO" MIMETYPE="application/vnd.slub-info+xml">
                <mets:xmlData>
                    <!-- SLUB-INFO record is embedded here -->
                </mets:xmlData>
            </mets:mdWrap>
        </mets:techMD>
    </mets:amdSec>

    <!-- File section -->
    <mets:fileSec>
        <!-- This file group holds all references to the object's attachments -->
        <mets:fileGrp USE="ORIGINAL">
            <!-- A file is described by ID, media-type and actual location -->
            <mets:file ID="ATT-0" MIMETYPE="application/pdf" mext:LABEL="Hauptdokument">
                <!--  -->
                <mets:FLocat LOCTYPE="URL" xlink:href="http://www.qucosa.de/fileadmin/data/datei1.pdf" />
            </mets:file>
            <!-- The number of files is not limited -->
            <mets:file ID="ATT-1" MIMETYPE="application/pdf" mext:LABEL="Sekundärdokument" >
                <mets:FLocat LOCTYPE="URL" xlink:href="http://www.qucosa.de/fileadmin/data/datei2.pdf" />
            </mets:file>
        </mets:fileGrp>
    </mets:fileSec>

    <!-- Logical document structure combines document metadata, administrative metadata and file references -->
    <mets:structMap TYPE="LOGICAL">
        <mets:div DMDID="DMD_000" AMDID="AMD_000" TYPE="article">
            <mets:fptr FILEID="ATT-0" />
            <mets:fptr FILEID="ATT-1" />
        </mets:div>
    </mets:structMap>
</mets:mets>

Never submit METS documents with more than one of any of the aforementioned METS sections. The behavior for multi-level METS documents is undefined and unimplemented.

Interpretation on Ingest

When the SWORD service processes an ingest request, the Qucosa File Handler takes the METS document and extracts the information needed to create a proper Qucosa object. The extraction process is simple and straightforward. It looks for one of each of the following sections:

  • One <mets:dmdSec> with one <mets:mdWrap MDTYPE="MODS"> to extract a MODS record
  • One <mets:dmdSec> with one <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="QUCOSA-XML" MIMETYPE="application/xml"> to extract the original Qucosa XML record for migration purposes
  • One <mets:amdSec> with one <mets:rightsMD> with one <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="SLUBINFO" MIMETYPE="application/vnd.slub-info+xml"> to extract a SLUB-INFO record
  • One <mets:fileSec> with one <mets:fileGrp USE="ORIGINAL"> with any number of <mets:file> elements to extract attachment and upload information
  • One <mets:fileSec> with one <mets:fileGrp USE="DOWNLOAD"> with any number of <mets:file> elements to extract attachment and upload information
  • It doesn't evaluate the <mets:structMap> element. However, a METS document without this section is not valid.
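
The lookup described in the list above can be sketched in a few lines. This is only an illustration in Python using the standard library's xml.etree.ElementTree, not the handler's actual implementation. The function name extract_sections and the shortened sample document are assumptions made for this example, and the SLUB-INFO lookup deliberately accepts any child of <mets:amdSec>.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes as used in the profile overview.
NS = {
    "mets": "http://www.loc.gov/METS/",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Shortened sample document (hypothetical, for illustration only).
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:dmdSec ID="DMD_000">
    <mets:mdWrap MDTYPE="MODS"><mets:xmlData/></mets:mdWrap>
  </mets:dmdSec>
  <mets:amdSec ID="AMD_000">
    <mets:techMD ID="TECH_000">
      <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="SLUBINFO"
                   MIMETYPE="application/vnd.slub-info+xml"><mets:xmlData/></mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
  <mets:fileSec>
    <mets:fileGrp USE="ORIGINAL">
      <mets:file ID="ATT-0" MIMETYPE="application/pdf">
        <mets:FLocat LOCTYPE="URL"
                     xlink:href="http://www.qucosa.de/fileadmin/data/datei1.pdf"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""


def extract_sections(mets_xml):
    """Hypothetical helper: locate the sections the profile asks for."""
    root = ET.fromstring(mets_xml)
    result = {
        # First MODS wrapper inside a dmdSec, or None if absent.
        "mods": root.find(
            "mets:dmdSec/mets:mdWrap[@MDTYPE='MODS']/mets:xmlData", NS),
        # SLUB-INFO wrapper under any child element of amdSec.
        "slub_info": root.find(
            "mets:amdSec/*/mets:mdWrap[@OTHERMDTYPE='SLUBINFO']/mets:xmlData", NS),
        "files": {},
    }
    # Collect file IDs per file group; the structMap is not evaluated.
    for use in ("ORIGINAL", "DOWNLOAD"):
        path = f"mets:fileSec/mets:fileGrp[@USE='{use}']/mets:file"
        result["files"][use] = [f.get("ID") for f in root.findall(path, NS)]
    return result


sections = extract_sections(SAMPLE)
```

A missing section simply yields None or an empty list, which a caller could use to reject a deposit that does not follow the profile.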

Deleting temporary files after successful ingest

If a referenced file exists only temporarily for an ingest or update request, the Qucosa SWORD file handler can be instructed to delete it after a successful ingest into the repository. This is used to share a common upload directory between a front-end system and the SWORD system. The handler deletes only files whose <FLocat> element has the attribute USE set to TEMPORARY, and only if the file's URI has the scheme file:

<mets:file ID="ATT-2" MIMETYPE="application/pdf" mext:LABEL="Attachment">
  <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" LOCTYPE="URL" USE="TEMPORARY"
               xlink:href="file:/Attachment.pdf" />
</mets:file>

For this to work, the SWORD service needs write access to the referenced file. If the service fails to delete the file, a successful ingest is not affected and the file simply remains in the directory.
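
The deletion rule can be summarized as a small decision function. The following Python sketch is an illustration under stated assumptions, not the service's actual code. The function name delete_if_temporary is hypothetical, and the temporary file is created only to demonstrate the behavior.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname


def delete_if_temporary(use, href):
    """Hypothetical helper: delete the file only if USE is TEMPORARY
    and the URI scheme is file. Returns True if the file was deleted."""
    if use != "TEMPORARY":
        return False
    parsed = urlparse(href)
    if parsed.scheme != "file":
        return False
    try:
        os.remove(url2pathname(parsed.path))
        return True
    except OSError:
        # A failed deletion must not affect the ingest; the file stays.
        return False


# Demonstration with a throwaway file in the system's temp directory.
fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
os.close(fd)
deleted = delete_if_temporary("TEMPORARY", Path(tmp_path).as_uri())

# Remote files are never touched, whatever their USE value is.
remote = delete_if_temporary(
    "TEMPORARY", "http://www.qucosa.de/fileadmin/data/datei1.pdf")
```

Swallowing the OSError mirrors the behavior described above: a deletion failure leaves the file in place without turning a successful ingest into an error.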

Custom SLUB METS extension attribute for file label

Unfortunately, the METS <mets:file> element doesn't have a LABEL attribute for individually naming the referenced file. A custom METS extension attribute is therefore used to denote a file's label. The extension schema is relatively simple:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://slub-dresden.de/mets"
           xmlns="http://slub-dresden.de/mets"
           elementFormDefault="qualified" attributeFormDefault="unqualified"
           version="1.0">
    <xs:attribute name="LABEL">
        <xs:simpleType>
            <xs:restriction base="xs:string"/>
        </xs:simpleType>
    </xs:attribute>
</xs:schema>
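
Because LABEL lives in the extension namespace, a consumer has to ask for it by its qualified name. A minimal Python sketch of that lookup follows, assuming the standard library's xml.etree.ElementTree. The sample element is taken from the overview, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

MEXT_NS = "http://slub-dresden.de/mets"

FILE_XML = (
    '<mets:file xmlns:mets="http://www.loc.gov/METS/" '
    'xmlns:mext="http://slub-dresden.de/mets" '
    'ID="ATT-0" MIMETYPE="application/pdf" mext:LABEL="Hauptdokument"/>'
)

file_el = ET.fromstring(FILE_XML)

# ElementTree addresses namespaced attributes as "{namespace-uri}name".
label = file_el.get(f"{{{MEXT_NS}}}LABEL")

# An unqualified lookup does not see the extension attribute.
plain = file_el.get("LABEL")
```

The prefix itself (mext here) carries no meaning. Only the namespace URI has to match the schema's targetNamespace.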

Download permission information via METS file group

Besides the file elements in the file group ORIGINAL, there can be one more file group called DOWNLOAD. File elements in this group describe files that are permitted for download in the front end, while the files declared in the ORIGINAL file group are hidden:

<mets:fileGrp USE="ORIGINAL">
     <mets:file ID="ATT-0" MIMETYPE="application/pdf" mext:LABEL="Archival copy">
          <mets:FLocat LOCTYPE="URL" xlink:href="http://..." />
     </mets:file>
</mets:fileGrp>
<mets:fileGrp USE="DOWNLOAD">
     <mets:file ID="ATT-1" MIMETYPE="application/pdf" mext:LABEL="Public document">
          <mets:FLocat LOCTYPE="URL" xlink:href="http://..." />
     </mets:file>
</mets:fileGrp>

Denote archival value via USE attribute

In principle, every file is a candidate for archiving in some (long-term) archival context. Which files will actually be archived depends on the configuration and workflows of the archival system. To explicitly denote the archival value of a file, set its USE attribute to ARCHIVE:

<mets:fileGrp USE="ORIGINAL">
     <mets:file ID="ATT-0" MIMETYPE="application/pdf" USE="ARCHIVE" mext:LABEL="Archival copy">
          <mets:FLocat LOCTYPE="URL" xlink:href="http://..." />
     </mets:file>
</mets:fileGrp>

Where the file attributes get stored

Since the file handler doesn't store METS, all the file group and USE attribute information needs to be stored somewhere else. Fedora's RELS-INT datastream would be the proper place for storing those properties, using the PREMIS ontology. However, this would require rewriting a big part of the underlying SWORD library and parts of the METS Disseminator.

A quick (and not too dirty) workaround is to store this information in the flexible SLUB-INFO datastream. File information about download permission and archival value is stored in <slub:attachment> elements located in the <slub:info>/<slub:rights> element. This is where information augmenting each datastream is kept, and it gets updated on every SWORD ingest and update request:

<slub:info>
     <slub:rights>
          <slub:attachment slub:ref="ATT-0" slub:hasArchivalValue="yes" slub:isDownloadable="yes"/>
     </slub:rights>
</slub:info>
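
How the METS file information might translate into these flags can be sketched as follows. The mapping is an assumption made for illustration: a file counts as downloadable if it appears in the DOWNLOAD file group, and as having archival value if its USE attribute is ARCHIVE. The page does not spell out the handler's exact rules, and the function name attachment_flags is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Shortened file section (hypothetical, FLocat elements omitted).
FILE_SEC = """<mets:fileSec xmlns:mets="http://www.loc.gov/METS/">
  <mets:fileGrp USE="ORIGINAL">
    <mets:file ID="ATT-0" MIMETYPE="application/pdf" USE="ARCHIVE"/>
  </mets:fileGrp>
  <mets:fileGrp USE="DOWNLOAD">
    <mets:file ID="ATT-1" MIMETYPE="application/pdf"/>
  </mets:fileGrp>
</mets:fileSec>"""


def attachment_flags(file_sec_xml):
    """Hypothetical helper: derive per-attachment flags from a fileSec.

    Assumed mapping (not confirmed by the profile description):
    USE="ARCHIVE" on the file -> hasArchivalValue="yes",
    membership in fileGrp USE="DOWNLOAD" -> isDownloadable="yes".
    """
    root = ET.fromstring(file_sec_xml)
    flags = {}
    for grp in root.findall("mets:fileGrp", NS):
        for file_el in grp.findall("mets:file", NS):
            entry = flags.setdefault(
                file_el.get("ID"),
                {"hasArchivalValue": "no", "isDownloadable": "no"},
            )
            if file_el.get("USE") == "ARCHIVE":
                entry["hasArchivalValue"] = "yes"
            if grp.get("USE") == "DOWNLOAD":
                entry["isDownloadable"] = "yes"
    return flags


flags = attachment_flags(FILE_SEC)
```

Each entry in the resulting dictionary corresponds to one <slub:attachment> element, with the key taking the role of slub:ref.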