Qucosa METS Profile
The Qucosa Content Model describes the Fedora datastreams for a valid Qucosa object. Besides the usual DC and RELS-EXT datastreams, a Qucosa object has a MODS datastream of media-type application/mods+xml (defined in a Library of Congress RFC), a SLUB-INFO datastream of type application/vnd.slub-info+xml (not formally defined), a QUCOSA-XML datastream of type application/xml (containing original, pre-migration data) and a number of optional attachment datastreams (IDs starting with ATT-), each with the media-type of its content.
- Learn more about the Fedora Digital Object Model: https://wiki.duraspace.org/display/FEDORA38/Fedora+Digital+Object+Model
- Learn more about Library of Congress media-types: https://tools.ietf.org/html/rfc6207
METS defines standard elements, structures and relations between elements. While the usage of the elements themselves is mostly clearly defined and agreed upon, the actual structure (Which metadata sections? How many? Which attributes?) depends strongly on the use case. METS is a bit like XML in this sense. To handle this flexibility in situations requiring interoperability, so-called METS Application Profiles are defined.
When submitting ingest or update requests to the SWORD Service using application/vnd.qucosa.mets+xml, the Qucosa File Handler interprets the METS document and manages the deposit accordingly. For this to work, the submitted METS must have a certain structure (the Qucosa Profile).
Here is an overview of the structure:
<?xml version="1.0" encoding="UTF-8"?>
<mets:mets xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns:mext="http://slub-dresden.de/mets"
    xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/version19/mets.v1-9.xsd">
  <!-- Record header -->
  <mets:metsHdr RECORDSTATE="ACTIVE"/>
  <!-- Descriptive metadata section -->
  <mets:dmdSec ID="DMD_000">
    <mets:mdWrap MDTYPE="MODS">
      <mets:xmlData>
        <!-- MODS description is embedded here -->
      </mets:xmlData>
    </mets:mdWrap>
  </mets:dmdSec>
  <!-- Technical metadata section -->
  <mets:amdSec ID="AMD_000">
    <mets:techMD ID="TECH_000">
      <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="SLUBINFO" MIMETYPE="application/vnd.slub-info+xml">
        <mets:xmlData>
          <!-- SLUB-INFO record is embedded here -->
        </mets:xmlData>
      </mets:mdWrap>
    </mets:techMD>
  </mets:amdSec>
  <!-- File section -->
  <mets:fileSec>
    <!-- This file group holds all references to the object's attachments -->
    <mets:fileGrp USE="ORIGINAL">
      <!-- A file is described by ID, media-type and actual location -->
      <mets:file ID="ATT-0" MIMETYPE="application/pdf" mext:LABEL="Hauptdokument">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://www.qucosa.de/fileadmin/data/datei1.pdf" />
      </mets:file>
      <!-- The number of files is not limited -->
      <mets:file ID="ATT-1" MIMETYPE="application/pdf" mext:LABEL="Sekundärdokument">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://www.qucosa.de/fileadmin/data/datei2.pdf" />
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <!-- Logical document structure combines document metadata, administrative metadata and file references -->
  <mets:structMap TYPE="LOGICAL">
    <mets:div DMDID="DMD_000" AMDID="AMD_000" TYPE="article">
      <mets:fptr FILEID="ATT-0" />
      <mets:fptr FILEID="ATT-1" />
    </mets:div>
  </mets:structMap>
</mets:mets>
Never submit METS documents with more than one of each of the aforementioned METS sections. The behavior for multi-level METS documents is undefined and unimplemented.
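A submitting client could guard against repeated sections before sending a request. The following is a minimal sketch and not part of the Qucosa code base: the function name `find_repeated_sections` and the selection of paths are assumptions made for illustration, and the SLUB-INFO path deliberately matches any wrapper element below `<mets:amdSec>`.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}

# Paths (ElementTree XPath subset) for the sections the file handler reads.
# These names and paths are illustrative assumptions, not the handler's own.
SECTION_PATHS = {
    "MODS dmdSec": "mets:dmdSec/mets:mdWrap[@MDTYPE='MODS']",
    "QUCOSA-XML dmdSec": "mets:dmdSec/mets:mdWrap[@OTHERMDTYPE='QUCOSA-XML']",
    "SLUB-INFO amdSec": "mets:amdSec//mets:mdWrap[@OTHERMDTYPE='SLUBINFO']",
    "ORIGINAL fileGrp": "mets:fileSec/mets:fileGrp[@USE='ORIGINAL']",
    "DOWNLOAD fileGrp": "mets:fileSec/mets:fileGrp[@USE='DOWNLOAD']",
}


def find_repeated_sections(mets_xml):
    """Return the names of all sections that occur more than once."""
    root = ET.fromstring(mets_xml)
    return [
        name
        for name, path in SECTION_PATHS.items()
        if len(root.findall(path, NS)) > 1
    ]
```

An empty result means no section is repeated. It does not mean the document is valid against the METS schema.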
When the SWORD service processes an ingest request, the Qucosa File Handler takes the METS document and extracts the information it needs to create a proper Qucosa object. The extraction process is simple and straightforward. It looks for one data section of each of the following kinds:
- One <mets:dmdSec> with one <mets:mdWrap MDTYPE="MODS"> to extract a MODS record
- One <mets:dmdSec> with one <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="QUCOSA-XML" MIMETYPE="application/xml"> to extract the original Qucosa XML record for migration purposes
- One <mets:amdSec> with one <mets:rightsMD> with one <mets:mdWrap MDTYPE="OTHER" OTHERMDTYPE="SLUBINFO" MIMETYPE="application/vnd.slub-info+xml"> to extract a SLUB-INFO record
- One <mets:fileSec> with one <mets:fileGrp USE="ORIGINAL"> with any number of <mets:file> elements to extract attachment and upload information
- One <mets:fileSec> with one <mets:fileGrp USE="DOWNLOAD"> with any number of <mets:file> elements to extract attachment and upload information
- It doesn't evaluate the <mets:structMap> element. However, a METS document without this section is not valid.
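The lookup described in the list can be sketched with a few path queries. This is an illustration only, assuming Python's standard `xml.etree.ElementTree`. The real file handler is not shown in this document, so the function `extract_sections` and the keys of its result are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def extract_sections(mets_xml):
    """Collect the embedded records and file references of a METS document."""
    root = ET.fromstring(mets_xml)

    def payload(md_wrap_path):
        # First child element of <mets:xmlData>, or None if absent or empty.
        data = root.find(md_wrap_path + "/mets:xmlData", NS)
        return data[0] if data is not None and len(data) else None

    def files(use):
        path = f"mets:fileSec/mets:fileGrp[@USE='{use}']/mets:file"
        result = []
        for file_el in root.findall(path, NS):
            loc = file_el.find("mets:FLocat", NS)
            result.append({
                "id": file_el.get("ID"),
                "mimetype": file_el.get("MIMETYPE"),
                "href": loc.get(XLINK_HREF) if loc is not None else None,
            })
        return result

    # <mets:structMap> is intentionally not read, mirroring the description.
    return {
        "mods": payload("mets:dmdSec/mets:mdWrap[@MDTYPE='MODS']"),
        "qucosa_xml": payload("mets:dmdSec/mets:mdWrap[@OTHERMDTYPE='QUCOSA-XML']"),
        # '//' accepts any wrapper element between amdSec and mdWrap.
        "slub_info": payload("mets:amdSec//mets:mdWrap[@OTHERMDTYPE='SLUBINFO']"),
        "original_files": files("ORIGINAL"),
        "download_files": files("DOWNLOAD"),
    }
```

Sections missing from the document show up as `None` or as an empty list, so a caller can decide which of them are mandatory.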
If a referenced file is only needed temporarily for an ingest or update request, the Qucosa SWORD file handler can be instructed to delete it after a successful ingest into the repository. This is used to share a common upload directory between a front end system and the SWORD system. The handler does so for <FLocat> elements with the attribute USE set to TEMPORARY, and only if the URI of the file has the file: scheme:
<mets:file ID="ATT-2" MIMETYPE="application/pdf" mext:LABEL="Attachment">
  <mets:FLocat xmlns:xlink="http://www.w3.org/1999/xlink" LOCTYPE="URL" USE="TEMPORARY"
      xlink:href="file:/Attachment.pdf" />
</mets:file>
For this to work, the SWORD service needs write access to the referenced file. If the service fails to delete the file, a successful ingest is not affected and the file simply remains in the directory.
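The clean-up rule can be expressed in a few lines. The sketch below is an assumption about how such a step might look, not the service's actual implementation. The name `delete_temporary_files` is hypothetical, and failures are only collected so that they cannot affect the ingest result.

```python
import os
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import url2pathname

METS_FLOCAT = "{http://www.loc.gov/METS/}FLocat"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def delete_temporary_files(mets_xml):
    """Delete files referenced by temporary file: locations.

    Returns the paths that could not be deleted. Errors are never raised,
    so an ingest that already succeeded stays successful.
    """
    failed = []
    root = ET.fromstring(mets_xml)
    for loc in root.iter(METS_FLOCAT):
        if loc.get("USE") != "TEMPORARY":
            continue
        uri = urlparse(loc.get(XLINK_HREF, ""))
        if uri.scheme != "file":
            continue  # only local files are ever removed
        path = url2pathname(uri.path)
        try:
            os.remove(path)
        except OSError:
            failed.append(path)  # e.g. missing write access
    return failed
```

Calling the function only after the repository has confirmed the ingest keeps the file available for a retry if the ingest itself fails.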
Unfortunately, the METS <mets:file> element doesn't have a LABEL attribute for individually naming the referenced file. Thus a custom METS extension attribute is used to denote a file's label. The extension schema is relatively simple:
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="http://slub-dresden.de/mets"
    xmlns="http://slub-dresden.de/mets"
    elementFormDefault="qualified" attributeFormDefault="unqualified"
    version="1.0">
  <xs:attribute name="LABEL">
    <xs:simpleType>
      <xs:restriction base="xs:string"/>
    </xs:simpleType>
  </xs:attribute>
</xs:schema>
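Because LABEL is declared globally in the extension schema, it is namespace-qualified in instance documents and has to be read with its namespace. A short sketch, assuming Python's standard `xml.etree.ElementTree`. The helper name `file_labels` is made up for this example.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MEXT_NS = "http://slub-dresden.de/mets"


def file_labels(mets_xml):
    """Map each <mets:file> ID to its mext:LABEL (None if not set)."""
    root = ET.fromstring(mets_xml)
    return {
        file_el.get("ID"): file_el.get(f"{{{MEXT_NS}}}LABEL")
        for file_el in root.iter(f"{{{METS_NS}}}file")
    }
```

An unqualified `LABEL` attribute on `<mets:file>` would not be found by this lookup, which matches the intent of keeping the extension in its own namespace.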
Besides file elements in the file group ORIGINAL, there can be one more file group called DOWNLOAD. File elements in this group describe files that are meant to be available for download in the front end, while the files declared in the ORIGINAL file group are hidden:
<mets:fileGrp USE="ORIGINAL">
  <mets:file ID="ATT-0" MIMETYPE="application/pdf" mext:LABEL="Archival copy">
    <mets:FLocat LOCTYPE="URL" xlink:href="http://..." />
  </mets:file>
</mets:fileGrp>
<mets:fileGrp USE="DOWNLOAD">
  <mets:file ID="ATT-1" MIMETYPE="application/pdf" mext:LABEL="Public document">
    <mets:FLocat LOCTYPE="URL" xlink:href="http://..." />
  </mets:file>
</mets:fileGrp>
In principle, every file is a candidate for archiving in some (long-term) archival context. Which files are actually archived depends on the configuration and workflows of the archival system. To explicitly denote the archival value of a file, set the USE attribute of the file to ARCHIVE:
<mets:fileGrp USE="ORIGINAL">
  <mets:file ID="ATT-0" MIMETYPE="application/pdf" USE="ARCHIVE" mext:LABEL="Archival copy">
    <mets:FLocat LOCTYPE="URL" xlink:href="http://..." />
  </mets:file>
</mets:fileGrp>
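Both properties, download permission from the file group and archival value from the file's own USE attribute, can be derived in one pass over the file section. The following sketch is an illustration under those two rules. The function name `file_flags` and the flag names are assumptions.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def file_flags(mets_xml):
    """Return {file ID: {"downloadable": bool, "archive": bool}}."""
    root = ET.fromstring(mets_xml)
    flags = {}
    for group in root.iter(f"{{{METS_NS}}}fileGrp"):
        # Files in the DOWNLOAD group may be offered by the front end.
        downloadable = group.get("USE") == "DOWNLOAD"
        for file_el in group.findall(f"{{{METS_NS}}}file"):
            flags[file_el.get("ID")] = {
                "downloadable": downloadable,
                # Only an explicit USE="ARCHIVE" marks archival value here.
                "archive": file_el.get("USE") == "ARCHIVE",
            }
    return flags
```

A file without `USE="ARCHIVE"` is reported as not explicitly marked. Whether it is archived anyway remains a decision of the archival system.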
Since the file handler doesn't store METS, all the file group and USE attribute information needs to be stored somewhere else. Fedora's RELS-INT datastream would be the proper place for storing those properties by using the PREMIS ontology. However, this would require rewriting a big part of the underlying SWORD library and parts of the METS Disseminator.
A quick (and not too dirty) workaround is to store this information in the flexible SLUB-INFO datastream. File information about download permission and archival value is stored in <slub:attachment> elements located in the <slub:info>/<slub:rights> element. This is where information augmenting every datastream is stored, and it is updated on every SWORD ingest and update request:
<slub:info>
  <slub:rights>
    <slub:attachment slub:ref="ATT-0" slub:hasArchivalValue="yes" slub:isDownloadable="yes"/>
  </slub:rights>
</slub:info>
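Generating such a fragment from per-file flags could look like the sketch below. Several details are assumptions that this page does not confirm: the namespace URI bound to the `slub` prefix, the value "no" as the counterpart of "yes", and the helper name `build_slub_rights`.

```python
import xml.etree.ElementTree as ET

# Assumption: the real SLUB-INFO namespace URI is not given in this document.
SLUB_NS = "http://slub-dresden.de/"


def build_slub_rights(flags):
    """Build <slub:info>/<slub:rights> from {file ID: {"archive", "downloadable"}}."""
    ET.register_namespace("slub", SLUB_NS)
    info = ET.Element(f"{{{SLUB_NS}}}info")
    rights = ET.SubElement(info, f"{{{SLUB_NS}}}rights")
    for file_id, file_flags in flags.items():
        ET.SubElement(rights, f"{{{SLUB_NS}}}attachment", {
            f"{{{SLUB_NS}}}ref": file_id,
            # "no" is an assumed value; the example above only shows "yes".
            f"{{{SLUB_NS}}}hasArchivalValue": "yes" if file_flags["archive"] else "no",
            f"{{{SLUB_NS}}}isDownloadable": "yes" if file_flags["downloadable"] else "no",
        })
    return info


if __name__ == "__main__":
    element = build_slub_rights({"ATT-0": {"archive": True, "downloadable": True}})
    print(ET.tostring(element, encoding="unicode"))
```

Rebuilding the whole `<slub:rights>` element from the current METS on every request is one simple way to keep it in step with ingests and updates.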