This repository has been archived by the owner on Jun 27, 2020. It is now read-only.

METS Folder Processing

Jim Coble edited this page Sep 8, 2015 · 3 revisions

[Work in Progress]

The dul_hydra.batch.mets_folder task mines a folder of Tripod METS files for descriptive, structural, and administrative metadata and generates a batch of repository update objects.

Pre-Requisites

The task assumes that objects represented in the METS folder files already exist in the repository and will display warnings for any objects it cannot find in the repository.

The task expects to find a mets_folder.yml configuration file in the dul-hydra config directory unless an alternate configuration file location is provided by the CONFIG_FILE environment variable. A sample configuration file can be found at config/mets_folder.yml.sample; copy it to config/mets_folder.yml and edit it as desired.
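The lookup order described above can be sketched in Ruby roughly as follows. The helper name `mets_folder_config_path` and the inline sample YAML are illustrative assumptions, not taken from the dul-hydra source; only the CONFIG_FILE variable, the default path, and the `slideshow: multi_image` entry come from this page.

```ruby
require 'yaml'

# Default location named on this page.
DEFAULT_CONFIG_PATH = 'config/mets_folder.yml'

# Hypothetical helper: use CONFIG_FILE when it is set, else the default path.
def mets_folder_config_path(env = ENV)
  env.fetch('CONFIG_FILE', DEFAULT_CONFIG_PATH)
end

# Assumed shape of the configuration file, based only on the
# display_format example given later on this page.
SAMPLE_CONFIG = <<~YAML
  display_format:
    slideshow: multi_image
YAML

config = YAML.safe_load(SAMPLE_CONFIG)

puts mets_folder_config_path({ 'CONFIG_FILE' => 'config/mets_folder_abc.yml' })
puts mets_folder_config_path({})
puts config['display_format']['slideshow']
```

Passing the environment in as a parameter keeps the helper easy to exercise without touching the real `ENV`.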

Execution

The task takes three required arguments and one optional argument. Technically, these are environment variables rather than command line arguments.

The required environment variables are

  • FOLDER - path to the folder containing the METS files; e.g., FOLDER=/nas/mets/folder/path/vica
  • BATCH_USER - user key for the person to be associated with the update batch that is generated; e.g., [email protected]
  • COLLECTION_PID - PID of the collection containing the objects referenced in the METS folder; e.g., COLLECTION_PID=changeme:572

The optional environment variable is

  • CONFIG_FILE - path to mets_folder.yml configuration file; e.g., CONFIG_FILE=config/mets_folder_abc.yml. If this environment variable is omitted, the configuration at config/mets_folder.yml will be used.

An example use of the task:

rake dul_hydra:batch_mets_folder FOLDER=/nas/mets/folder/path/vica [email protected] COLLECTION_PID=changeme:572
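As a rough illustration of how a rake task might check for the three required variables before doing any work, consider the sketch below. The constant and method names are hypothetical, and the real task may validate its input differently or not at all.

```ruby
# Names of the required environment variables listed on this page.
REQUIRED_VARIABLES = %w[FOLDER BATCH_USER COLLECTION_PID].freeze

# Hypothetical helper: return the required variables that are unset or blank.
def missing_variables(env = ENV)
  REQUIRED_VARIABLES.reject do |name|
    value = env[name]
    value && !value.strip.empty?
  end
end

missing = missing_variables({ 'FOLDER' => '/nas/mets/folder/path/vica' })
puts "Missing: #{missing.join(', ')}" unless missing.empty?
```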

The task scans the indicated METS folder, inspecting the files it contains. Warning and/or error messages are displayed if certain conditions occur.

When the scan is complete, the task displays a count of the number of files it found and the number of warnings and errors displayed. The task then prompts the user to decide whether to continue and create the update batch or to cancel the operation.

Inspected /nas/mets/folder/path/vica/
Found 4 files
Inspection generated 0 WARNINGS and 0 ERRORS
p - Create pending batch
x - Cancel operation
Enter p, x : 

If you enter x, no batch is created and the task simply exits. If you enter p, the task creates an update batch based on the contents of the METS folder and places it on the list of batches for the user identified by the BATCH_USER environment variable (typically, yourself). To apply the updates to the repository, you will need to log into the staff application, go to the Batches page, and process the batch.
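The p/x prompt can be modelled with a small loop like the one below. This is only a sketch: the method name, the returned symbols, and the behaviour on unrecognised input (asking again) or on end of input (cancelling) are assumptions, not documented behaviour of the task.

```ruby
require 'stringio'

# Hypothetical prompt handler: returns :create for "p" and :cancel for "x".
# Unrecognised input repeats the prompt; end of input is treated as cancel.
def read_batch_choice(input = $stdin, output = $stdout)
  loop do
    output.print 'Enter p, x : '
    line = input.gets
    return :cancel if line.nil?

    case line.strip.downcase
    when 'p' then return :create
    when 'x' then return :cancel
    end
  end
end

# Simulated session: an invalid answer followed by "p".
choice = read_batch_choice(StringIO.new("q\np\n"), StringIO.new)
puts choice
```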

Details

Descriptive Metadata

The mets_folder task will generate an update object that will replace the descriptive metadata of the pertinent repository object with that present in the dmdSec section of the METS file. This is a complete replace operation. Any pre-existing descriptive metadata is removed from the repository object before the descriptive metadata from the METS file is applied.

Structural Metadata

The mets_folder task will generate an update object that will replace the structural metadata of the pertinent repository object with structural metadata based on the structMap section of the METS file. This is a complete replace operation. Any pre-existing structural metadata is removed from the repository object before structural metadata based on the METS file is applied.

The structural metadata produced by the task is a transformation of the METS file structMap section into a representation pertinent to the repository object.

Administrative Metadata

local_id

If the root element of the METS file contains an ID attribute, the mets_folder task replaces the local_id attribute of the repository object with a value based on the value of the ID attribute. If the ID attribute contains an underscore -- e.g., ID="abcd_efghi01003" -- the underscore and the portion that precedes it are discarded and the remainder is used to set the local_id. If the ID attribute does not contain an underscore, then the entire ID attribute value is used to set the local_id.
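The local_id rule can be expressed in a few lines of Ruby. The method name is hypothetical, and the sketch assumes that when an ID contains more than one underscore the split happens at the first one; this page does not say which underscore the task uses.

```ruby
# Hypothetical helper: derive local_id from the METS root ID attribute.
# Assumption: the value is split at the FIRST underscore only.
def local_id_from_mets_id(id)
  return id unless id.include?('_')

  id.split('_', 2).last
end

puts local_id_from_mets_id('abcd_efghi01003')
puts local_id_from_mets_id('efghi01003')
```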

display_format

If the root element of the METS file contains a TYPE attribute, the mets_folder task replaces the display_format attribute of the repository object with a value based on the value of the TYPE attribute.

If the TYPE attribute contains a colon -- e.g., TYPE="Resource:Image" -- the colon and the portion that precedes it are discarded. The remainder (or the entire value of the TYPE attribute if it does not contain a colon) is then converted to lower case. If the configuration file used by the task has an entry for this value in the display_format section (e.g., slideshow: multi_image), then the translated value is used for the display_format. Otherwise, the downcased value from the TYPE attribute is used.
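A minimal sketch of the display_format rule follows. The method name and the plain hash standing in for the display_format section of the configuration are assumptions, as is splitting at the first colon when a value contains several.

```ruby
# Hypothetical helper: derive display_format from the METS root TYPE attribute.
# `mapping` stands in for the display_format section of the configuration file.
def display_format_from_type(type, mapping = {})
  # Assumption: split at the first colon only.
  value = type.include?(':') ? type.split(':', 2).last : type
  value = value.downcase
  # Use the configured translation when one exists, else the downcased value.
  mapping.fetch(value, value)
end

mapping = { 'slideshow' => 'multi_image' }

puts display_format_from_type('Resource:Image', mapping)
puts display_format_from_type('Resource:Slideshow', mapping)
```

With the sample mapping, a TYPE ending in "Slideshow" is translated to multi_image, while "Image" has no entry and is simply downcased.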

research_help_contact

If the METS file represents a collection and contains a metsHdr/agent element, the mets_folder task replaces the research_help_contact attribute of the repository object with a value based on the value of the ID attribute of the agent element.
