METS Folder Processing
[Work in Progress]
The dul_hydra.batch.mets_folder task mines a folder of Tripod METS files for descriptive, structural, and administrative metadata and generates a batch of repository update objects.
The task assumes that objects represented in the METS folder files already exist in the repository and will display warnings for any objects it cannot find in the repository.
The task expects to find a mets_folder.yml configuration file in the dul-hydra config directory unless an alternate configuration file location is provided by the CONFIG_FILE environment variable. A sample configuration file can be found at config/mets_folder.yml.sample; copy it to config/mets_folder.yml and edit it as desired.
The task takes three required arguments and one optional argument. All of them are supplied as environment variables rather than as true command line arguments.
The required environment variables are:
- FOLDER - path to the folder containing the METS files; e.g., FOLDER=/nas/mets/folder/path/vica
- BATCH_USER - user key for the person to be associated with the update batch that is generated; e.g., [email protected]
- COLLECTION_PID - PID of the collection containing the objects referenced in the METS folder; e.g., COLLECTION_PID=changeme:572
The optional environment variable is:
- CONFIG_FILE - path to the mets_folder.yml configuration file; e.g., CONFIG_FILE=config/mets_folder_abc.yml. If this environment variable is omitted, the configuration at config/mets_folder.yml will be used.
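The argument handling described above could be sketched as follows. This is a minimal illustration, not the task's actual code: the helper name mets_folder_options and both constants are hypothetical, and the sketch assumes that a missing required variable should raise an error, which the description above does not specify.

```ruby
# Hypothetical helper illustrating the documented variables; not taken from dul-hydra.
REQUIRED_VARS = %w[FOLDER BATCH_USER COLLECTION_PID].freeze
DEFAULT_CONFIG = 'config/mets_folder.yml'

# env can be ENV itself or any Hash of variable names to values.
def mets_folder_options(env)
  # Assumption: an unset or empty required variable is treated as an error.
  missing = REQUIRED_VARS.select { |name| env[name].nil? || env[name].empty? }
  unless missing.empty?
    raise ArgumentError, "Missing required variables: #{missing.join(', ')}"
  end

  {
    folder: env['FOLDER'],
    batch_user: env['BATCH_USER'],
    collection_pid: env['COLLECTION_PID'],
    # CONFIG_FILE is optional and falls back to the documented default location.
    config_file: env.fetch('CONFIG_FILE', DEFAULT_CONFIG)
  }
end
```

Inside a rake task, such a helper would typically be called as mets_folder_options(ENV), since rake exposes NAME=value arguments through the process environment.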
An example use of the task would be:
rake dul_hydra:batch_mets_folder FOLDER=/nas/mets/folder/path/vica [email protected] COLLECTION_PID=changeme:572
The task scans the indicated METS folder, inspecting the files it contains. Warning and/or error messages are displayed if certain conditions occur.
When the scan is complete, the task reports the number of files it found and the number of warnings and errors generated. The task then prompts the user to decide whether to continue and create the update batch or to cancel the operation.
Inspected /nas/mets/folder/path/vica/
Found 4 files
Inspection generated 0 WARNINGS and 0 ERRORS
p - Create pending batch
x - Cancel operation
Enter p, x :
If you enter x, no batch is created and the task simply exits. If you enter p, the task creates an update batch based on the contents of the METS folder and places it on the list of batches for the user identified by the BATCH_USER environment variable (typically, yourself). To apply the updates to the repository, you will need to log into the staff application, go to the Batches page, and process the batch.
The mets_folder task will generate an update object that will replace the descriptive metadata of the pertinent repository object with that present in the dmdSec section of the METS file. This is a complete replace operation. Any pre-existing descriptive metadata is removed from the repository object before the descriptive metadata from the METS file is applied.
The mets_folder task will generate an update object that will replace the structural metadata of the pertinent repository object with structural metadata based on the structMap section of the METS file. This is a complete replace operation. Any pre-existing structural metadata is removed from the repository object before structural metadata based on the METS file is applied.
The structural metadata produced by the task is a transformation of the METS file structMap section into a representation pertinent to the repository object.
If the root element of the METS file contains an ID attribute, the mets_folder task replaces the local_id attribute of the repository object with a value based on the value of the ID attribute. If the ID attribute contains an underscore (e.g., ID="abcd_efghi01003"), the underscore and the portion that precedes it are discarded and the remainder is used to set the local_id. If the ID attribute does not contain an underscore, then the entire ID attribute value is used to set the local_id.
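The local_id rule can be illustrated with a short sketch. The method name local_id_from_mets_id is hypothetical and not taken from dul-hydra. The description does not say what happens when an ID contains more than one underscore; this sketch assumes only the text up to and including the first underscore is dropped.

```ruby
# Hypothetical helper illustrating the documented rule; not the task's actual code.
def local_id_from_mets_id(mets_id)
  # String#partition returns [before, separator, after]; the separator is
  # an empty string when no underscore is present.
  # Assumption: only the first underscore counts as the delimiter.
  _prefix, separator, remainder = mets_id.partition('_')
  separator.empty? ? mets_id : remainder
end

local_id_from_mets_id('abcd_efghi01003') # => "efghi01003"
local_id_from_mets_id('efghi01003')      # => "efghi01003"
```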
If the root element of the METS file contains a TYPE attribute, the mets_folder task replaces the display_format attribute of the repository object with a value based on the value of the TYPE attribute.
If the TYPE attribute contains a colon (e.g., TYPE="Resource:Image"), the colon and the portion that precedes it are discarded. The remainder (or the entire value of the TYPE attribute if it does not contain a colon) is then converted to lower case. If the configuration file used by the task has an entry for this value in the display_format section (e.g., slideshow: multi_image), then the translated value is used for the display_format. Otherwise, the downcased value from the TYPE attribute is used.
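A minimal sketch of the display_format rule follows. The method name display_format_from_type is hypothetical, and the YAML fragment is an assumed layout modelled on the slideshow: multi_image example, not a copy of the real mets_folder.yml. As with the ID rule, the sketch assumes only the first colon acts as the delimiter. The TYPE value "Slideshow" is an invented input used to exercise the translation.

```ruby
require 'yaml'

# Assumed configuration layout; the real mets_folder.yml may contain more sections.
SAMPLE_CONFIG = YAML.safe_load(<<~YML)
  display_format:
    slideshow: multi_image
YML

# Hypothetical helper illustrating the documented rule; not the task's actual code.
def display_format_from_type(type, config)
  # Assumption: drop everything up to and including the first colon.
  _prefix, separator, remainder = type.partition(':')
  value = (separator.empty? ? type : remainder).downcase
  # Use the configured translation when present, else the downcased value.
  translations = config['display_format'] || {}
  translations.fetch(value, value)
end

display_format_from_type('Resource:Image', SAMPLE_CONFIG) # => "image"
display_format_from_type('Slideshow', SAMPLE_CONFIG)      # => "multi_image"
```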
If the METS file represents a collection and contains a metsHdr/agent element, the mets_folder task replaces the research_help_contact attribute of the repository object with a value based on the value of the ID attribute of the agent element.
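Locating the agent ID might look like the sketch below, which uses Ruby's REXML library. The method name agent_id_from_mets is hypothetical. The sketch only reads the ID attribute; how the task derives the final research_help_contact value from it is not specified here, so that step is omitted. It also assumes metsHdr is a direct child of the root element and does not check whether the file represents a collection.

```ruby
require 'rexml/document'

# Hypothetical helper; not the task's actual code.
# Returns the ID attribute of the first metsHdr/agent element, or nil.
def agent_id_from_mets(xml)
  root = REXML::Document.new(xml).root
  # Element#name is the local name, so this matches with or without a "mets:" prefix.
  header = root.elements.find { |el| el.name == 'metsHdr' }
  agent = header && header.elements.find { |el| el.name == 'agent' }
  agent && agent.attributes['ID']
end
```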