This repository has been archived by the owner on Feb 29, 2024. It is now read-only.
When Archivematica processes a zipped bag that contains a bag-info.txt file, the information is transferred to a METS file and can be found in a predictable location: the analog/digital source metadata sub-section of the administrative metadata section. See the attached example (the original bag-info.txt file and the METS file generated by Archivematica are included).
The tool that extracts data from the bags and indexes it will need to know where the data we want to index is located (e.g., in a METS file, in the analog/digital source metadata sub-section of the administrative metadata section, or, if that data doesn't exist, in the bag-info.txt file).
Using something like the proof-of-concept BagIt Indexer, example logic would be: if there is a file named "METS.xml" at the root of the Bag's /data directory, look for data at /data/METS.xml// and index it so each element is in a searchable field; if "METS.xml" doesn't exist, index the fields in /bag-info.txt. (We'll need a third fallback option here, in case there is a METS.xml file but it doesn't contain /.) Pretty standard stuff. The gotcha is that if we add a third deposit type (say, a non-bag deposit), the indexer script would need to know where to get the relevant data for that deposit type. This sort of extensibility can be handled with a plugin architecture, where a separate plugin for each source detects the presence of the desired data, extracts it, and passes it off to the indexing engine.
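A minimal sketch of that detect-then-extract plugin chain, in Python, might look like the following. It is not the BagIt Indexer's actual code: the class names, the `extract_fields` function, and the plugin interface are all hypothetical. It also assumes that the bag-info.txt values appear as leaf elements somewhere under the METS `sourceMD` element, and it handles the "METS.xml exists but lacks the data" case by simply falling through to the next plugin, which is only one possible choice for the third fallback.

```python
import os
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


class MetsSourceMdPlugin:
    """Hypothetical plugin: reads leaf elements under sourceMD in data/METS.xml."""
    name = "mets"

    def extract(self, bag_dir):
        path = os.path.join(bag_dir, "data", "METS.xml")
        if not os.path.isfile(path):
            return None
        try:
            root = ET.parse(path).getroot()
        except ET.ParseError:
            return None
        fields = {}
        for source_md in root.iter(f"{{{METS_NS}}}sourceMD"):
            for el in source_md.iter():
                # Assumption: each metadata field is a childless element with text.
                if len(el) == 0 and el.text and el.text.strip():
                    fields.setdefault(local_name(el.tag), []).append(el.text.strip())
        # An empty result means "not found here", so the next plugin gets a turn.
        return fields or None


class BagInfoPlugin:
    """Hypothetical plugin: reads 'Label: value' lines from bag-info.txt."""
    name = "bag-info"

    def extract(self, bag_dir):
        path = os.path.join(bag_dir, "bag-info.txt")
        if not os.path.isfile(path):
            return None
        fields, last = {}, None
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line.strip():
                    continue
                if line[0] in " \t" and last is not None:
                    # Indented lines continue the previous value.
                    fields[last][-1] += " " + line.strip()
                    continue
                label, sep, value = line.partition(":")
                if sep:
                    last = label.strip()
                    fields.setdefault(last, []).append(value.strip())
        return fields or None


def extract_fields(bag_dir, plugins):
    """Return (plugin name, fields) from the first plugin that finds data."""
    for plugin in plugins:
        fields = plugin.extract(bag_dir)
        if fields is not None:
            return plugin.name, fields
    return None, {}
```

Calling `extract_fields(bag_dir, [MetsSourceMdPlugin(), BagInfoPlugin()])` tries the METS file first and bag-info.txt second. Supporting a new deposit type would then mean adding another class with an `extract` method to the list, without touching the indexing code.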
Of course, using the same names for the METS elements and bag-info.txt fields will result in better queries. However, plugins can also map source fields to a common field name for indexing purposes.
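The field-name mapping could be as simple as a per-source lookup table. The sketch below is illustrative only: `Source-Organization` and `Bagging-Date` are standard bag-info.txt labels, but the METS element names (`organization`, `date_bagged`), the common field names, and the `normalize_fields` function are hypothetical stand-ins.

```python
# Per-source mapping from source field name to common index field name.
# The "mets" entries are hypothetical element names, used only for illustration.
FIELD_MAP = {
    "bag-info": {
        "Source-Organization": "source_organization",
        "Bagging-Date": "bagging_date",
    },
    "mets": {
        "organization": "source_organization",
        "date_bagged": "bagging_date",
    },
}


def normalize_fields(source_type, fields, field_map=FIELD_MAP):
    """Rename source fields to common index field names.

    Fields without an explicit mapping are lowercased with hyphens
    replaced by underscores, so they are still indexed.
    """
    mapping = field_map.get(source_type, {})
    normalized = {}
    for name, value in fields.items():
        common = mapping.get(name, name.lower().replace("-", "_"))
        normalized[common] = value
    return normalized
```

With a table like this, a query against `source_organization` would match records regardless of whether the value came from a METS file or from bag-info.txt.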