WIP
See Metadata System architecture
Three types of metadata
- A metadata record exists (external to the DMS).
- A metadata record is to be created through the MET (for ingestion into an external metadata management system).
- No metadata record exists, but the dataset is well known and documented, and "implicit" metadata exists.
  - For example, ABS census data.
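The three cases above amount to a simple classification of each dataset. The sketch below is illustrative only: the enum and function names are hypothetical, and it assumes the two facts that decide the path are known as booleans.

```python
from enum import Enum, auto


class MetadataSource(Enum):
    """The three kinds of metadata listed above (hypothetical names)."""
    EXTERNAL = auto()  # record exists in an external system
    MET = auto()       # record is to be created through the MET
    IMPLICIT = auto()  # no record, but the dataset is well known and documented


def classify_dataset(has_external_record: bool, creating_via_met: bool) -> MetadataSource:
    """Pick the metadata path for a dataset; an external record always takes precedence."""
    if has_external_record:
        return MetadataSource.EXTERNAL
    if creating_via_met:
        return MetadataSource.MET
    return MetadataSource.IMPLICIT


print(classify_dataset(has_external_record=False, creating_via_met=False))
```

Checking for an external record first mirrors the rule that the MET is only used when no external record exists.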
As per REQUIREMENTS.md#metadata-api, we want to have standard vocabs/enumerated values for data variables, keywords, etc.
- Most, if not all, datasets will require us to manually match their data variables with our standard lists/vocabs.
- Some data providers may be willing to either provide metadata that complies with our standard lists/vocabs, or at the very least verify the metadata we create manually.
- Either way, I think the easiest way forward is for us to do a first pass of manually creating the required metadata once we get access to the data.
- The MET will not be used for this kind of data entry when the metadata record exists externally.
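One way to record the manual first pass is a per-dataset crosswalk from provider variable names to terms in our standard vocab. This is a minimal sketch under assumptions: the vocabulary terms (CF-style names), the `CROSSWALK` contents and `apply_crosswalk` are all hypothetical examples, not an agreed format.

```python
# Hypothetical standard vocabulary (CF-style names used purely as examples).
STANDARD_VARIABLES = {"sea_water_temperature", "sea_water_practical_salinity"}

# Hypothetical hand-curated crosswalk for one dataset: provider variable -> standard term.
CROSSWALK = {
    "TEMP": "sea_water_temperature",
    "SAL": "sea_water_practical_salinity",
}


def apply_crosswalk(variables, crosswalk, vocab):
    """Map provider variables to standard terms.

    Returns (mapped, unmapped): variables with a curated mapping, and variables
    that still need a manual decision. Raises if the crosswalk points at a term
    that is not in the vocabulary, so curation typos are caught early.
    """
    mapped, unmapped = {}, []
    for var in variables:
        term = crosswalk.get(var)
        if term is None:
            unmapped.append(var)
        elif term not in vocab:
            raise ValueError(f"{var!r} is mapped to unknown term {term!r}")
        else:
            mapped[var] = term
    return mapped, unmapped


mapped, unmapped = apply_crosswalk(["TEMP", "SAL", "DEPTH"], CROSSWALK, STANDARD_VARIABLES)
print(mapped)
print(unmapped)
```

Keeping the crosswalk as data (rather than code) would also make it something a data provider could review and verify.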
We will need to perform validation on our manually curated metadata as the data is harvested:
- If the data structure changes, we need to be notified so we can update metadata.
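A minimal sketch of the structure check, assuming "structure" means the set of variable names in the harvested data and that `notify` is some callable hook (both assumptions; the real check might also compare types or units). The dataset ID is a hypothetical placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class StructureDiff:
    """Variables that appeared or disappeared since the metadata was curated."""
    added: set = field(default_factory=set)
    removed: set = field(default_factory=set)

    @property
    def changed(self) -> bool:
        return bool(self.added or self.removed)


def check_structure(dataset_id, curated_vars, harvested_vars, notify):
    """Compare curated variables with harvested ones and notify on any change."""
    curated, harvested = set(curated_vars), set(harvested_vars)
    diff = StructureDiff(added=harvested - curated, removed=curated - harvested)
    if diff.changed:
        notify(
            f"{dataset_id}: data structure changed "
            f"(added={sorted(diff.added)}, removed={sorted(diff.removed)}); "
            "curated metadata needs updating"
        )
    return diff


messages = []
diff = check_structure("example-dataset", ["TEMP", "SAL"], ["TEMP", "SAL", "DEPTH"], messages.append)
print(messages)
```

Passing `notify` in keeps the check independent of how notifications are actually delivered (email, issue tracker, log).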
We aim to harvest all external metadata records automatically.
Depending on how often the data/metadata are updated, we can schedule harvesting on different time-scales (e.g. daily, weekly, ...).
If a metadata record is particularly hard to access (e.g. externally hosted but requiring human intervention to access), and we do not expect the metadata to change, we can ingest the record manually. In this case, we must document how and why we ingested it.
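Per-record harvest scheduling could be as simple as comparing the time since the last harvest with the record's chosen interval. This is a sketch assuming each record carries a frequency label and a last-harvested timestamp; the labels, intervals and record IDs are hypothetical, and a real deployment would more likely use an existing scheduler.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical frequency labels mapped to harvest intervals.
HARVEST_INTERVALS = {
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}


def is_harvest_due(last_harvested, frequency, now):
    """True if the record should be harvested again at time `now`."""
    if last_harvested is None:  # never harvested
        return True
    return now - last_harvested >= HARVEST_INTERVALS[frequency]


def records_due(records, now=None):
    """records: iterable of (record_id, frequency, last_harvested) tuples."""
    now = now or datetime.now(timezone.utc)
    return [rid for rid, freq, last in records if is_harvest_due(last, freq, now)]


now = datetime(2024, 1, 3, tzinfo=timezone.utc)
last = datetime(2024, 1, 1, tzinfo=timezone.utc)
records = [("rec-a", "daily", last), ("rec-b", "weekly", last), ("rec-c", "weekly", None)]
print(records_due(records, now))
```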
- We may be able to harvest and validate keywords/variables against our standard lists/vocabs.
- Most likely, we will need to manually match data variables/keywords with our standard lists/vocabs.
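Automatic matching can still shrink the manual work: normalise harvested keywords, accept exact matches, and offer close matches as suggestions for review. This sketch assumes simple string similarity via Python's `difflib.get_close_matches`, which is a stand-in technique chosen here, not an agreed method; the vocabulary and keywords are made-up examples.

```python
import difflib
import re


def normalise(term: str) -> str:
    """Lower-case and collapse whitespace, underscores and hyphens to single spaces."""
    return re.sub(r"[\s_\-]+", " ", term.strip().lower())


def match_keywords(keywords, vocab, cutoff=0.8):
    """Split harvested keywords into exact matches and ones needing manual review.

    Returns (exact, needs_review): exact maps keyword -> vocab term; needs_review
    maps keyword -> up to three suggested vocab terms (possibly none).
    """
    index = {normalise(t): t for t in vocab}
    exact, needs_review = {}, {}
    for kw in keywords:
        key = normalise(kw)
        if key in index:
            exact[kw] = index[key]
        else:
            close = difflib.get_close_matches(key, list(index), n=3, cutoff=cutoff)
            needs_review[kw] = [index[m] for m in close]
    return exact, needs_review


vocab = ["Sea Surface Temperature", "Coral Cover"]
exact, needs_review = match_keywords(["sea_surface_temperature", "coral cvr", "turbidity"], vocab)
print(exact)
print(needs_review)
```

Suggestions are never accepted automatically, which matches the expectation above that most matching will still be confirmed by hand.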
Following https://github.com/aodn/rimrep-dms/blob/main/docs/REQUIREMENTS.md#metadata-entry-tool-met, the MET will be used by data providers to create a new metadata record only when all of the following are true:
- A record does not exist in an external metadata management system
- The data provider does not have a metadata management system
- The data provider wants to create a metadata record
- The data provider agrees to the new metadata record being ingested into an external metadata management system by the end of the project
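The four conditions can be checked together, and listing the ones that fail explains to a provider why the MET is not the right route. The class and field names below are hypothetical; the conditions themselves are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetRequest:
    """Answers to the four conditions above (hypothetical field names)."""
    record_exists_externally: bool
    provider_has_metadata_system: bool
    provider_wants_record: bool
    provider_agrees_to_external_ingestion: bool


def met_blockers(req: MetRequest) -> list:
    """List the conditions that rule out the MET; an empty list means it applies."""
    blockers = []
    if req.record_exists_externally:
        blockers.append("a record already exists in an external metadata management system")
    if req.provider_has_metadata_system:
        blockers.append("the data provider has its own metadata management system")
    if not req.provider_wants_record:
        blockers.append("the data provider does not want to create a record")
    if not req.provider_agrees_to_external_ingestion:
        blockers.append("the data provider does not agree to external ingestion")
    return blockers


def met_applies(req: MetRequest) -> bool:
    return not met_blockers(req)


print(met_applies(MetRequest(False, False, True, True)))
```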
After the metadata record has been created, we temporarily use the generated XML file until it has been ingested into an external metadata management system.
When the metadata record is in an external metadata management system, we will transition to automatically harvesting the record.
Currently, we are assuming that this external metadata management system is GBRMPA's Reef Knowledge System GeoNetwork instance.
- We can temporarily use the output JSON file to generate STAC fields.
- When the record is ingested into an external metadata management system, it will be treated like all other external metadata records.
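A rough sketch of generating STAC fields from the MET output in the interim. The MET output schema is not specified here, so every input key (`identifier`, `title`, `abstract`, `keywords`, `bbox`, `temporal_extent`, `license`) is a hypothetical assumption; the output follows the STAC Collection layout (`type`, `stac_version`, `id`, `description`, `license`, `extent`, `links`). The sample values are illustrative only.

```python
import json


def met_json_to_stac_collection(met: dict) -> dict:
    """Build a STAC Collection dict from a MET JSON record (hypothetical input keys)."""
    west, south, east, north = met["bbox"]
    start, end = met.get("temporal_extent", [None, None])  # None = open-ended
    return {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": met["identifier"],
        "title": met.get("title", ""),
        "description": met["abstract"],
        "keywords": list(met.get("keywords", [])),
        "license": met["license"],
        "extent": {
            "spatial": {"bbox": [[west, south, east, north]]},
            "temporal": {"interval": [[start, end]]},
        },
        "links": [],
    }


met = {
    "identifier": "example-met-record",
    "title": "Example dataset",
    "abstract": "Example abstract.",
    "keywords": ["coral"],
    "bbox": [142.0, -24.5, 154.0, -10.0],
    "temporal_extent": ["2020-01-01T00:00:00Z", None],
    "license": "CC-BY-4.0",
}
collection = met_json_to_stac_collection(met)
print(json.dumps(collection, indent=2))
```

Once the record is in the external system, this conversion would be dropped in favour of the normal harvesting path.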
In this case, we create the metadata ourselves.
We have to manually match data variables/keywords with our standard lists/vocabs.