SOSO should recommend how to specify identifier for the metadata record #210
Comments
@smrgeoinfo We discussed linking to associated metadata records and added guidelines in the 1.2 release to cover this case: https://github.com/ESIPFed/science-on-schema.org/blob/master/guides/Dataset.md#metadata
We're using that in DataONE to follow the SO record to the more detailed ISO/EML/FGDC records that might already exist. Is that sufficient for your use case?
@mbjones thanks, but that's not the issue. We're gleaning schema.org metadata from dataset landing pages, and finding that we end up with duplicate records for the same dataset because there's no identifier for the metadata record. Two records being about the same dataset doesn't mean they are the same metadata record.
MagIC has this issue because we allow people to update a dataset. This is necessary to fix errors in the dataset, to let people add more data than they originally included, or to accommodate new fields that MagIC adds to the data model. We mint a data DOI for each version, but those DOIs all point to the same page, which highlights the most recent version and also lists previous versions, which remain available for download.
@smrgeoinfo thanks for clarifying. @njarboe We have the same issue in DataONE, and the way we solved it is to differentiate between the Persistent Identifier (PID), which maps to a specific content-immutable version of a file or package, and the Series Identifier (SID), which maps to the most recent version in a chain of versions. More details are in the DataONE API docs. When we harvest from an SO provider, we checksum the canonicalized version of the JSON-LD as the PID, and use the provided
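To illustrate the idea of checksumming canonicalized JSON-LD to get a content-bound identifier, here is a minimal sketch. It is not DataONE's implementation: the thread does not say which canonicalization is used, so this assumes a simple stand-in (sorted keys, compact separators), and the function name `content_pid` and the `sha256:` prefix are hypothetical. Full JSON-LD canonicalization (RDF dataset canonicalization) handles cases this does not, such as blank nodes and equivalent contexts.

```python
import hashlib
import json


def content_pid(jsonld_text: str) -> str:
    """Derive a content-based identifier from a JSON-LD document.

    Assumption: key order and whitespace are the only differences that
    should be ignored. This is a stand-in for real JSON-LD canonicalization.
    """
    doc = json.loads(jsonld_text)
    # Serialize deterministically: sorted keys, no insignificant whitespace.
    canonical = json.dumps(
        doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "sha256:" + digest


# Two serializations of the same record, and one with changed content.
a = '{"@type": "Dataset", "name": "Example", "@id": "https://example.org/ds/1"}'
b = '{"@id":"https://example.org/ds/1",  "name":"Example", "@type":"Dataset"}'
c = '{"@id":"https://example.org/ds/1",  "name":"Example v2", "@type":"Dataset"}'

same = content_pid(a) == content_pid(b)      # formatting differences ignored
changed = content_pid(a) != content_pid(c)   # any content change gives a new PID
```

Because the identifier is derived from content, any edit to the record yields a new PID, which is why a separate series identifier is needed to tie the versions together.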
In harvesting/federated metadata systems, there needs to be an identifier for the metadata record (in parallel to the identifier for the resource it describes), so that harvesters can look at time stamps and metadata identifiers to determine if they need to reharvest a record. Using the @id property in the JSON-LD object is the obvious solution, but SOSO should recommend that this identifier be stable and bound to the metadata for a particular resource. Looking at what we've been harvesting for the EarthCube GeoCODES, this is NOT the case with current metadata.
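The reharvest decision described above could look roughly like the following sketch. It assumes the record's `@id` is a stable metadata-record identifier and that the record carries a schema.org `dateModified` value in ISO 8601 form; the function name `needs_reharvest` and the in-memory `index` dictionary are hypothetical and stand in for whatever store a real harvester uses.

```python
from datetime import datetime


def needs_reharvest(index: dict, record: dict) -> bool:
    """Decide whether a harvested JSON-LD record must be (re)ingested.

    `index` maps a metadata record identifier to the `dateModified`
    string seen at the last harvest (hypothetical storage layout).
    """
    record_id = record.get("@id")
    if record_id is None:
        # Without a stable identifier the record cannot be matched to an
        # earlier harvest, so it is always treated as new. This is the
        # situation that produces duplicates.
        return True
    previous = index.get(record_id)
    if previous is None:
        return True  # never seen before
    modified = record.get("dateModified")
    if modified is None:
        return True  # no time stamp to compare, so be conservative
    # Assumes both values use the same ISO 8601 style (no mixed time zones).
    return datetime.fromisoformat(modified) > datetime.fromisoformat(previous)


index = {"https://example.org/meta/1": "2021-03-01"}
unchanged = {"@id": "https://example.org/meta/1", "dateModified": "2021-03-01"}
updated = {"@id": "https://example.org/meta/1", "dateModified": "2021-04-01"}
anonymous = {"@type": "Dataset", "dateModified": "2021-03-01"}
```

With this logic, `unchanged` is skipped, `updated` is reharvested, and `anonymous` is ingested on every pass, which is how a missing identifier leads to duplicate records.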