datamodel: reference sheet #1
Comments
Here are some initial thoughts on this draft:

We might want to support more types of author identifiers beyond ORCID, as in DataCite.

I feel very strongly that affiliation should be included with creators and contributors. In DataCite 4.3 (just released) it looks like: "affiliation": [

Having multiple descriptions with description types is nice: "descriptions": [

I just did a PR for the DataCite JSON module that includes a more flexible identifier block that should work if a site doesn't want to use DOIs. The identifiers block is mandatory, but any specific identifier isn't: "identifiers": [

There should probably be a separate metadata block that would accept multiple dates about an object. The system can then map the system-generated _created field into one of the dates, but providing flexibility around dates is really important. "dates": [

We probably want to support the full DataCite funding object: "fundingReferences": [

We should stick with the DataCite related identifier structure unless there is a good reason not to: "relatedIdentifiers": [

The DataCite license structure is a good place to start, but we should think about whether we want to provide a better connection between these fields and any files attached to the records: "rightsList": [

In general I would prefer sticking with DataCite field names as much as possible.
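A minimal sketch of the date mapping suggested above, assuming DataCite-style date entries with `date` and `dateType` keys. The helper name `build_dates` and the record layout are hypothetical, not part of any existing Invenio or DataCite API.

```python
from datetime import datetime, timezone


def build_dates(created, user_dates=None):
    """Return a list of DataCite-style dates.

    User-supplied dates are kept as-is. The system timestamp is added as a
    'Created' entry only if the user did not already provide one.
    """
    dates = [dict(d) for d in (user_dates or [])]
    if not any(d.get("dateType") == "Created" for d in dates):
        dates.append({"date": created.date().isoformat(), "dateType": "Created"})
    return dates


# Hypothetical record carrying the system-generated _created timestamp.
record = {"_created": datetime(2019, 8, 16, 12, 0, tzinfo=timezone.utc)}
dates = build_dates(
    record["_created"],
    [{"date": "2019-07-01", "dateType": "Collected"}],
)
```

Keeping the system field separate and deriving one entry from it leaves depositors free to supply any other dates they need.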
I was thinking about field names too... I wonder why keeping them the same as DataCite is important? I see this data model as internal. As long as translation to the fields DataCite wants is possible and systematic, we are good despite our potentially idiosyncratic internal names. People will always name things their own way and that isn't a problem: what you do in your own house is your own business. It's when we start interacting with other services that it might become a problem. But then, it's just a matter of doing the translation for the other service. After all, we will have to do translations for others down the road anyway. Having a standards committee's decisions reach all the way to my internal naming scheme seems oppressive 😄.

That being said, if it was just a matter of naming, I really wouldn't care (I don't mind renaming to more domain-appropriate names - I am not a domain expert 😸). However, I do see possibilities for our internal data layout to be structured in such a way that it is much more flexible than what following DataCite's layout would allow. For instance, coalescing authors and contributors into a common internal structure would allow us to have just one field/structure for both. That structure could even be reused for acknowledgements... I think there are a couple of these potentially elegant simplifications that would not subtract anything from the understandability of a record's metadata when it is taken out of the repository.

I agree we should use DataCite to inform our required needs and as the starting point for names and structures.
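To make the idea concrete, here is a rough sketch of one coalesced internal structure being translated systematically into DataCite-style `creators` and `contributors`. The internal field names (`role`, `affiliations`) and the function `to_datacite` are assumptions made for illustration only; the output shape is likewise a simplified guess at DataCite JSON, not a verified serialization.

```python
def to_datacite(people):
    """Split a single internal list of people into creators and contributors.

    Each internal entry carries a 'role'. The role 'creator' maps to the
    creators list; any other role is passed through as contributorType.
    """
    creators, contributors = [], []
    for person in people:
        entry = {
            "name": person["name"],
            "affiliation": [{"name": a} for a in person.get("affiliations", [])],
        }
        if person["role"] == "creator":
            creators.append(entry)
        else:
            entry["contributorType"] = person["role"]
            contributors.append(entry)
    return {"creators": creators, "contributors": contributors}


# Hypothetical internal records sharing one structure.
people = [
    {"name": "Doe, Jane", "role": "creator", "affiliations": ["CERN"]},
    {"name": "Roe, Rick", "role": "DataCurator", "affiliations": []},
]
result = to_datacite(people)
```

The same internal list could, under the same assumption, be filtered by another role value to produce acknowledgements.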
I'd argue having the internal data match a standard is good unless there is a compelling reason not to. We could spend a lot of time arguing whether "funder" or "funderName" is a better label (for example), but I'm not sure it would be a productive use of our time. Not having to do a mapping also helps with interoperability.

I also agree on the author and contributor coalescing. DataCite has implemented this in their JSON version and I just added it to the invenio wrapper in inveniosoftware/datacite#50
I'm closing the ticket: nearly all items have been implemented, except for specific ones that need further design, like access control, files, PIDs, etc.
This is a reference sheet for the core metadata shared by InvenioRDM records:
JSON Schema as of 2019-12-20:
Fields
_access / access_right: See Combine / disambiguate _access and access_right #37
Access levels: See Access level metadata #47
Authors / Creators / Collaborators: See RFC: rdm: define {crea,contribu}tors schema rfcs#11. Implementation: Implement Creators and Contributors equivalent metadata fields #19
Dates: publication_date, embargo_date.
FundingReference:
Identifiers: See rdm: define {crea,contribu}tors schema rfcs#11
Language
publication_date (instead of publication_year):
Publisher:
Resource Type (resourceTypeGeneral): {'resourceTypeGeneral': ..., 'resourceType': ...} -> {'type': ..., 'subtype': ...}
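The resource type renaming above could be sketched as a pair of small translation helpers. This assumes the change is a straight key rename in both directions; the function names are hypothetical and the example values are illustrative only.

```python
def resource_type_to_internal(datacite_rt):
    """Map DataCite-style keys to the internal type/subtype pair."""
    return {
        "type": datacite_rt["resourceTypeGeneral"],
        "subtype": datacite_rt.get("resourceType"),
    }


def resource_type_to_datacite(internal_rt):
    """Map the internal type/subtype pair back to DataCite-style keys."""
    return {
        "resourceTypeGeneral": internal_rt["type"],
        "resourceType": internal_rt.get("subtype"),
    }


# Illustrative values; round-tripping should give back the original mapping.
original = {"resourceTypeGeneral": "Dataset", "resourceType": "Survey data"}
internal = resource_type_to_internal(original)
round_tripped = resource_type_to_datacite(internal)
```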
Sizes
Questions:
Formats
Questions:
Rights list: license field.
Subjects
Titles: title (string field), additional_titles (like DataCite titles)
Version:
Extras for all
references (need to check Zenodo structure if ok). Not all references have related identifiers; sometimes one is just a textual reference.
internal notes (discuss with ILS what they have). "curators notes", non-public notes. #38
How to handle custom fields? (custom)
Zenodo custom fields
TODO
Discussion points for community:
*CV: Controlled Vocabulary
updated 2019-08-16 with comments below.
updated 2019-12-20 with issues.