Metadata: Dataverse project take ownership of documentation for creating metadata blocks #3168
#3180 (comment) is the most recent example of me updating the Solr schema because a field was added.
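For readers unfamiliar with what such an update involves: each searchable dataset field needs a matching field definition in Solr's schema.xml. The Python sketch below generates a `<field>` line and a `<copyField>` line for one dataset field. The helper name `solr_field_lines`, the fieldType-to-Solr-type mapping, the attribute set, and the catch-all destination `_text_` are all assumptions for illustration. Compare them against the schema.xml that ships with your Dataverse version before relying on anything like this.

```python
from xml.sax.saxutils import quoteattr

# ASSUMPTION: illustrative mapping from Dataverse fieldType values to Solr
# field types; any fieldType not listed falls back to a text type.
SOLR_TYPES = {"int": "int", "float": "float"}
DEFAULT_SOLR_TYPE = "text_en"


def solr_field_lines(name: str, field_type: str, allow_multiples: bool,
                     catch_all: str = "_text_") -> list[str]:
    """Hypothetical helper: build schema.xml lines for one dataset field."""
    solr_type = SOLR_TYPES.get(field_type.lower(), DEFAULT_SOLR_TYPE)
    multi = "true" if allow_multiples else "false"
    field = (f"<field name={quoteattr(name)} type={quoteattr(solr_type)} "
             f"multiValued={quoteattr(multi)} stored=\"true\" indexed=\"true\"/>")
    # Copy the value into the catch-all field (ASSUMED name) so basic search finds it.
    copy = f"<copyField source={quoteattr(name)} dest={quoteattr(catch_all)}/>"
    return [field, copy]
```

For example, `print("\n".join(solr_field_lines("kindOfData", "text", True)))` prints the two lines for a repeatable text field.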
Here's a comment by @edzale at #3506:

Hi,
From IRC today:
http://irclog.iq.harvard.edu/dataverse/2018-01-14

It would be nice to add some documentation on this, assuming we want to support custom metadata blocks.
I worked more on the first section of the document, which explains how the metadata block TSV is put together. The second section, about the steps for installing metadata blocks, could use more detail from people who've gone through that process. There are also open questions about how to edit and reinstall metadata blocks that people who've done it could answer here. Perhaps a developer could review the doc to make sure it's clear and accurate. Then we can decide how it should be added to the guides, and open new issues to add more information about editing and reinstalling blocks.
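As an illustration of how the TSV is put together: a metadata block TSV is a tab-separated file divided into sections whose header rows start with `#metadataBlock`, `#datasetField`, and `#controlledVocabulary`. The data rows under each header leave the first column empty. The sketch below is a hypothetical helper, not part of the Dataverse codebase. It reads such a file into a dictionary of rows per section, which can be handy for checking a block before loading it. It assumes the file contains no quoted cells, so quote characters are read literally.

```python
import csv
import io


def parse_metadata_block_tsv(text: str) -> dict[str, list[dict[str, str]]]:
    """Hypothetical helper: split a metadata block TSV into its sections.

    Header rows start with '#' (e.g. '#datasetField') and name the columns;
    data rows that follow leave the first column empty. Returns a mapping of
    section name (without '#') to a list of {column: value} dicts.
    """
    sections: dict[str, list[dict[str, str]]] = {}
    header: list[str] = []
    current = None
    # QUOTE_NONE: treat '"' as ordinary text, e.g. inside field descriptions.
    for row in csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if row[0].startswith("#"):
            current = row[0][1:]
            header = row[1:]
            sections[current] = []
        elif current is not None:
            # Drop the empty leading column, then pair values with column names.
            sections[current].append(dict(zip(header, row[1:])))
    return sections
```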
Hi all, I recently had a conversation with Danny and Gustavo about Dataverse metadata blocks and the issues with local customisation and harvesting, and they found my explanation unclear. I hope this explains it better... The Australian Data Archive (ADA) publishes mainly social science survey data, so some DDI elements/Dataverse fields use a fairly static vocabulary. Examples are Kind of Data [Survey data, Census data, textual data, diaries, aggregate...], Unit of Analysis, Time Method, etc. I'm not referring to vocabulary servers, but to drop-downs/tick boxes as already implemented using the TSV. At the moment, to create standard metadata, we include the vocabulary lists in our templates as text or refer archivists to documentation. Implications of customising Dataverse metadata blocks:
We are also having ongoing discussions with Julian about the copyright and version DDI elements, which aren't included in the Citation block, so we have to combine them as text in the Notes field. I'm not sure where this stands. These comments may be better placed elsewhere; let me know.
Hi @janetm. Thanks for pointing out issues about customizing metadata blocks and how customization affects harvesting, and apologies for replying so late. I agree that we should try to clarify these questions in the documentation for creating metadata blocks, which I think should also cover editing metadata blocks. I hope I can help answer your questions here and in the documentation. (And of course, developers, feel free to yell at me when I'm wrong :)
I can't imagine any technical issues with editing the default tsv files to allow controlled vocabularies for Kind of Data, Unit of Analysis and other fields that I think you have in mind. (We know that a large number of CV terms raises usability issues, but DDI guidelines suggest a small number of terms for the fields you've mentioned, right?)
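To make that kind of edit concrete: turning a free-text field such as Kind of Data into a drop-down means setting `allowControlledVocabulary` to TRUE for that field's row in the `#datasetField` section, and adding one row per term to the `#controlledVocabulary` section. The sketch below builds those controlled vocabulary rows. It assumes the column order DatasetField, Value, identifier, displayOrder (with the usual empty leading column) and the field name `kindOfData`. Check both against the citation.tsv for your Dataverse version; the helper name is hypothetical.

```python
def controlled_vocabulary_rows(field_name: str, values: list[str]) -> list[str]:
    """Hypothetical helper: build #controlledVocabulary data rows for one field.

    ASSUMED columns: <empty>, DatasetField, Value, identifier, displayOrder.
    The identifier is left blank; displayOrder follows the order of `values`.
    """
    return [
        "\t".join(["", field_name, value, "", str(order)])
        for order, value in enumerate(values)
    ]


# Terms taken from the ADA example above; the field must also have
# allowControlledVocabulary set to TRUE in the #datasetField section.
KIND_OF_DATA_ROWS = controlled_vocabulary_rows(
    "kindOfData", ["Survey data", "Census data", "Textual data"]
)
```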
I think modified fields are already excluded from harvesting: @scolapasta told me that during harvesting, Dataverse will try to harvest metadata even from a metadata document that isn't composed the way Dataverse expects. I take this to mean that if during harvesting Dataverse expects Kind of Data in the oai_ddi.xml, like this:
But the element name (Since Dataverse creates ddi.xml that won't validate against the schema, because some elements are misplaced or misused, I've always wondered whether, when harvesting valid ddi.xml, Dataverse would ignore elements because it expects to find them in the wrong places.)
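The concern about misplaced elements can be illustrated with a path-based lookup: if a harvester looks for an element at one fixed path, an element placed elsewhere is silently skipped rather than reported as an error. The Python sketch below uses `xml.etree.ElementTree` with the DDI Codebook 2.5 namespace. The expected path for Kind of Data (`stdyDscr/stdyInfo/sumDscr/dataKind`), the helper name, and the sample documents are assumptions for illustration; none of it is taken from Dataverse's harvesting code.

```python
import xml.etree.ElementTree as ET

# DDI Codebook 2.5 namespace.
NS = {"ddi": "ddi:codebook:2_5"}

# ASSUMPTION: the one place this hypothetical harvester looks for Kind of Data.
EXPECTED_PATH = "ddi:stdyDscr/ddi:stdyInfo/ddi:sumDscr/ddi:dataKind"


def kind_of_data(ddi_xml: str) -> list[str]:
    """Return Kind of Data values found at EXPECTED_PATH, and nowhere else."""
    root = ET.fromstring(ddi_xml)  # root element is <codeBook>
    return [el.text or "" for el in root.findall(EXPECTED_PATH, NS)]


# Element where the lookup expects it: the value is picked up.
EXPECTED = ('<codeBook xmlns="ddi:codebook:2_5"><stdyDscr><stdyInfo><sumDscr>'
            '<dataKind>Survey data</dataKind></sumDscr></stdyInfo></stdyDscr></codeBook>')

# Same element in a different place: the lookup silently finds nothing.
ELSEWHERE = ('<codeBook xmlns="ddi:codebook:2_5"><stdyDscr><method><dataColl>'
             '<dataKind>Survey data</dataKind></dataColl></method></stdyDscr></codeBook>')
```

Here `kind_of_data(EXPECTED)` returns `["Survey data"]`, while `kind_of_data(ELSEWHERE)` returns an empty list with no error, which is the kind of silent loss the question above is about.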
There's a GitHub issue (#4570) about migrating datasets that already have versions. I think it's complicated because Dataverse assigns versions automatically, so we need to think about how migrating more than one version will work. I don't know how the versioning that Dataverse does now affects harvesting. (I see that on the search results pages, the cards of harvested datasets don't include version numbers, so maybe it's not an issue?) For the copyright element issue (and really any of these issues), could we email to schedule a time to chat? In an issue about making Dataverse produce valid DDI metadata (#3648), I proposed using the copyright element differently than I think you and Steve would like to, and I'd like to get your thoughts. Thanks!
@jggautier to me, taking ownership of the documentation means adding a page on this topic to the dev guide, which would mean a pull request. Does that make sense? The lack of documentation definitely came up during the Dataverse Community Meeting last week, and I'd love for this issue to be prioritized. I'd also like to point out that #4451 is related.
Adding a page (or maybe adding content to an existing page) in the guides sounds good to me. It'll put the content on GitHub and make it versioned. I'd need to talk to someone more familiar with Sphinx about how to move the content in the Google Doc into the Dataverse guides. It sounds like you think that adding more info to the Google Doc about installing or reinstalling metadata blocks should wait until the content has been moved to the guides.
During estimation, @pameyer suggested saving the Google Doc as a .docx file and using Pandoc (https://pandoc.org) to convert it to .rst, which Sphinx uses. The team agreed to move to the guides only the content we feel is solid right now (the first section, which describes the parts of the metadata block TSV) and to open other GitHub issues for moving the remaining content, i.e. instructions and guidelines for editing and installing metadata blocks. One thing we didn't discuss was where in the guides this should go. Users can use this info to create or edit metadata blocks both during and after installation, so I could see it going in either the Installation Guide or the Admin Guide. Currently, the Appendix is the only section with info about metadata blocks.
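A minimal sketch of the Pandoc step, assuming `pandoc` is installed and on the PATH and using hypothetical file names: it builds the command `pandoc --from docx --to rst <input> --output <output>` and runs it with `subprocess.run`. The helper names are illustrative only.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(docx: Path, rst: Path) -> list[str]:
    """Build the pandoc command line for a .docx -> .rst conversion."""
    return ["pandoc", "--from", "docx", "--to", "rst", str(docx), "--output", str(rst)]


def convert_docx_to_rst(docx: Path, rst: Path) -> None:
    """Run pandoc; raises if pandoc is missing or the conversion fails."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_command(docx, rst), check=True)


# Hypothetical file names for the exported Google Doc and the target guide page:
# convert_docx_to_rst(Path("metadata-blocks.docx"), Path("metadatacustomization.rst"))
```

The generated .rst usually still needs some hand cleanup before it fits the guides' conventions.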
I think the Admin Guide would be a good place. Perhaps we could add the question "Am I happy with the metadata fields available out of the box or do I want to create a custom metadata block?" at http://guides.dataverse.org/en/4.9.2/installation/prep.html#decisions-to-make and link to the new content in the Admin Guide.
Converted the old Google Doc into an .rst file and added it to our guides. It still needs some syntax finessing.
For future reference: Pete's suggested method worked very well for converting a Google Doc into a properly formatted .rst file for our guides.
I've added the new page, but when I'm back on Tuesday I'll finish the syntax fine-tuning and we'll be good to go.
Looks good so far, @dlmurphy. You can preview the guides
Cleaned up the syntax in a49c1bd and it's looking much nicer now. Sending to code review for @jggautier to make sure his vision has been realized.
Made some edits to both formatting and content based on @jggautier's review
Awesome. Thanks @dlmurphy. Moving to QA.
At JHU we currently have two use cases for creating custom metadata blocks:
Neither of these is (though the latter may eventually be) suitable for inclusion in common metadata blocks that would be supported by Dataverse developers or the community at large, so we need to be able to create these blocks locally.
The Dataverse team did not expect individual installations to create their own metadata blocks, so documentation for doing so is sparse. Since we needed a better understanding of how to do this, I put together a document that captured my understanding and asked the DV team (thanks @posixeleni, @pdurbin, @zoidy, @bmckinney, @scolapasta, and @bencomp for your contributions) to help fix errors and clarify points.
At this point, the Dataverse 4.x Metadata Blocks syntax/semantics document is in pretty good shape with regard to defining and loading metadata blocks, so it would be great if the project took ownership of and responsibility for maintaining the document in some (perhaps completely different) form.
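For context on the loading step: a finished block is loaded into a running installation through the admin API endpoint `/api/admin/datasetfield/load`, which accepts the TSV as the POST body with the content type `text/tab-separated-values`. The sketch below does the same from Python with `urllib.request`. The base URL `http://localhost:8080` and the helper names are assumptions; adjust them for your installation.

```python
import urllib.request
from pathlib import Path

# ASSUMPTION: a Dataverse installation reachable at this base URL.
BASE_URL = "http://localhost:8080"


def load_block_request(tsv: bytes, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) the POST request that loads a metadata block TSV."""
    return urllib.request.Request(
        f"{base_url}/api/admin/datasetfield/load",
        data=tsv,
        method="POST",
        headers={"Content-type": "text/tab-separated-values"},
    )


def load_block(path: Path, base_url: str = BASE_URL) -> str:
    """Send the TSV at `path` to the load endpoint and return the response body."""
    req = load_block_request(path.read_bytes(), base_url)
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage might look like `print(load_block(Path("myblock.tsv")))`, where `myblock.tsv` is a hypothetical file. Loading the block does not update Solr by itself; the schema changes are a separate step.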
NB: More support and documentation for the required Solr schema changes are still needed to provide full custom metadata block support for local instances.