Native API: Import DDI dataset does not appear to be working; fails on parsing subject #8210
Comments
From a quick look at the code, it appears that the DDI exporter writes the citation block's Subject and Keyword fields into the stdyDscr subject element as keyword elements, but ImportDDIServiceBean simply parses all of those elements back in as keywords. The only line that writes to the subject field in the DTO has been commented out for 7 years (dataverse/src/main/java/edu/harvard/iq/dataverse/api/imports/ImportDDIServiceBean.java, line 571 in a70c1c4).
I'm not sure what a good solution would be. The importer could scan for values that match the Subject controlled vocabulary choices and import those as subjects, but that could move a Keyword entry into the Subject field if there's an accidental match. (As a CVV field, Subject now gets an xml:lang attribute if you've enabled i18n support, whereas keywords don't, but that is still basically trying to compensate for the exporter collapsing two citation block fields into one DDI XML element.)
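To make the round trip concrete, here is a minimal sketch of an importer that, as described above, reads every keyword element inside the study description's subject element as a keyword. The class name DdiKeywordReader and the XML fragment are hypothetical, and this is not the actual ImportDDIServiceBean code. Because the exporter writes Subject values into the same element, a Subject such as "Social Sciences" comes back indistinguishable from an ordinary keyword.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class DdiKeywordReader {

    /** Collects the text of every <keyword> element nested in a <subject> element (hypothetical helper). */
    public static List<String> readKeywords(String ddiXml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(ddiXml));
        List<String> keywords = new ArrayList<>();
        boolean inSubject = false;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = reader.getLocalName();
                if (name.equals("subject")) {
                    inSubject = true;
                } else if (inSubject && name.equals("keyword")) {
                    // keyword holds text only, so getElementText() is safe here
                    keywords.add(reader.getElementText().trim());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && reader.getLocalName().equals("subject")) {
                inSubject = false;
            }
        }
        reader.close();
        return keywords;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Made-up fragment: one exported Subject value and one real keyword
        String xml = "<codeBook><stdyDscr><stdyInfo><subject>"
                + "<keyword>Social Sciences</keyword>"
                + "<keyword>survey data</keyword>"
                + "</subject></stdyInfo></stdyDscr></codeBook>";
        // Both values come back as keywords; nothing identifies the Subject
        System.out.println(readKeywords(xml)); // [Social Sciences, survey data]
    }
}
```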
Hello all, but Subject is a different matter, since it seems to be Dataverse-specific. The DDI subject element contains keywords, which are translated into Dataverse keywords; there is no DDI field that corresponds to the Dataverse Subject field. Subject also comes from a controlled vocabulary. As a result, any DDI import will fail, which makes the importddi API function useless for now. One solution, as @qqmyers mentioned, would be to check the keywords against the controlled vocabulary and add any matches to the subject list. That would work for us, since our subject is always "Social Sciences" and we can alter our XML files to include such a keyword. In general, though, not every DDI file has subjects and keywords, and in that case the import will always fail. Another way around it would be to introduce an additional optional subject parameter in the importddi call. If this parameter is present, the subject can be checked against the controlled vocabulary and, if it is found there, added to the citation metadata.
I would like to know what you think about it.
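The optional-parameter idea could look roughly like this sketch. It assumes a hypothetical subject parameter (the importddi endpoint has no such parameter) and hard-codes a small subset of the citation block's Subject vocabulary for illustration; a real implementation would look the value up in the installation's controlled vocabulary instead.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class SubjectParameter {

    // Illustrative subset of the citation block's Subject controlled vocabulary.
    static final Set<String> SUBJECT_VOCAB = Set.of(
            "Arts and Humanities", "Computer and Information Science",
            "Social Sciences", "Other");

    /**
     * Validates a hypothetical optional "subject" request parameter.
     * Returns the subjects to add to the citation metadata, or throws if the
     * value is not a controlled vocabulary term.
     */
    static List<String> subjectsFromParameter(Optional<String> subjectParam) {
        if (subjectParam.isEmpty()) {
            return List.of(); // parameter absent: leave Subject to other logic
        }
        String value = subjectParam.get().trim();
        if (!SUBJECT_VOCAB.contains(value)) {
            throw new IllegalArgumentException("Not a controlled vocabulary subject: " + value);
        }
        return List.of(value);
    }

    public static void main(String[] args) {
        System.out.println(subjectsFromParameter(Optional.of("Social Sciences"))); // [Social Sciences]
        System.out.println(subjectsFromParameter(Optional.empty()));              // []
    }
}
```

Rejecting unknown values, rather than silently dropping them, keeps a typo in the parameter from producing yet another "Subject is required" failure further down the line.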
@lubitchv thanks for pull request #8447. As we've been discussing in Slack, we'll make sure that you, @scolapasta and I all agree on the direction before anyone dives back into coding. Meanwhile, I did a little digging into why "import DDI" stopped working. Pull request #8088 reordered two methods. On develop (1487650), I switched the order back and "import DDI" worked. This isn't really a fix, because we swapped those methods to fix #6752, but at least we know why and when (Dataverse 5.7) "import DDI" stopped working.
@lubitchv @scolapasta and I just discussed the next steps. At a high level, we want to fix the old Here are some details: @lubitchv will create a new pull request that does the following:
SWORD rules for Subject and Keyword: “Subject” uses our controlled vocabulary list of subjects. This list is in the Citation Metadata of our User Guide > Metadata References. Otherwise, if a term does not match our controlled vocabulary list, it will put any subject terms in “Keyword”. If Subject is empty it is automatically populated with “N/A”. -- https://guides.dataverse.org/en/5.9/api/sword.html

@lubitchv please do not hesitate to reach out if anything comes up! Thanks so much for working on this!
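Applied to DDI import, the SWORD rules quoted above amount to a simple partition of the keyword values. This sketch assumes the importer already has the values from the DDI subject element as a list of strings; the class name and the vocabulary subset are hypothetical, and it is not how Dataverse currently implements either SWORD or DDI import.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SwordStyleSubjectMapper {

    /** Result of splitting DDI keyword values into citation Subject and Keyword fields. */
    record Mapped(List<String> subjects, List<String> keywords) {}

    // Illustrative subset of the citation block's Subject controlled vocabulary.
    static final Set<String> SUBJECT_VOCAB = Set.of(
            "Arts and Humanities", "Computer and Information Science",
            "Social Sciences", "Other");

    static Mapped map(List<String> ddiKeywords) {
        List<String> subjects = new ArrayList<>();
        List<String> keywords = new ArrayList<>();
        for (String term : ddiKeywords) {
            String value = term.trim();
            if (SUBJECT_VOCAB.contains(value)) {
                subjects.add(value); // exact vocabulary match: goes to Subject
            } else {
                keywords.add(value); // anything else stays a Keyword
            }
        }
        if (subjects.isEmpty()) {
            subjects.add("N/A"); // SWORD rule: an empty Subject becomes "N/A"
        }
        return new Mapped(List.copyOf(subjects), List.copyOf(keywords));
    }

    public static void main(String[] args) {
        System.out.println(map(List.of("Social Sciences", "survey data")));
        // Mapped[subjects=[Social Sciences], keywords=[survey data]]
        System.out.println(map(List.of("survey data")));
        // Mapped[subjects=[N/A], keywords=[survey data]]
    }
}
```

Exact, case-sensitive matching keeps the rule predictable, but it still carries the risk @qqmyers raised: a term the depositor meant as a keyword that happens to equal a vocabulary value is promoted to Subject.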
There is also an open issue, #8209, to provide a working sample DDI file in the guides.
For now, until proven otherwise, it appears this endpoint isn't working properly. The JSON endpoint does work.
One hypothesis is that a PR that enforced required fields revealed a previously unknown issue in DDI import parsing, which shows up as a null subject error.
There are no server log errors, only a command line error: {"status":"ERROR","message":"Validation Failed: Subject is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ])."}
Please note there are two paths for testing: one with a preexisting, published PID and one without, so that the system generates one. The subject error happens when you use the DDI export from a published dataset on another system, i.e., demo. The failure for a DDI without a DOI is less clear to me: {"status":"ERROR","message":"Invalid file content: ParseError at [row,col]:[17,6]\nMessage: Element content can not contain child START_ELEMENT when using Typed Access methods"}
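The second message looks like a StAX-style error: XMLStreamReader.getElementText() is specified to throw an XMLStreamException when the element it reads contains a child element rather than text only, and the exact wording differs between StAX implementations. That the DDI parser hits this path is an inference from the message, not something established in this thread. The sketch uses a made-up fragment and the JDK's default StAX parser to show the failure mode; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ElementTextDemo {

    /** Returns the text of the first element named localName, read with getElementText(). */
    static String firstElementText(String xml, String localName) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    // Throws XMLStreamException if the element contains a child element
                    return reader.getElementText();
                }
            }
            return null; // element not found
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) {
        String textOnly = "<stdyInfo><abstract>Plain text</abstract></stdyInfo>";
        String withChild = "<stdyInfo><abstract>Intro <p>nested</p></abstract></stdyInfo>";
        try {
            System.out.println(firstElementText(textOnly, "abstract")); // Plain text
            firstElementText(withChild, "abstract"); // fails on the nested <p>
        } catch (XMLStreamException e) {
            System.out.println("ParseError: " + e.getMessage());
        }
    }
}
```

If this is the cause, the fix on the parsing side is to walk child events (or read the element as mixed content) for DDI elements that may legitimately contain markup, rather than calling getElementText() on them.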