
Native API: Import ddi dataset does not appear to be working, fails on parsing subject. #8210

Closed
kcondon opened this issue Nov 3, 2021 · 4 comments · Fixed by #8483

@kcondon
Contributor

kcondon commented Nov 3, 2021

There is also an open issue, #8209, to provide a working sample DDI file in the guides.

For now, until proven otherwise, this endpoint appears to be broken. The JSON import endpoint works.

One hypothesis is that a PR that enforced required fields revealed a previously unknown issue in DDI import parsing, which shows up as a null subject error.

There are no server log errors, only a command-line error: {"status":"ERROR","message":"Validation Failed: Subject is required. (Invalid value:edu.harvard.iq.dataverse.DatasetField[ id=null ])."}

Note that there are two paths for testing: one with a preexisting, published PID and one without, so that the system generates the PID. The subject error happens when you use the DDI export of a published dataset from another system, i.e. demo. The failure for a DDI without a DOI is less clear to me: {"status":"ERROR","message":"Invalid file content: ParseError at [row,col]:[17,6]\nMessage: Element content can not contain child START_ELEMENT when using Typed Access methods"}
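The second message reads like what a StAX XMLStreamReader reports when getElementText() is called on an element that has child elements, since getElementText() only accepts text-only content. A minimal sketch, assuming that is what happens at row 17. Which element sits there in the test file isn't known; the abstract/emph markup below is made up for illustration, and the exact exception text varies between StAX implementations:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ElementTextDemo {

    // Moves a new reader to the first START_ELEMENT with the given local name.
    private static XMLStreamReader openAt(String xml, String localName) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals(localName)) {
                return r;
            }
        }
        throw new IllegalArgumentException("element not found: " + localName);
    }

    // True if getElementText() throws, i.e. the element is not text-only.
    static boolean getElementTextFails(String xml, String localName) {
        XMLStreamReader r;
        try {
            r = openAt(xml, localName);
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
        try {
            r.getElementText();
            return false;
        } catch (XMLStreamException e) {
            return true;
        }
    }

    // Tolerant alternative: concatenates all text inside the element, descending into children.
    static String collectText(String xml, String localName) {
        try {
            XMLStreamReader r = openAt(xml, localName);
            StringBuilder text = new StringBuilder();
            int depth = 1;
            while (depth > 0) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                } else if (event == XMLStreamConstants.CHARACTERS || event == XMLStreamConstants.CDATA) {
                    text.append(r.getText());
                }
            }
            return text.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML", e);
        }
    }
}
```

A tolerant reader like collectText avoids the exception, at the cost of flattening any markup inside the element.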

@qqmyers
Member

qqmyers commented Nov 3, 2021

From a quick look at the code, it appears that the DDI exporter writes the citation block's Subject and Keyword fields into the stdyDscr subject element as keyword elements, but ImportDDIServiceBean parses all of those elements back in as keywords. The only line that writes to the subject field in the DTO has been commented out for 7 years:

// citation.getFields().add(FieldDTO.createPrimitiveFieldDTO( "subject",xmlr.getElementText()));
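A simplified sketch of the round trip described above, using the JDK's StAX API. Both methods only mimic the behaviour described here; they are not the actual exporter or ImportDDIServiceBean code, and attributes such as xml:lang are left out:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.stream.XMLStreamWriter;

class SubjectRoundTrip {

    // Mimics the exporter: Subject and Keyword values both become
    // <keyword> children of a single DDI <subject> element.
    static String export(List<String> subjects, List<String> keywords) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            List<String> all = new ArrayList<>(subjects);
            all.addAll(keywords);
            w.writeStartElement("subject");
            for (String value : all) {
                w.writeStartElement("keyword");
                w.writeCharacters(value);
                w.writeEndElement();
            }
            w.writeEndElement();
            w.flush();
            w.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Mimics the importer: every <keyword> is read back as a Keyword,
    // so nothing is left to populate Subject.
    static List<String> importKeywords(String xml) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            List<String> keywords = new ArrayList<>();
            while (r.hasNext()) {
                if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals("keyword")) {
                    keywords.add(r.getElementText());
                }
            }
            return keywords;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Once the values are back in one flat keyword list, nothing distinguishes the former Subject value, which is consistent with the import then failing the required-Subject check.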

I'm not sure what a good solution would be. The importer could scan for values that match the Subject controlled vocabulary and import those as subjects, but that could move a Keyword entry into the Subject field on an accidental match. (As a controlled vocabulary (CVV) field, Subject now gets an xml:lang attribute if i18n support is enabled, whereas keywords don't, but relying on that would still just be compensating for the exporter collapsing two citation block fields into one DDI XML element.)
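One way to sketch that vocabulary scan, assuming exact, whitespace-trimmed matching against a hardcoded subset of the Subject vocabulary (a real importer would read the installation's controlled vocabulary instead). The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SubjectFromKeywords {

    // Illustrative subset of the citation block's Subject vocabulary (assumed here).
    static final Set<String> SUBJECT_VOCAB = Set.of(
            "Arts and Humanities", "Law", "Mathematical Sciences", "Social Sciences", "Other");

    // Returns the keywords that exactly match a Subject vocabulary term, without duplicates.
    // Matches are copied, not moved: the caller keeps the full keyword list,
    // so a keyword that only accidentally matches is not lost from Keyword.
    static List<String> subjectsFromKeywords(List<String> keywords) {
        List<String> subjects = new ArrayList<>();
        for (String keyword : keywords) {
            String term = keyword.trim();
            if (SUBJECT_VOCAB.contains(term) && !subjects.contains(term)) {
                subjects.add(term);
            }
        }
        return subjects;
    }
}
```

The test input shows the accidental-match risk: a keyword "Law" meant as a topic term also becomes a Subject. Copying instead of moving at least keeps it in Keyword.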

@lubitchv
Contributor

lubitchv commented Feb 7, 2022

Hello all,
We have a pilot project to move datasets from Nesstar into Dataverse. The metadata in Nesstar is in DDI format. We are trying to import it with the importddi API but run into the same subject validation error. As far as I remember, previous versions of Dataverse (5.4?) did not produce this validation error. We had other issues, such as e-mail and producer logo validation, but those errors can be fixed by editing the DDI XML.

Subject is a different matter, since it seems to be Dataverse-specific. The DDI subject element contains keywords, which are translated into Dataverse keywords. There is no DDI field that corresponds to the Dataverse Subject field, and Subject also uses a controlled vocabulary.

As a result, any DDI import will fail, which makes the importddi API function unusable for now.

One solution would be, as @qqmyers mentioned, to check keywords against the controlled vocabulary and add any matches to the subject list. That would work for us, since in our case the subject is always "Social Sciences" and we can alter the XML to include such a keyword. In general, though, not every DDI has subjects and keywords, and in that case the import will always fail.

Another way around it would be to introduce an additional optional subject parameter in the importddi call. If the parameter is present, its value can be checked against the controlled vocabulary and, if it matches, added to the citation block:

citation.getFields().add(FieldDTO.createMultipleVocabFieldDTO("subject", subjectList ));
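A sketch of how the optional parameter could be checked before building the subject field. The parameter itself, the validate helper and the vocabulary subset are hypothetical; the returned list is what would be handed to createMultipleVocabFieldDTO("subject", ...):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SubjectParameter {

    // Illustrative subset of the Subject vocabulary (assumed; use the installation's list).
    static final Set<String> SUBJECT_VOCAB = Set.of(
            "Arts and Humanities", "Law", "Social Sciences", "Other");

    // requested: values of a hypothetical optional "subject" parameter on the
    // importddi call (null or empty when the parameter is absent).
    static List<String> validate(List<String> requested) {
        if (requested == null || requested.isEmpty()) {
            return List.of(); // parameter absent: nothing to add
        }
        List<String> accepted = new ArrayList<>();
        List<String> rejected = new ArrayList<>();
        for (String value : requested) {
            if (SUBJECT_VOCAB.contains(value)) {
                accepted.add(value);
            } else {
                rejected.add(value);
            }
        }
        // Fail early and name the bad values instead of letting the import
        // stop later with the generic "Subject is required" validation error.
        if (accepted.isEmpty()) {
            throw new IllegalArgumentException("No valid subject values: " + rejected);
        }
        return accepted;
    }
}
```

Rejecting the call when no value matches is a design choice in this sketch; values that don't match are otherwise dropped, as proposed above.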

What do you think?

pdurbin added a commit to lubitchv/dataverse that referenced this issue Mar 8, 2022
@pdurbin
Member

pdurbin commented Mar 10, 2022

@lubitchv thanks for pull request #8447. As we've been discussing in Slack, we'll make sure you, @scolapasta and I all agree on the direction before anyone dives back into coding.

Meanwhile, I did a little digging into why "import DDI" stopped working. Pull request #8088 reordered the two methods below.

[Screenshot (2022-03-10): the two methods reordered in #8088]

On develop (1487650), I switched the order back and "import DDI" worked. This isn't really a fix, since we swapped those methods to fix #6752, but at least we now know why and when (Dataverse 5.7) "import DDI" stopped working.

@pdurbin
Member

pdurbin commented Mar 10, 2022

@lubitchv, @scolapasta, and I just discussed the next steps.

At a high level, we want to fix the old importddi API endpoint and encourage using existing JSON-based endpoints to add Subject and other fields as necessary.

Here are some details:

@lubitchv will create a new pull request that does the following:

SWORD rules for Subject and Keyword:

“Subject” uses our controlled vocabulary list of subjects. This list is in the Citation Metadata of our User Guide > Metadata References. Otherwise, if a term does not match our controlled vocabulary list, it will put any subject terms in “Keyword”. If Subject is empty it is automatically populated with “N/A”. -- https://guides.dataverse.org/en/5.9/api/sword.html
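The SWORD rules quoted above, sketched as a plain function. Exact, case-sensitive matching and the small hardcoded vocabulary are assumptions, since the SWORD guide doesn't say how terms are compared; the class names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SwordSubjectRules {

    // Result of splitting incoming subject terms between the two citation fields.
    static final class Split {
        final List<String> subjects;
        final List<String> keywords;

        Split(List<String> subjects, List<String> keywords) {
            this.subjects = subjects;
            this.keywords = keywords;
        }
    }

    // Illustrative subset of the Subject vocabulary (assumed; use the full list in practice).
    static final Set<String> SUBJECT_VOCAB = Set.of(
            "Arts and Humanities", "Law", "Social Sciences", "Other");

    static Split apply(List<String> terms) {
        List<String> subjects = new ArrayList<>();
        List<String> keywords = new ArrayList<>();
        for (String term : terms) {
            // Terms in the controlled vocabulary go to Subject; everything else goes to Keyword.
            if (SUBJECT_VOCAB.contains(term)) {
                subjects.add(term);
            } else {
                keywords.add(term);
            }
        }
        // Subject is required, so an empty Subject is populated with "N/A".
        if (subjects.isEmpty()) {
            subjects.add("N/A");
        }
        return new Split(subjects, keywords);
    }
}
```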

@lubitchv please do not hesitate to reach out if anything comes up! Thanks so much for working on this!


4 participants