Create a javascript for the frontend that supports Fundref #9150
Comments
Jim suggests the following as a rough breakdown on a backlog for the backend:
Jim's notes on limitations that the current implementations have
Working from the user's point of view
|
priority discussion with Stefano;
|
Top priority for upcoming sprint |
sizing: Size as a 33. |
@pdurbin, for usability testing the workflow with the javascript, would it be possible to either create a test server or to use Demo Dataverse? If we need to add new fields or think it would be better to, would it be easier to create a test server, like spinning something up on AWS? To figure out who best to contact for testing, I'm reviewing metadata in the Harvard repository to find users who most often add funding information. While I'm doing that it makes sense to look at the extent of #4859 (how many users enter funding metadata in both fields). Might be possible to learn more about that, too. |
@jggautier sure, a demo server sounds good. Here's a quick dump from the first entry (NSF) from the first API endpoint listed at https://www.fundref.org/documentation/funder-registry/funder-data-via-the-api/
That DOI for NSF ( http://dx.doi.org/10.13039/100000001 ) redirects to the HTTPS version ( https://dx.doi.org/10.13039/100000001 ) which redirects to http://data.crossref.org/fundingdata/funder/10.13039/100000001 where even more info is shown:
Given the existing fields... ... I assume we'd fill them in like this: That is: "name": "National Science Foundation", Or is the uri (from the first JSON, to avoid making the second call) better? That is: "name": "National Science Foundation", I realize this is http rather than https, but that's ok. Like I said above, it redirects. |
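A minimal sketch of the two options discussed above (store the bare number or store the URI). The record shape (`id`, `name`, `uri`) is assumed from the NSF values quoted in this thread, and `funderToFieldValues` is a hypothetical helper, not part of Dataverse or the Crossref API.

```javascript
// Hypothetical helper: pick the values we would store from one funder record.
// The record shape (id, name, uri) is an assumption based on the NSF example
// quoted in this thread.
function funderToFieldValues(funder) {
  if (!funder || typeof funder.name !== "string") {
    throw new TypeError("expected a funder record with a name");
  }
  // Prefer the full URI from the record; fall back to building one from the
  // bare id only when no URI is present.
  const uri =
    funder.uri || (funder.id ? `http://dx.doi.org/10.13039/${funder.id}` : null);
  return { name: funder.name, uri };
}

const nsf = {
  id: "100000001",
  name: "National Science Foundation",
  uri: "http://dx.doi.org/10.13039/100000001",
};
console.log(funderToFieldValues(nsf));
```

Taking the URI straight from the first response avoids the second call mentioned above.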
+1 for URI - 100000001 could be anyone's numbering system. The JavaScript in view mode can hide the URI form if desired - either just showing the number or even showing just one field with National Science Foundation - as a link - so what's best to display can be a separate issue from what to store. (Sadly, while the variants of http(s)://(dx.)doi.org/ all redirect to the right place, they do give you four variants of the URI to use as the identifier. Using what they say (unless DataCite has a preference?) is probably best practice, although https://doi.org/10.13039/100000001 is the most modern at this point.) |
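Since the four URI variants all point at the same funder, comparing stored values gets easier if they are collapsed to one form first. A rough sketch, assuming the `https://doi.org/` form as the comparison target; `normalizeDoiUri` is a hypothetical name, and this says nothing about which form should actually be stored.

```javascript
// Matches http(s)://(dx.)doi.org/<doi> and captures the DOI itself.
const DOI_URL = /^https?:\/\/(?:dx\.)?doi\.org\/(10\.\d{4,9}\/\S+)$/i;

// Hypothetical helper: collapse the four URL variants into one form so that
// two values for the same funder compare equal.
function normalizeDoiUri(value) {
  const match = DOI_URL.exec(String(value).trim());
  // Anything that is not a recognizable DOI URL is returned unchanged.
  return match ? `https://doi.org/${match[1]}` : value;
}

const variants = [
  "http://dx.doi.org/10.13039/100000001",
  "https://dx.doi.org/10.13039/100000001",
  "http://doi.org/10.13039/100000001",
  "https://doi.org/10.13039/100000001",
];
console.log(new Set(variants.map(normalizeDoiUri)).size); // one distinct value
```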
Wait, the old name for the field (from #4859) was "Grant Number"... ... I think the popup text confused me. I need to dig more into these fields, obviously. |
Great catch and sorry for the confusion! The second child field, Identifier, should be the identifier of the funding (like the grant), not of the funder. The popup used to be:
When we tried to improve the popups last year, we changed it to:
So it sounds like the rewritten popup text and maybe the new field names make others think that they should enter the ID of the funder (instead of the expected identifier of the funding). Is that right? I think a new issue should be opened to address this, involving seeing how most people have interpreted the new field name and popup text and how they could be improved. |
@jggautier sure, a new issue sounds fine. Honestly, simply changing "of" to "from" would help me a lot. Like this:
|
Makes sense to me. I'll open an issue about it. About this issue, from #7285 (comment), it sounds like I should wait to hear from @mreekie about finding a time to discuss, and maybe you could join us? I think the scope might change once we're all on the same page about what we expect depositors to enter in the current fields. In the meantime it might be helpful to know that when depositors have filled the second child field, Identifier, it looks like it's always a funding ID, never the ID of the funder. So there's no evidence yet of anyone else misinterpreting that field, although I've only looked at datasets from research funded by the NIH (see Google Sheet with funding metadata from NIH-funded data). That said, I agree that changing "of" to "from" seems clearer. |
Just to keep things clear and updated, when someone chooses a funder from the Funding Information Agency dropdown list that this javascript produces from the CrossRef API, the funder's identifier should not go in the Funding Information Identifier field. Instead, the Funding Information Identifier field (the second child field) is for the award ID.
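To make that separation concrete, here is a small sketch. `grantNumberAgency` is the field named elsewhere in this thread; `grantNumberValue` (for the second child field) and `funderUri` are assumptions used only for illustration, as is the `buildFundingEntry` helper and the placeholder award ID.

```javascript
// Crossref funder registry DOIs use the 10.13039 prefix.
const FUNDER_DOI = /10\.13039\//;

// Hypothetical helper: keep the funder's identifier out of the award slot.
function buildFundingEntry({ funderName, funderUri, awardId }) {
  if (awardId && FUNDER_DOI.test(awardId)) {
    throw new Error("awardId looks like a funder identifier, not an award ID");
  }
  return {
    grantNumberAgency: funderName, // first child field: the agency name
    grantNumberValue: awardId || "", // second child field (assumed name): award ID only
    funderUri: funderUri || null, // hypothetical slot for the funder's own identifier
  };
}

const entry = buildFundingEntry({
  funderName: "National Science Foundation",
  funderUri: "http://dx.doi.org/10.13039/100000001",
  awardId: "1234567", // placeholder award ID
});
console.log(entry);
```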
|
waiting right now on some input from the ROR work. #9151 |
I think it's important to publicize the use-cases that this work will support. This will help with evaluation/testing. The NIH GREI group is interested in helping funders find the data produced by the research they've funded, specifically NIH-funded research. This is one of the use-cases they want each repository of the GREI group to support. And this javascript supports that goal by helping the Harvard Dataverse more consistently record funding agency names and record persistent identifiers for those agencies. Because a lot of the funding agency name metadata in the Harvard Dataverse looks like this today... ... some funders need to do more complex searches. There are more than 57 datasets whose metadata says that the data was created from NIH-funded research, but only 57 have the text "NIH" in the Funding Information Agency field. During the GREI training workshops on Jan 24 and 25, we'll have a chance to learn about funders' experiences with finding data from the research they fund. And when it's ready I plan to review this new javascript with users or potential users of the repository who would add, or who often add, funding metadata to their datasets. |
We'll eventually want to send this metadata to DataCite, so I took a look at the funding metadata that's already included in the OpenAIRE export. The OpenAIRE schema is based on the DataCite schema so members of the Dataverse community have been looking at the design decisions there to inform how best to send metadata to DataCite. And I looked at examples that DataCite provides for sending funding metadata. How funding metadata is included in Dataverse's OpenAIRE export:
The first
The second
Sending funder identifier metadata to DataCite
To add the funder identifier, which the javascript would grab from the Crossref API when a depositor chooses a funder from the suggested list, we could use DataCite's
So when depositors entered a funding name and award number, what's sent to DataCite (and included in the OpenAIRE export) would look like this:
This suggests that DataCite prefers the DOI, such as http://dx.doi.org/10.13039/100000001 from the example in an earlier comment, instead of the ID number "100000001". And when So Dataverse will need to know that if the depositor chooses a funder from the suggested list that the javascript shows, the funder identifier is a Crossref Funder ID, and that it can include that in the If we add a new funder ID field to the Citation metadatablock, if the depositor types in their own funder name, instead of choosing from the suggested list from the Crossref API, and if she enters a funder ID, Dataverse could use the "Other" value. Or could we add a Funder Identifier Type field so the depositor could choose from the list? This all depends on what's possible with the javascript and design decisions we make when using it for the funding metadata. And of course considering how to send this metadata to DataCite could be postponed and a new GitHub issue could be added to tackle it later. I just wanted to mention that sending this metadata to DataCite is a goal of the NIH GREI group in case there's something we can do to prepare for it earlier. When folks in the NIH GREI meetings ask about being able to use one interface to search for datasets from many repositories, DataCite Commons is mentioned as a way to do this, and that works only when DataCite is sent metadata from the different repositories. |
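A rough sketch of the identifier-type decision described above, using DataCite's funding reference property names (`funderName`, `funderIdentifier`, `funderIdentifierType`, `awardNumber`). The `toFundingReference` helper and the placeholder award number are assumptions for illustration, not existing Dataverse code.

```javascript
// A funder DOI from the Crossref funder registry: any doi.org variant with
// the 10.13039 prefix.
const FUNDER_REGISTRY_DOI = /^https?:\/\/(?:dx\.)?doi\.org\/10\.13039\//i;

// Hypothetical helper: build one funding reference object.
function toFundingReference({ funderName, funderIdentifier, awardNumber }) {
  const ref = { funderName };
  if (funderIdentifier) {
    ref.funderIdentifier = funderIdentifier;
    // Chosen from the suggested list: a Crossref Funder ID.
    // Typed in by the depositor: fall back to "Other".
    ref.funderIdentifierType = FUNDER_REGISTRY_DOI.test(funderIdentifier)
      ? "Crossref Funder ID"
      : "Other";
  }
  if (awardNumber) {
    ref.awardNumber = awardNumber;
  }
  return ref;
}

console.log(
  toFundingReference({
    funderName: "National Science Foundation",
    funderIdentifier: "http://dx.doi.org/10.13039/100000001",
    awardNumber: "1234567", // placeholder award number
  })
);
```

A Funder Identifier Type field, if one were added, would replace the pattern test with the depositor's own choice.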
resizing for next sprint:
|
Sprint resizing There are 2 streams of work:
|
Thought I'd try summarizing some of the uncertainty as I understand it:
@mreekie you wrote in the first comment that the definition of done includes "Code, javascript and a test metadatablock in the other repo". That's so that we can evaluate it, right? I've been writing that I plan to test it with users (particularly users who often add funding metadata to their datasets). But I think right now we're trying to resolve what we know would be issues if this was tested with users right now. |
FWIW: Names should appear if there is a facet and in advanced search. These, and I think basic search as well, depend on the expandedValue that @pdurbin hasn't yet completed. Once that's done, I think these will be findable even by any i18n names they have. |
Thanks @qqmyers. Then I'll hold off on reviewing and asking more questions. |
I unassigned myself and haven't been actively working on this. Anyone else is welcome to pick it up. I'm happy to give a brain dump. Some scratch work on the Dataverse side: pdurbin@ba4575a Some scratch work on the external vocab side: gdcc/dataverse-external-vocab-support@main...pdurbin:dataverse-external-vocab-support:scratch3 |
With #9402 and gdcc/dataverse-external-vocab-support#14, there's basic support for using the CrossRef funder registry with the grantNumberAgency field and the Research Organization Registry with the authorAffiliation field. As is, they work roughly like the original ORCID script, allowing selection from a list of choices and displaying that choice correctly in the various parts of the display (see #9150), and they capture/provide additional info in the JSON and OAI-ORE exports. Both sort results to prioritize entries where the entry is 'active' (ROR only), where typed text matches the acronym or relevant tokens (fundreg only), and, lastly, if the entry has been used by that user before. This means that NIH appears near the top in both fields to start and, once the user has selected the real NIH entry once, it is always at the top. (The same happens if they once select a bad one, until they clear the browser's localStorage cache.) Values are truncated to fit the narrow inputs allowed for child fields. The fundreg script sends a mailto: address so that CrossRef will put the requests on their priority queue. Both use cached values rather than pinging the services every time a term is displayed. That said, there are a variety of issues that could be addressed now or later:
|
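The ordering described in the comment above could be sketched roughly as follows. This is not the code from the linked pull requests; the item shape (`uri`, `tokens`, `altNames`), the weights, and the `rankFunders` name are all assumptions made for illustration.

```javascript
// Hypothetical ranking: an acronym/token match outranks prior use, and prior
// use only breaks ties among otherwise equal entries ("lastly" in the text).
function rankFunders(items, typedText, previouslyUsedUris = new Set()) {
  const needle = typedText.trim().toLowerCase();
  const score = (item) => {
    const candidates = [...(item.altNames || []), ...(item.tokens || [])].map(
      (s) => s.toLowerCase()
    );
    let points = 0;
    if (needle && candidates.includes(needle)) points += 2; // acronym or token match
    if (previouslyUsedUris.has(item.uri)) points += 1; // picked by this user before
    return points;
  };
  return items
    .map((item, index) => ({ item, index, points: score(item) }))
    // Higher score first; keep the service's original order on ties.
    .sort((a, b) => b.points - a.points || a.index - b.index)
    .map((entry) => entry.item);
}

const results = [
  { name: "Unrelated Foundation", uri: "u1", tokens: ["unrelated"] },
  { name: "National Institutes of Health", uri: "u2", altNames: ["NIH"] },
  { name: "Another NIH-like entry", uri: "u3", altNames: ["NIH"] },
];
// In the browser, the "used before" set would come from localStorage.
console.log(rankFunders(results, "nih", new Set(["u3"])).map((r) => r.uri));
```

Because prior use is only a tiebreaker here, a previously selected bad entry stays on top of other matching entries until the stored history is cleared, which matches the behavior noted above.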
Also FWIW - I think this means that the current ORCID script (possibly with some of the cleanup/improvements applied to the fundreg/ror scripts) could be applied to the author child field now. (It was designed to work with a primitive field and/or to fill in all children of a parent field and may or may not have worked to just fill in the author name child field in the citation block.) |
Daily
|
Definition of Done:
(i.e. not production)