Simplify Dataverse Content Provider to only operate on datasets #7

pdurbin · 2024-12-16T15:43:28Z

When the Dataverse content provider was added in jupyterhub#739 it had the flexibility to operate directly on Dataverse files like this:

repo2docker https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/6ZXAGT/3YRRYJ

However, being able to operate only on datasets (files are stored in datasets in Dataverse) is enough. That is, this will still work:

repo2docker doi:10.7910/DVN/TJCLKP

And that's all we need.
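A dataset-only provider only has to turn the user's spec into a bare dataset DOI and reject file-level links. Below is a minimal sketch of that normalization. `dataset_doi_from_spec` is a hypothetical helper, not repo2docker's actual code, and it assumes dataset DOIs arrive either as `doi:...` or as a doi.org URL.

```python
from urllib.parse import urlparse


def dataset_doi_from_spec(spec: str) -> str:
    """Normalize a spec to a bare DOI such as '10.7910/DVN/TJCLKP'.

    Hypothetical helper: accepts 'doi:...' or https://doi.org/... forms and
    rejects direct links to file.xhtml pages, since only datasets are supported.
    """
    spec = spec.strip()
    if spec.lower().startswith("doi:"):
        return spec[len("doi:"):]
    parsed = urlparse(spec)
    if parsed.netloc in ("doi.org", "dx.doi.org"):
        return parsed.path.lstrip("/")
    if parsed.path.endswith("/file.xhtml"):
        raise ValueError("file-level URLs are not supported; pass the dataset DOI")
    raise ValueError(f"not a recognized dataset DOI: {spec!r}")
```

With this sketch, `doi:10.7910/DVN/TJCLKP` yields `10.7910/DVN/TJCLKP`, while the `file.xhtml` URL shown earlier raises `ValueError`.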

This simplification builds upon the work in jupyterhub#1388, in which the dataset landing page is no longer downloaded when resolving a dataset DOI. Instead, only the redirect location is fetched, which is all the Dataverse content provider needs to determine which of the 100+ installations of Dataverse hosts the DOI.
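The "fetch only the redirect location" idea can be sketched with the standard library: send a HEAD request to doi.org with redirects disabled, read the `Location` header, and match its host against a list of known installations. This is a minimal sketch under assumptions, not the code from jupyterhub#1388. `KNOWN_INSTALLATIONS`, `NoRedirect`, `doi_redirect_location` and `installation_for` are hypothetical names, and the installation list is an illustrative subset only.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import urlparse

# Hypothetical subset of Dataverse installations, for illustration only.
KNOWN_INSTALLATIONS = [
    "https://dataverse.harvard.edu",
    "https://demo.dataverse.org",
]


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Make urllib stop at the first redirect instead of following it."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def doi_redirect_location(doi: str, timeout: float = 10.0) -> str:
    """Ask doi.org where a DOI points, without downloading the target page."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(f"https://doi.org/{doi}", method="HEAD")
    try:
        opener.open(request, timeout=timeout)
    except urllib.error.HTTPError as err:
        # With redirects disabled, a 3xx response surfaces as an HTTPError.
        location = err.headers.get("Location")
        if 300 <= err.code < 400 and location:
            return location
        raise
    raise ValueError(f"doi.org did not redirect for {doi!r}")


def installation_for(location: str, installations=KNOWN_INSTALLATIONS) -> Optional[str]:
    """Return the known installation whose host matches the redirect target."""
    host = urlparse(location).netloc
    for base in installations:
        if urlparse(base).netloc == host:
            return base
    return None
```

Calling `installation_for(doi_redirect_location("10.7910/DVN/TJCLKP"))` needs network access. Offline, `installation_for("https://dataverse.harvard.edu/citation?persistentId=doi:10.7910/DVN/TJCLKP")` returns `"https://dataverse.harvard.edu"`. Because the redirect is never followed, the target page (for example a `/citation` URL) is not requested at all.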

This change should be a no-op for any installation of Dataverse with Binder integration enabled.

Harvard Dataverse (one of the 100+ installations) specifically does not work with Binder because a firewall blocks https://dataverse.harvard.edu/citation.
The simplification in this commit means that the Dataverse content provider no longer needs to follow `/citation` to determine what is on the other side (dataset.xhtml, file.xhtml, etc.). It assumes that the DOI always refers to a dataset (not a file), which is the expectation we have always set for the Binder tool.
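Once the DOI is assumed to name a dataset, the provider can reuse the host and `persistentId` from the redirect location and go straight to the Dataverse native API instead of requesting `/citation`. Below is a minimal sketch, assuming the redirect target looks like `https://dataverse.harvard.edu/citation?persistentId=doi:...` and that the installation exposes the native API endpoint `/api/datasets/:persistentId/`. `dataset_api_url` is a hypothetical helper, not repo2docker's implementation.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def dataset_api_url(location: str) -> str:
    """Build a native API URL for the dataset behind a DOI redirect.

    Hypothetical helper: the /citation URL in `location` is never requested;
    only its scheme, host and persistentId query parameter are reused,
    on the assumption that the DOI always names a dataset.
    """
    parsed = urlparse(location)
    pid = parse_qs(parsed.query).get("persistentId", [None])[0]
    if pid is None:
        raise ValueError(f"no persistentId in {location!r}")
    query = urlencode({"persistentId": pid})
    return f"{parsed.scheme}://{parsed.netloc}/api/datasets/:persistentId/?{query}"
```

The design choice here is to trust the DOI-is-a-dataset assumption rather than probe what kind of page sits behind `/citation`. That probe is exactly the request the Harvard firewall blocks.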

We are tracking Binder not working with Harvard Dataverse here: IQSS/dataverse.harvard.edu#328


yuvipanda · 2024-12-16T17:47:46Z

Hmm, jupyterhub#739, the original PR introducing this provider, explicitly talks about using files. So I'd like us to not remove this functionality, as other installations may be relying on it :(

pdurbin · 2024-12-16T19:01:02Z

@yuvipanda I sort of doubt it. The way we advertise Binder in the Dataverse documentation is as a dataset-level tool (scope = dataset): https://guides.dataverse.org/en/6.5/admin/external-tools.html#inventory-of-external-tools

In theory, someone could navigate to https://mybinder.org directly and enter a file-level DOI. But I suspect most people will reach Binder by way of a "Binder" button in Dataverse, that is, via a Binder "external tool" as described in the docs above.

yuvipanda · 2024-12-17T20:58:56Z

Handled differently in jupyterhub#1390

pdurbin mentioned this pull request Dec 16, 2024

Binder not working from Harvard Dataverse IQSS/dataverse.harvard.edu#328 (Open)

yuvipanda mentioned this pull request Dec 17, 2024

Use REST APIs to resolve DOIs + cleanup dataverse provider jupyterhub/repo2docker#1390 (Merged)

yuvipanda closed this Dec 17, 2024
