-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resolve DOI more cleanly #1388
Closed
Closed
Resolve DOI more cleanly #1388
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Using doi.org, we only care to find out *where* the doi is pointing to. We don't need to go fetch the contents of that page fully.
yuvipanda
force-pushed
the
no-follow-redirect
branch
from
December 14, 2024 16:35
5be6fe3
to
4f287dd
Compare
yuvipanda
force-pushed
the
no-follow-redirect
branch
from
December 14, 2024 16:50
4fb9a03
to
65a0299
Compare
for more information, see https://pre-commit.ci
manics
reviewed
Dec 14, 2024
resp = self._request(f"https://doi.org/{doi}") | ||
# We don't need to fetch the *contents* of the page the doi resolves to - | ||
# only need to know what it redirects to. | ||
resp = self._request(f"https://doi.org/{doi}", allow_redirects=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any rules on whether a DOI can redirect multiple times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pdurbin
added a commit
to IQSS/repo2docker
that referenced
this pull request
Dec 16, 2024
…1388 When the Dataverse content provider was added in jupyterhub#739 it had the flexibility to operate directly on Dataverse files like this: repo2docker https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/6ZXAGT/3YRRYJ However, being able to operate only on datasets (files are stored in datasets in Dataverse) is enough. That is, this will still work: repo2docker doi:10.7910/DVN/TJCLKP And that's all we need. This simplification builds upon the work in jupyterhub#1388 where the content of the dataset landing page is not retrieved from the DOI of the dataset. Instead, the redirect location is fetched, which is all the Dataverse content provider needs to determine which of the 100+ installations of Dataverse hosts the DOI. This change should be a no-op for any installation of Datavese with Binder integration enabled. Harvard Dataverse (one of the 100+ installations) specifically is not working with Binder due to a firewall that is blocking https://dataverse.harvard.edu/citation The simplification in this commit means that the Dataverse content provider no longer needs to follow `/citation` to determine what is on the other side (dataset.xhtml, file.xhtml, etc.). It assumes that the DOI is always for a dataset (not a file), which is the expectation we have always set for the Binder tool. We are tracking Binder not working with Harvard Dataverse here: IQSS/dataverse.harvard.edu#328
yuvipanda
added a commit
to yuvipanda/repo2docker
that referenced
this pull request
Dec 16, 2024
As I was debugging jupyterhub#1388, I realized that PR actually broke the dataverse provider, but the existing test was mocking so much that we didn't actually catch it! IMO, since the point of contentproviders is to integrate with external content providers, they should be integration tests so we can catch issues with them more easily. Integration tests would have caught https://jupyter.zulipchat.com/#narrow/channel/103349-ask-anything/topic/Binder.20Dataverse.20error more cleanly than how it happened for example. This PR removes all mocks from the dataverse test, and we immediately benefit - it shows us that the dataverse provider *only* actually handles DOIs, and not direct URLs! So even though we technically had tests earlier that showed our dataverse provider supporting direct dataverse URLs, it simply was not true. So we actually catch the failure. I will try to see if we can use a demo or test instance for the fetch test though, so we don't screw up download stats even more for the existing test doi we use.
Taking a different approach in #1390 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Using doi.org, we only care to find out where the doi is pointing to. We don't need to go fetch the contents of that page fully.
By doing this, we: