Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve DOI more cleanly #1388

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 4 additions & 2 deletions repo2docker/contentproviders/doi.py
Original file line number Diff line number Diff line change
Expand Up @@ -49,7 +49,9 @@ def doi2url(self, doi):
doi = normalize_doi(doi)

try:
resp = self._request(f"https://doi.org/{doi}")
# We don't need to fetch the *contents* of the page the doi resolves to -
# only need to know what it redirects to.
resp = self._request(f"https://doi.org/{doi}", allow_redirects=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any rules on whether a DOI can redirect multiple times?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@manics great question that led me down a rabbithole! I'm now convinced this approach is not robust enough, and will open a different series of PRs - starting with #1389, and #1390 is a WIP

resp.raise_for_status()
except HTTPError as e:
# If the DOI doesn't exist, just return URL
Expand All @@ -60,7 +62,7 @@ def doi2url(self, doi):
# default Git provider as this leads to a misleading error.
self.log.error(f"DOI {doi} does not resolve: {e}")
raise
return resp.url
return resp.headers["Location"]
else:
# Just return what is actulally just a URL
return doi
Expand Down
4 changes: 2 additions & 2 deletions tests/unit/contentproviders/test_dataverse.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,8 +70,8 @@ def doi_resolver(req, context):
assert requests_mock.call_count == 0
# valid Dataverse DOIs trigger this content provider
assert Dataverse().detect(test_input[0]) == expected[0]
# 4: doi resolution (302), File, doi resolution (302), then dataset
assert requests_mock.call_count == 4
# 4: doi resolution (302), doi resolution (302)
assert requests_mock.call_count == 2
requests_mock.reset_mock()

assert Dataverse().detect(test_input[1]) == expected[0]
Expand Down
2 changes: 1 addition & 1 deletion tests/unit/contentproviders/test_doi.py
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ def test_url_headers(requests_mock):
@pytest.mark.parametrize(
"requested_doi, expected",
[
("10.5281/zenodo.3242074", "https://zenodo.org/records/3242074"),
("10.5281/zenodo.3242074", "https://zenodo.org/record/3242074"),
# Unresolving DOI:
("10.1/1234", "10.1/1234"),
],
Expand Down
4 changes: 2 additions & 2 deletions tests/unit/contentproviders/test_hydroshare.py
Original file line number Diff line number Diff line change
Expand Up @@ -75,8 +75,8 @@ def test_detect_hydroshare(requests_mock):
assert (
Hydroshare().detect("10.4211/hs.b8f6eae9d89241cf8b5904033460af61") == expected
)
# assert 3 calls were made, 2 to resolve the DOI (302 + 200) and another to fetch the version
assert requests_mock.call_count == 3
# assert 2 calls were made, 1 to resolve the DOI (302) and another to fetch the version
assert requests_mock.call_count == 2
requests_mock.reset_mock()

assert (
Expand Down
4 changes: 2 additions & 2 deletions tests/unit/contentproviders/test_zenodo.py
Original file line number Diff line number Diff line change
Expand Up @@ -66,8 +66,8 @@ def test_detect_zenodo(test_input, expected, requests_mock):
assert Zenodo().detect(test_input[0]) == expected
assert Zenodo().detect(test_input[1]) == expected
assert Zenodo().detect(test_input[2]) == expected
# only two of the three calls above have to resolve a DOI (2 req per doi resolution)
assert requests_mock.call_count == 4
# only two of the three calls above have to resolve a DOI (1 req per doi resolution)
assert requests_mock.call_count == 2
requests_mock.reset_mock()

# Don't trigger the Zenodo content provider
Expand Down
Loading