v0.11
What's Changed
- Implement checksum checking as a feature of datahugger by @davetromp in #72
Full Changelog: v0.10.4...v0.11
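The changelog entry only names the checksum feature. To show roughly what checksum checking involves, here is a minimal sketch using Python's standard `hashlib`. It is not datahugger's implementation, and the helpers `file_checksum` and `verify_checksum` are hypothetical. It assumes the repository publishes a hex-encoded digest (MD5 by default here) for each file.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checksum(path, expected, algorithm="md5"):
    """True if the file's digest matches the hex checksum reported by the repository."""
    return file_checksum(path, algorithm) == expected.strip().lower()


# Usage: simulate a downloaded file and a repository-reported checksum.
with tempfile.TemporaryDirectory() as tmp:
    downloaded = Path(tmp) / "data.csv"
    downloaded.write_bytes(b"a,b\n1,2\n")
    reported = hashlib.md5(b"a,b\n1,2\n").hexdigest()
    print(verify_checksum(downloaded, reported))  # True
```

Reading in fixed-size chunks keeps memory flat for large dataset files. The comparison lowercases the expected value because repositories differ in how they case hex digests.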
Coverage report
The following benchmark was run on 1,000 randomly selected records from DataCite.
Percentages
Percentage of datasets supported: 26.9%
Percentage of datasets not supported: 69.4%
Percentage of datasets with error: 3.7%
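The three percentages sum to 100% and, for a 1,000-record sample, correspond to 269 supported, 694 unsupported and 37 erroring records. The 37 matches the number of rows in the error table below. The sketch below shows that arithmetic. The outcome labels are hypothetical and are not taken from the benchmark script.

```python
from collections import Counter


def coverage_percentages(outcomes):
    """Share of each outcome in the sample, as a percentage rounded to one decimal."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {
        status: round(100 * counts[status] / total, 1)
        for status in ("supported", "not supported", "error")
    }


# Counts implied by the reported percentages for the 1,000-record sample.
outcomes = ["supported"] * 269 + ["not supported"] * 694 + ["error"] * 37
print(coverage_percentages(outcomes))
# {'supported': 26.9, 'not supported': 69.4, 'error': 3.7}
```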
Table with unexpected errors
index | id | type | url | service | error |
---|---|---|---|---|---|
45 | 10.17188/1264410 | dois | http://www.osti.gov/servlets/purl/1264410/ | nan | HTTPSConnectionPool(host='www.osti.gov', port=443): Max retries exceeded with url: /servlets/purl/1264410/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41f9cad0>, 'Connection to www.osti.gov timed out. (connect timeout=3)')) |
47 | 10.58100/ibcr0302rx67ws2 | dois | http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2 | nan | 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2 |
52 | 10.18730/v7c2= | dois | https://glis.fao.org/glis/doi/10.18730/V7C2= | nan | '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle) |
60 | 10.3929/ethz-a-010147993 | dois | http://hdl.handle.net/20.500.11850/83547 | nan | 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/83547 |
73 | 10.20345/digitue.1029.61 | dois | http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 | nan | 500 Server Error: Internal Server Error for url: https://opendigi.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 |
96 | 10.17876/plate/dr.2/plates/201_33742 | dois | https://www.plate-archive.org/objects/dr.2/plates/201_33742 | nan | 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/ |
119 | 10.18430/m3.irrmc.4168 | dois | https://proteindiffraction.org/project/SETDB1-x122 | nan | 'NoneType' object has no attribute 'find' |
129 | 10.58100/ibcr0310rxocku2 | dois | http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2 | nan | 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2 |
133 | 10.14469/ch/8676 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3) |
252 | 10.15781/t2xs4w | dois | https://repositories.lib.utexas.edu/handle/2152/31647 | nan | HTTPSConnectionPool(host='repositories.lib.utexas.edu', port=443): Max retries exceeded with url: /handle/2152/31647 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41d4a3c0>, 'Connection to repositories.lib.utexas.edu timed out. (connect timeout=3)')) |
362 | 10.14469/ch/1303 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-1328 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3) |
367 | 10.18720/spbpu/2/v18-6126 | dois | http://elib.spbstu.ru/dl/2/v18-6126.pdf | nan | HTTPSConnectionPool(host='elib.spbstu.ru', port=443): Read timed out. (read timeout=10) |
383 | 10.14456/scitechasia.2022.12 | dois | http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/scitechasia.2022.12 | nan | HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14456/scitechasia.2022.12 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))) |
397 | 10.17876/plate/dr.2/envelopes/201_50873 | dois | https://www.plate-archive.org/objects/dr.2/envelopes/201_50873 | nan | 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/envelopes/201_50873/ |
400 | 10.23725/akhp-6959 | dois | https://ors.datacite.org/doi:/10.23725/akhp-6959 | nan | HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7faf4e527d10>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)")) |
403 | 10.58100/ibcr0381exz5001 | dois | http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26882&SAM=IBCR0381EXZ5001 | nan | 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26882&SAM=IBCR0381EXZ5001 |
434 | 10.58100/ibcr0364exxoa01 | dois | http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26567&SAM=IBCR0364EXXOA01 | nan | 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26567&SAM=IBCR0364EXXOA01 |
452 | 10.14469/ch/129258 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3) |
458 | 10.14469/ch/41814 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3) |
464 | 10.3929/ethz-b-000581366 | dois | http://hdl.handle.net/20.500.11850/581366 | nan | 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/581366 |
483 | 10.18730/12n7m$ | dois | https://glis.fao.org/glis/doi/10.18730/12N7M$ | nan | '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle) |
496 | 10.14457/cmu.the.2009.132 | dois | http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 | nan | HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))) |
501 | 10.14456/stj.2019.4 | dois | http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/stj.2019.4 | nan | HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14456/stj.2019.4 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))) |
503 | 10.14457/kmutt.res.2010.25 | dois | http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/KMUTT.res.2010.25 | nan | HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14457/KMUTT.res.2010.25 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))) |
505 | 10.14469/ch/175982 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/180406 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3) |
515 | 10.15781/t2c824g2w | dois | https://repositories.lib.utexas.edu/handle/2152/41169 | nan | HTTPSConnectionPool(host='repositories.lib.utexas.edu', port=443): Max retries exceeded with url: /handle/2152/41169 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf4e82d130>, 'Connection to repositories.lib.utexas.edu timed out. (connect timeout=3)')) |
551 | 10.17876/plate/dr.2/plates/201_35722 | dois | https://www.plate-archive.org/objects/dr.2/plates/201_35722 | nan | 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_35722/ |
557 | 10.14457/mu.the.1999.140 | dois | http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/MU.the.1999.140 | nan | HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14457/MU.the.1999.140 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))) |
568 | 10.3929/ethz-b-000406136 | dois | http://hdl.handle.net/20.500.11850/406136 | nan | 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/406136 |
639 | 10.58100/ibcr0364exf0601 | dois | http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26688&SAM=IBCR0364EXF0601 | nan | 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26688&SAM=IBCR0364EXF0601 |
683 | 10.20379/dbaud-1041 | dois | http://webdatenbank.grass-medienarchiv.de/receive/ggrass_mods_00001019 | nan | 503 Server Error: Service Unavailable for url: https://webdatenbank.grass-medienarchiv.de/receive/ggrass_mods_00001019 |
757 | 10.18730/q3s0= | dois | https://glis.fao.org/glis/doi/10.18730/Q3S0= | nan | '10.18730/q3s0=' is not a correct resource identifier (e.g. a URL, DOI, Handle) |
761 | 10.15781/t2v698x4n | dois | https://repositories.lib.utexas.edu/handle/2152/68802 | nan | HTTPSConnectionPool(host='repositories.lib.utexas.edu', port=443): Max retries exceeded with url: /handle/2152/68802 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf4e48eb70>, 'Connection to repositories.lib.utexas.edu timed out. (connect timeout=3)')) |
782 | 10.20372/nadre:1554185535.13 | dois | https://nadre.ethernet.edu.et/record/3238?ln=en | nan | HTTPSConnectionPool(host='nadre.ethernet.edu.et', port=443): Max retries exceeded with url: /record/3238?ln=en (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)'))) |
816 | 10.14469/ch/90617 | dois | https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/97675 | nan | HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3) |
821 | 10.14456/apsr.2022.3 | dois | http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/apsr.2022.3 | nan | HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14456/apsr.2022.3 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)'))) |
894 | 10.5287/bodleianjpcy.2 | dois | https://databank.ora.ox.ac.uk/ww1archives/datasets/ww1-3945?version=2 | nan | HTTPSConnectionPool(host='databank.ora.ox.ac.uk', port=443): Max retries exceeded with url: /ww1archives/datasets/ww1-3945?version=2 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41f33440>, 'Connection to databank.ora.ox.ac.uk timed out. (connect timeout=3)')) |
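The failures in this table fall into a few recurring kinds. These are connection or read timeouts (the limit is 3 s in all but one case), HTTP 429 and 5xx responses, TLS certificate verification failures, one DNS lookup failure, and identifiers rejected as invalid. The sketch below buckets such messages with regular expressions. The category names and patterns are hypothetical. They were chosen by reading the table and are not part of datahugger.

```python
import re

# Hypothetical categories; checked in order, first match wins.
CATEGORIES = [
    ("timeout", re.compile(r"timed out|ConnectTimeoutError", re.IGNORECASE)),
    ("TLS certificate", re.compile(r"CERTIFICATE_VERIFY_FAILED")),
    ("DNS resolution", re.compile(r"NameResolutionError")),
    ("rate limited (429)", re.compile(r"^429 ")),
    ("server error (5xx)", re.compile(r"^5\d\d ")),
    ("invalid identifier", re.compile(r"is not a correct resource identifier")),
]


def classify(error):
    """Map a raw error message to a coarse category, or 'other' if nothing matches."""
    for name, pattern in CATEGORIES:
        if pattern.search(error):
            return name
    return "other"


print(classify("503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/"))
# server error (5xx)
```

Separating transient network failures from identifier or parsing problems, such as the `'NoneType' object has no attribute 'find'` row, which this sketch labels `other`, makes it easier to decide which records are worth retrying with a longer timeout.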
Table with unsupported repositories
netloc | count |
---|---|
pid.geoscience.gov.au | 103 |
app.geosamples.org | 79 |
doi.plutof.ut.ee | 60 |
www.gbif.org | 57 |
glis.fao.org | 30 |
www.e-periodica.ch | 26 |
ba.e-pics.ethz.ch | 22 |
dlc.library.columbia.edu | 19 |
bacdive.dsmz.de | 18 |
rgdoi.net | 16 |
digitallibrary.usc.edu | 14 |
www.ccdc.cam.ac.uk | 14 |
www.lfi.ch | 11 |
nakala.fr | 9 |
www.osti.gov | 8 |
catalog.paradisec.org.au | 8 |
doi.library.ubc.ca | 7 |
digital.ucd.ie | 7 |
www.plate-archive.org | 7 |
doi.nrct.go.th | 6 |
ntnu.tind.io | 6 |
architekturmuseum.ub.tu-berlin.de | 6 |
spectradspace.lib.imperial.ac.uk:8443 | 6 |
www.die-bonn.de | 6 |
straininfo.dsmz.de | 5 |
digi.ub.uni-heidelberg.de | 5 |
dis.iodp.pangaea.de | 5 |
publikationen.bibliothek.kit.edu | 5 |
dadosdepesquisa.fiocruz.br | 5 |
data.neotomadb.org | 4 |
www.rvdata.us | 4 |
hdl.handle.net | 4 |
era.library.ualberta.ca | 4 |
repositories.lib.utexas.edu | 3 |
repository.edition-topoi.org | 3 |
sage.figshare.com | 3 |
sr.ethz.ch | 3 |
www.boldsystems.org | 3 |
www.hepdata.net | 3 |
ageconsearch.umn.edu | 3 |
journals.ub.uni-heidelberg.de | 3 |
apex.ipk-gatersleben.de | 3 |
doi.ala.org.au | 3 |
statisticaldatasets.data-planet.com | 3 |
epos.myesr.org | 3 |
core.tdar.org | 2 |
ikee.lib.auth.gr | 2 |
cocoon.huma-num.fr | 2 |
d.lib.msu.edu | 2 |
pqr.pitt.edu | 2 |
biosys.e-pics.ethz.ch | 2 |
doi.roper.center | 2 |
147.156.5.176:8080 | 2 |
viurrspace.ca | 2 |
cyberleninka.ru | 2 |
classiques-garnier.com | 2 |
gdac.broadinstitute.org | 2 |
hasp.ub.uni-heidelberg.de | 2 |
springernature.figshare.com | 2 |
www.e-gs.ethz.ch | 2 |
scholarworks.wm.edu | 2 |
www.e-manuscripta.ch | 2 |
bib-pubdb1.desy.de | 2 |
search.rads-doi.org | 2 |
b2share.eudat.eu | 1 |
ascomycete.org | 1 |
dataverse.callisto.calmip.univ-toulouse.fr | 1 |
esdcdoi.esac.esa.int | 1 |
deepblue.lib.umich.edu | 1 |
resolver.caltech.edu | 1 |
figshare.com | 1 |
ojs.utlib.ee | 1 |
cdr.lib.unc.edu | 1 |
www.repository.cam.ac.uk | 1 |
www.openagrar.de | 1 |
www.crd.york.ac.uk | 1 |
dlc.mpg.de | 1 |
www.bindingdb.org | 1 |
tecnoscienza.unibo.it | 1 |
epub.uni-regensburg.de | 1 |
databank.ora.ox.ac.uk | 1 |
data.oceannetworks.ca | 1 |
ad.e-pics.ethz.ch | 1 |
nadre.ethernet.edu.et | 1 |
qatest.labarchives.com | 1 |
www.openaccessrepository.it | 1 |
ap.elte.hu | 1 |
depositonce.tu-berlin.de | 1 |
mdsoar.org | 1 |
resume.uni.lu | 1 |
webdatenbank.grass-medienarchiv.de | 1 |
rockstore.csiro.au | 1 |
rucore.libraries.rutgers.edu | 1 |
archiv.ub.uni-heidelberg.de | 1 |
nsidc.org | 1 |
www.icpsr.umich.edu | 1 |
archiviostorico.fondazione1563.it | 1 |
daac.ornl.gov | 1 |
spiral.imperial.ac.uk | 1 |
www.tib.eu | 1 |
doi.ciser.cornell.edu | 1 |
academiccommons.columbia.edu | 1 |
journals.open.tudelft.nl | 1 |
tuprints.ulb.tu-darmstadt.de | 1 |
www.archaeolog.ru | 1 |
proteindiffraction.org | 1 |
www.sozialpolitik.ch | 1 |
data.caltech.edu | 1 |
idb.ub.uni-tuebingen.de | 1 |
publica.fraunhofer.de | 1 |
bl.iro.bl.uk | 1 |
ads.nipr.ac.jp | 1 |
www.psycharchives.org | 1 |
underline.io | 1 |
resolver.tudelft.nl | 1 |
opus.bibliothek.uni-wuerzburg.de | 1 |
cyberdoi.ru | 1 |
ruor.uottawa.ca | 1 |
encyclopedia.1914-1918-online.net | 1 |
theses.gla.ac.uk | 1 |
www.jamstec.go.jp | 1 |
drops.dagstuhl.de | 1 |
dataservices.gfz-potsdam.de | 1 |
boris.unibe.ch | 1 |
ors.datacite.org | 1 |
www.e-rara.ch | 1 |
www.elibrary.ru | 1 |
elib.spbstu.ru | 1 |
www.zora.uzh.ch | 1 |
campagnes.flotteoceanographique.fr | 1 |
archive.materialscloud.org | 1 |
www.worldpop.org.uk | 1 |
archaeologydataservice.ac.uk | 1 |
didomena.ehess.fr | 1 |
cwm-archiv.gbv.de | 1 |
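The `netloc` column appears to be the network-location part of each record's landing URL. This would explain why `spectradspace.lib.imperial.ac.uk:8443` keeps its port. A minimal sketch of producing such counts follows. It assumes, without confirmation from the benchmark code, that grouping is done on `urllib.parse.urlparse(url).netloc`.

```python
from collections import Counter
from urllib.parse import urlparse


def count_netlocs(urls):
    """Count URLs per network location (host plus explicit port, if any)."""
    return Counter(urlparse(u).netloc for u in urls)


urls = [
    "https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701",
    "https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-1328",
    "http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/stj.2019.4",
]
# Print rows in the same "netloc | count |" layout as the table, most frequent first.
for netloc, count in count_netlocs(urls).most_common():
    print(f"{netloc} | {count} |")
```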