v0.11

J535D165 released this 26 Mar 22:15 (commit 4935c64)

What's Changed

  • Implement checksum checking as a feature of datahugger by @davetromp in #72
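
These notes do not describe how the checksum feature works. As a rough illustration of what checksum checking involves, the sketch below hashes a downloaded file in chunks with Python's `hashlib` and compares the digest with an expected value. The function names `file_digest` and `verify_checksum` are hypothetical and are not datahugger's API; the sketch assumes the repository publishes a hex digest together with its algorithm name (for example `md5`).

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in fixed-size chunks (1 MiB by default) to keep memory use constant."""
    # Hypothetical helper for illustration; not part of datahugger.
    digest = hashlib.new(algorithm)
    with Path(path).open("rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected, algorithm="md5"):
    """Return True when the file's digest equals the expected hex string (case-insensitive)."""
    return file_digest(path, algorithm) == expected.strip().lower()


if __name__ == "__main__":
    sample = Path("checksum_demo.bin")
    sample.write_bytes(b"hello")
    print(verify_checksum(sample, "5d41402abc4b2a76b9719d911017c592"))  # True
    sample.unlink()
```

Reading in chunks means large dataset files never have to be loaded into memory at once; a mismatch usually points to an incomplete or corrupted download.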

Full Changelog: v0.10.4...v0.11

Coverage report

The following benchmark was run on 1,000 randomly selected records from DataCite.

Percentages

Percentage of datasets supported: 26.9%
Percentage of datasets not supported: 69.4%
Percentage of datasets with errors: 3.7%
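
Each record ends in exactly one of the three outcomes, so the percentages sum to 100% (26.9 + 69.4 + 3.7). On a 1,000-record sample they correspond to 269, 694 and 37 records, and the 37 errors match the number of rows in the table of unexpected errors. A minimal sketch of the tally, assuming every record has already been labelled with one outcome string (the labels are hypothetical):

```python
from collections import Counter


def coverage_percentages(outcomes):
    """Map each outcome label to its share of all records, in percent (1 decimal)."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Counts implied by the reported percentages for a 1,000-record sample.
outcomes = ["supported"] * 269 + ["not supported"] * 694 + ["error"] * 37
print(coverage_percentages(outcomes))
# {'supported': 26.9, 'not supported': 69.4, 'error': 3.7}
```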

Table with unexpected errors

index id type url service error
45 10.17188/1264410 dois http://www.osti.gov/servlets/purl/1264410/ nan HTTPSConnectionPool(host='www.osti.gov', port=443): Max retries exceeded with url: /servlets/purl/1264410/ (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41f9cad0>, 'Connection to www.osti.gov timed out. (connect timeout=3)'))
47 10.58100/ibcr0302rx67ws2 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=21038&SAM=IBCR0302RX67WS2
52 10.18730/v7c2= dois https://glis.fao.org/glis/doi/10.18730/V7C2= nan '10.18730/v7c2=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
60 10.3929/ethz-a-010147993 dois http://hdl.handle.net/20.500.11850/83547 nan 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/83547
73 10.20345/digitue.1029.61 dois http://idb.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141 nan 500 Server Error: Internal Server Error for url: https://opendigi.ub.uni-tuebingen.de/opendigi/litrdsch_1902#p=141
96 10.17876/plate/dr.2/plates/201_33742 dois https://www.plate-archive.org/objects/dr.2/plates/201_33742 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_33742/
119 10.18430/m3.irrmc.4168 dois https://proteindiffraction.org/project/SETDB1-x122 nan 'NoneType' object has no attribute 'find'
129 10.58100/ibcr0310rxocku2 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=22618&SAM=IBCR0310RXOCKU2
133 10.14469/ch/8676 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
252 10.15781/t2xs4w dois https://repositories.lib.utexas.edu/handle/2152/31647 nan HTTPSConnectionPool(host='repositories.lib.utexas.edu', port=443): Max retries exceeded with url: /handle/2152/31647 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41d4a3c0>, 'Connection to repositories.lib.utexas.edu timed out. (connect timeout=3)'))
362 10.14469/ch/1303 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-1328 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
367 10.18720/spbpu/2/v18-6126 dois http://elib.spbstu.ru/dl/2/v18-6126.pdf nan HTTPSConnectionPool(host='elib.spbstu.ru', port=443): Read timed out. (read timeout=10)
383 10.14456/scitechasia.2022.12 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/scitechasia.2022.12 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14456/scitechasia.2022.12 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
397 10.17876/plate/dr.2/envelopes/201_50873 dois https://www.plate-archive.org/objects/dr.2/envelopes/201_50873 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/envelopes/201_50873/
400 10.23725/akhp-6959 dois https://ors.datacite.org/doi:/10.23725/akhp-6959 nan HTTPSConnectionPool(host='ors.datacite.org', port=443): Max retries exceeded with url: /doi:/10.23725/akhp-6959 (Caused by NameResolutionError("<urllib3.connection.HTTPSConnection object at 0x7faf4e527d10>: Failed to resolve 'ors.datacite.org' ([Errno -2] Name or service not known)"))
403 10.58100/ibcr0381exz5001 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26882&SAM=IBCR0381EXZ5001 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26882&SAM=IBCR0381EXZ5001
434 10.58100/ibcr0364exxoa01 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26567&SAM=IBCR0364EXXOA01 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26567&SAM=IBCR0364EXXOA01
452 10.14469/ch/129258 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/134211 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
458 10.14469/ch/41814 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/48213 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
464 10.3929/ethz-b-000581366 dois http://hdl.handle.net/20.500.11850/581366 nan 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/581366
483 10.18730/12n7m$ dois https://glis.fao.org/glis/doi/10.18730/12N7M$ nan '10.18730/12n7m$' is not a correct resource identifier (e.g. a URL, DOI, Handle)
496 10.14457/cmu.the.2009.132 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14457/CMU.the.2009.132 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
501 10.14456/stj.2019.4 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/stj.2019.4 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14456/stj.2019.4 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
503 10.14457/kmutt.res.2010.25 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/KMUTT.res.2010.25 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14457/KMUTT.res.2010.25 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
505 10.14469/ch/175982 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/180406 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
515 10.15781/t2c824g2w dois https://repositories.lib.utexas.edu/handle/2152/41169 nan HTTPSConnectionPool(host='repositories.lib.utexas.edu', port=443): Max retries exceeded with url: /handle/2152/41169 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf4e82d130>, 'Connection to repositories.lib.utexas.edu timed out. (connect timeout=3)'))
551 10.17876/plate/dr.2/plates/201_35722 dois https://www.plate-archive.org/objects/dr.2/plates/201_35722 nan 500 Server Error: Internal Server Error for url: https://www.plate-archive.org/objects/dr.2/plates/201_35722/
557 10.14457/mu.the.1999.140 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14457/MU.the.1999.140 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14457/MU.the.1999.140 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
568 10.3929/ethz-b-000406136 dois http://hdl.handle.net/20.500.11850/406136 nan 429 Client Error: Too Many Requests for url: https://www.research-collection.ethz.ch/handle/20.500.11850/406136
639 10.58100/ibcr0364exf0601 dois http://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26688&SAM=IBCR0364EXF0601 nan 503 Server Error: Service Unavailable for url: https://dis.iodp.pangaea.de/BCRDIS/webview/CORES_INFO.aspx?SKEY=26688&SAM=IBCR0364EXF0601
683 10.20379/dbaud-1041 dois http://webdatenbank.grass-medienarchiv.de/receive/ggrass_mods_00001019 nan 503 Server Error: Service Unavailable for url: https://webdatenbank.grass-medienarchiv.de/receive/ggrass_mods_00001019
757 10.18730/q3s0= dois https://glis.fao.org/glis/doi/10.18730/Q3S0= nan '10.18730/q3s0=' is not a correct resource identifier (e.g. a URL, DOI, Handle)
761 10.15781/t2v698x4n dois https://repositories.lib.utexas.edu/handle/2152/68802 nan HTTPSConnectionPool(host='repositories.lib.utexas.edu', port=443): Max retries exceeded with url: /handle/2152/68802 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf4e48eb70>, 'Connection to repositories.lib.utexas.edu timed out. (connect timeout=3)'))
782 10.20372/nadre:1554185535.13 dois https://nadre.ethernet.edu.et/record/3238?ln=en nan HTTPSConnectionPool(host='nadre.ethernet.edu.et', port=443): Max retries exceeded with url: /record/3238?ln=en (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1000)')))
816 10.14469/ch/90617 dois https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/97675 nan HTTPSConnectionPool(host='spectradspace.lib.imperial.ac.uk', port=8443): Read timed out. (read timeout=3)
821 10.14456/apsr.2022.3 dois http://doi.nrct.go.th/?page=resolve_doi&resolve_doi=10.14456/apsr.2022.3 nan HTTPSConnectionPool(host='doi.nrct.go.th', port=443): Max retries exceeded with url: /?page=resolve_doi&resolve_doi=10.14456/apsr.2022.3 (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1000)')))
894 10.5287/bodleianjpcy.2 dois https://databank.ora.ox.ac.uk/ww1archives/datasets/ww1-3945?version=2 nan HTTPSConnectionPool(host='databank.ora.ox.ac.uk', port=443): Max retries exceeded with url: /ww1archives/datasets/ww1-3945?version=2 (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7faf41f33440>, 'Connection to databank.ora.ox.ac.uk timed out. (connect timeout=3)'))
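
Most rows in this table are network or server-side failures: connection and read timeouts (mostly with a 3-second limit, once 10 seconds), HTTP 429 and 5xx responses, TLS certificate errors, and one DNS resolution failure. Three GLIS identifiers ending in `=` or `$` are rejected as invalid, and one record fails with `'NoneType' object has no attribute 'find'`. One way to summarize such a table is to bucket each message by pattern; the category names and regular expressions below are assumptions chosen to match the messages shown here, not part of datahugger.

```python
import re
from collections import Counter

# Hypothetical categories; each pattern targets one message format seen in the table.
CATEGORIES = [
    ("timeout", re.compile(r"timed out")),
    ("TLS certificate", re.compile(r"CERTIFICATE_VERIFY_FAILED")),
    ("DNS resolution", re.compile(r"NameResolutionError")),
    ("HTTP 429 rate limit", re.compile(r"^429 ")),
    ("HTTP 5xx", re.compile(r"^5\d\d ")),
    ("invalid identifier", re.compile(r"is not a correct resource identifier")),
]


def classify_error(message):
    """Return the first category whose pattern matches the message, or 'other'."""
    for label, pattern in CATEGORIES:
        if pattern.search(message):
            return label
    return "other"


def summarize(messages):
    """Count error messages per category."""
    return Counter(classify_error(m) for m in messages)


if __name__ == "__main__":
    sample = [
        "503 Server Error: Service Unavailable for url: https://example.org/a",
        "HTTPSConnectionPool(host='example.org', port=443): Read timed out. (read timeout=3)",
        "'NoneType' object has no attribute 'find'",
    ]
    print(summarize(sample))
```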

Table with unsupported repositories

netloc count
pid.geoscience.gov.au 103
app.geosamples.org 79
doi.plutof.ut.ee 60
www.gbif.org 57
glis.fao.org 30
www.e-periodica.ch 26
ba.e-pics.ethz.ch 22
dlc.library.columbia.edu 19
bacdive.dsmz.de 18
rgdoi.net 16
digitallibrary.usc.edu 14
www.ccdc.cam.ac.uk 14
www.lfi.ch 11
nakala.fr 9
www.osti.gov 8
catalog.paradisec.org.au 8
doi.library.ubc.ca 7
digital.ucd.ie 7
www.plate-archive.org 7
doi.nrct.go.th 6
ntnu.tind.io 6
architekturmuseum.ub.tu-berlin.de 6
spectradspace.lib.imperial.ac.uk:8443 6
www.die-bonn.de 6
straininfo.dsmz.de 5
digi.ub.uni-heidelberg.de 5
dis.iodp.pangaea.de 5
publikationen.bibliothek.kit.edu 5
dadosdepesquisa.fiocruz.br 5
data.neotomadb.org 4
www.rvdata.us 4
hdl.handle.net 4
era.library.ualberta.ca 4
repositories.lib.utexas.edu 3
repository.edition-topoi.org 3
sage.figshare.com 3
sr.ethz.ch 3
www.boldsystems.org 3
www.hepdata.net 3
ageconsearch.umn.edu 3
journals.ub.uni-heidelberg.de 3
apex.ipk-gatersleben.de 3
doi.ala.org.au 3
statisticaldatasets.data-planet.com 3
epos.myesr.org 3
core.tdar.org 2
ikee.lib.auth.gr 2
cocoon.huma-num.fr 2
d.lib.msu.edu 2
pqr.pitt.edu 2
biosys.e-pics.ethz.ch 2
doi.roper.center 2
147.156.5.176:8080 2
viurrspace.ca 2
cyberleninka.ru 2
classiques-garnier.com 2
gdac.broadinstitute.org 2
hasp.ub.uni-heidelberg.de 2
springernature.figshare.com 2
www.e-gs.ethz.ch 2
scholarworks.wm.edu 2
www.e-manuscripta.ch 2
bib-pubdb1.desy.de 2
search.rads-doi.org 2
b2share.eudat.eu 1
ascomycete.org 1
dataverse.callisto.calmip.univ-toulouse.fr 1
esdcdoi.esac.esa.int 1
deepblue.lib.umich.edu 1
resolver.caltech.edu 1
figshare.com 1
ojs.utlib.ee 1
cdr.lib.unc.edu 1
www.repository.cam.ac.uk 1
www.openagrar.de 1
www.crd.york.ac.uk 1
dlc.mpg.de 1
www.bindingdb.org 1
tecnoscienza.unibo.it 1
epub.uni-regensburg.de 1
databank.ora.ox.ac.uk 1
data.oceannetworks.ca 1
ad.e-pics.ethz.ch 1
nadre.ethernet.edu.et 1
qatest.labarchives.com 1
www.openaccessrepository.it 1
ap.elte.hu 1
depositonce.tu-berlin.de 1
mdsoar.org 1
resume.uni.lu 1
webdatenbank.grass-medienarchiv.de 1
rockstore.csiro.au 1
rucore.libraries.rutgers.edu 1
archiv.ub.uni-heidelberg.de 1
nsidc.org 1
www.icpsr.umich.edu 1
archiviostorico.fondazione1563.it 1
daac.ornl.gov 1
spiral.imperial.ac.uk 1
www.tib.eu 1
doi.ciser.cornell.edu 1
academiccommons.columbia.edu 1
journals.open.tudelft.nl 1
tuprints.ulb.tu-darmstadt.de 1
www.archaeolog.ru 1
proteindiffraction.org 1
www.sozialpolitik.ch 1
data.caltech.edu 1
idb.ub.uni-tuebingen.de 1
publica.fraunhofer.de 1
bl.iro.bl.uk 1
ads.nipr.ac.jp 1
www.psycharchives.org 1
underline.io 1
resolver.tudelft.nl 1
opus.bibliothek.uni-wuerzburg.de 1
cyberdoi.ru 1
ruor.uottawa.ca 1
encyclopedia.1914-1918-online.net 1
theses.gla.ac.uk 1
www.jamstec.go.jp 1
drops.dagstuhl.de 1
dataservices.gfz-potsdam.de 1
boris.unibe.ch 1
ors.datacite.org 1
www.e-rara.ch 1
www.elibrary.ru 1
elib.spbstu.ru 1
www.zora.uzh.ch 1
campagnes.flotteoceanographique.fr 1
archive.materialscloud.org 1
www.worldpop.org.uk 1
archaeologydataservice.ac.uk 1
didomena.ehess.fr 1
cwm-archiv.gbv.de 1
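
`netloc` is the network-location part of a URL (the host, plus the port when one is given), which is why entries such as `spectradspace.lib.imperial.ac.uk:8443` and `147.156.5.176:8080` keep their ports. The counts for doi.nrct.go.th (6), spectradspace.lib.imperial.ac.uk:8443 (6) and dis.iodp.pangaea.de (5) equal their numbers of rows in the error table, so this table appears to include records that failed with an error as well. Assuming the counts come from grouping each record's landing-page URL by netloc (the notes do not say exactly how), the aggregation can be sketched with the standard library:

```python
from collections import Counter
from urllib.parse import urlparse


def count_netlocs(urls):
    """Count records per netloc (host[:port]), most common first."""
    return Counter(urlparse(url).netloc for url in urls).most_common()


urls = [
    "https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-8701",
    "https://spectradspace.lib.imperial.ac.uk:8443/dspace/handle/10042/to-1328",
    "http://elib.spbstu.ru/dl/2/v18-6126.pdf",
]
for netloc, count in count_netlocs(urls):
    print(netloc, count)
# spectradspace.lib.imperial.ac.uk:8443 2
# elib.spbstu.ru 1
```

Because `urlparse` keeps the scheme separate from the netloc, `http://` and `https://` links to the same host are counted together.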