harvest hepdata #594
Comments
blocked by #621
Create a DAG to harvest Hepdata in airflow
Metadata mapping for steps 2. and 3.
@GraemeWatt could you please check if this makes sense from your side?
note: it's not possible to import possible solutions,
reworked httphooks to make more easy to use * ref: cern-sis/issues-inspire/issues/594
We need to harvest hepdata daily to get new and updated records, convert them to our metadata schema and update our records. In the current infrastructure, this encompasses both actual harvesting (as in hepcrawl) and holdingpen logic (as in inspire-next), but how and whether to split those in this case is a technical decision still to be determined.

The logic is as follows:

https://www.hepdata.net/record/{recid}?format=json, retrieve the metadata (not attached documents) and convert it to our metadata schema. In particular, this requires deriving the main unversioned DOI by removing the .vN suffix from record.hepdata_doi (not present explicitly in the metadata).

If version is not equal to 1, we need to retrieve all DOIs associated with previous versions by fetching https://www.hepdata.net/record/{recid}?format=json&version={previous_version} for all 1 <= previous_version < version and add them to the record.
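
The DOI handling described above could be sketched roughly as follows. This is only an illustration, not the agreed implementation: the function names, the `fetch_json` callable, and the assumed JSON layout (a top-level `version` field and a `record` object holding `hepdata_doi`) are assumptions based on the description in this issue, and the DOI in the usage note below is a made-up example.

```python
import re
from typing import Callable, Dict, List

# URL pattern quoted in the issue text.
HEPDATA_RECORD_URL = "https://www.hepdata.net/record/{recid}?format=json"


def unversioned_doi(versioned_doi: str) -> str:
    """Derive the main DOI by removing a trailing .vN suffix, if present."""
    return re.sub(r"\.v\d+$", "", versioned_doi)


def previous_version_urls(recid: int, version: int) -> List[str]:
    """Build one URL per previous version, for 1 <= previous_version < version."""
    base = HEPDATA_RECORD_URL.format(recid=recid)
    return [f"{base}&version={previous}" for previous in range(1, version)]


def collect_dois(recid: int, fetch_json: Callable[[str], Dict]) -> List[str]:
    """Return the unversioned DOI followed by every versioned DOI, oldest first.

    fetch_json is a hypothetical callable mapping a URL to the decoded JSON
    body; injecting it keeps the sketch independent of any HTTP client.
    """
    latest = fetch_json(HEPDATA_RECORD_URL.format(recid=recid))
    # Assumed layout: {"version": N, "record": {"hepdata_doi": "..."}}
    latest_doi = latest["record"]["hepdata_doi"]
    dois = [unversioned_doi(latest_doi)]
    for url in previous_version_urls(recid, latest["version"]):
        dois.append(fetch_json(url)["record"]["hepdata_doi"])
    dois.append(latest_doi)
    return dois
```

For a record at version 1, `previous_version_urls` returns an empty list, so only the latest response is fetched. For example, `unversioned_doi("10.17182/hepdata.123.v3")` gives `"10.17182/hepdata.123"`. In an Airflow task, `fetch_json` could wrap whichever HTTP hook the DAG ends up using.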