This parser should be responsible for:
- assigning a proper identifier (instead of the currently used file name, which is not unique)
- extracting text out of the WileyML record
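The second responsibility, extracting text, could look roughly like the sketch below. It uses Python's standard `xml.etree.ElementTree` and is not the project's implementation. It makes two assumptions:
- the article content sits under a `body` element (an assumption about WileyML); when no such element exists, the whole record is used;
- namespaces can be ignored, so elements are matched by local tag name.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so matching works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def extract_plain_text(wileyml: str, section: str = "body") -> str:
    """Return whitespace-normalized text of a WileyML record (hypothetical helper)."""
    root = ET.fromstring(wileyml)
    # Assumption: article content lives under the first <body> element;
    # if none is found, fall back to the whole record.
    scope = next((el for el in root.iter() if local_name(el.tag) == section), root)
    # itertext() yields text and tails of nested elements in document order;
    # joining with spaces keeps adjacent paragraphs apart, then split() collapses runs.
    return " ".join(" ".join(scope.itertext()).split())
```

Joining text nodes with spaces keeps paragraphs from running together. The cost is a stray space before punctuation that follows inline markup (e.g. `<i>word</i>.`). A full parser would probably treat block and inline elements differently.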
Currently the input is a `DocumentText` datastore with the file name set as `id` and the WileyML record as `text`.
We could start with ID extraction and, initially, propagate the full XML record as text. Instead of relying on the file name, which currently identifies Wiley XMLs in the `DocumentText` Avro datastore, we should identify those XML records with something better, such as a DOI. DOIs are available in the XMLs, so it should be possible to extract them. A single WileyML record defines multiple DOIs (e.g. identifying the journal or issue in addition to the article), so we should pick the right DOI carefully and fall back to a replacement whenever the article DOI is not available.
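The DOI selection could be sketched as follows. This is a sketch under explicit assumptions, not the importer's actual logic:
- WileyML groups metadata into `publicationMeta` elements carrying a `level` attribute, where `unit` is the article, `part` the issue and `product` the journal. This is an assumption about the schema.
- The fallback policy is hypothetical. When no article DOI exists, the record keeps its current (file name) id rather than taking the issue or journal DOI, because those are shared by many articles and would not be unique.

```python
import xml.etree.ElementTree as ET

# Assumption: "unit" marks article-level metadata in WileyML; "part" (issue)
# and "product" (journal) DOIs are shared across articles, so they are not used.
ARTICLE_LEVEL = "unit"


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix so matching works with or without namespaces."""
    return tag.rsplit("}", 1)[-1]


def dois_by_level(wileyml: str) -> dict:
    """Map each publicationMeta level to the first non-empty DOI it declares."""
    found = {}
    for element in ET.fromstring(wileyml).iter():
        if local_name(element.tag) != "publicationMeta":
            continue
        for child in element:
            if local_name(child.tag) == "doi" and child.text and child.text.strip():
                # Keep only the first DOI seen for a given level.
                found.setdefault(element.get("level"), child.text.strip())
                break
    return found


def choose_identifier(current_id: str, wileyml: str) -> str:
    """Prefer the article DOI; otherwise keep the existing id (hypothetical policy)."""
    return dois_by_level(wileyml).get(ARTICLE_LEVEL, current_id)
```

Collecting DOIs per level first, rather than returning the first DOI found, makes the choice explicit. It also makes it easy to swap in a different replacement rule later.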
…Text datastore
Introducing the first version of the importer module along with the workflow.xml definition.
The current version of the importer expects `DocumentText` Avro records as input (with the `text` field holding WileyML records) and produces `DocumentText` records as output, with the identifier field set to a DOI extracted from the WileyML record.
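The importer's record-level transformation could be sketched roughly as below. Everything in it is illustrative:
- `DocumentText` is modelled as a plain dataclass with only `id` and `text`; the real record is an Avro type.
- `import_wiley_records` is a hypothetical function name.
- Records without a DOI keep their original id, which is an assumed policy.

```python
from dataclasses import dataclass, replace
from typing import Callable, Iterable, Iterator, Optional


@dataclass(frozen=True)
class DocumentText:
    """Stand-in for the DocumentText Avro record; only id and text are modelled."""
    id: str
    text: str


def import_wiley_records(
    records: Iterable[DocumentText],
    extract_doi: Callable[[str], Optional[str]],
) -> Iterator[DocumentText]:
    """Re-key each record by the DOI found in its WileyML text (hypothetical).

    The full XML is propagated unchanged as text; records for which no DOI
    is found keep their original (file name) id.
    """
    for record in records:
        doi = extract_doi(record.text)
        yield replace(record, id=doi) if doi else record
```

Passing the DOI extractor in as a function keeps the mapping independent of XML parsing, so the selection rule can change without touching the importer. The generator also processes records lazily, one at a time.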
Originally requested in: https://support.openaire.eu/issues/8896#note-98