Retrieving author affiliations for a given DOI? #26

Open
mdeagen opened this issue Jul 15, 2020 · 11 comments

@mdeagen

mdeagen commented Jul 15, 2020

Author affiliations for a DOI appear to be connected to the publisher, rather than the DOI itself.

Example SPARQL query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT * WHERE {
  <http://dx.doi.org/10.1016/j.eurpolymj.2008.06.015> dct:isPartOf [ dct:publisher [ prov:atLocation ?place ]]
}

Desired query result for this DOI is:

Department of Chemistry, Center for Nanotechnology at CYCU and R&D Center for Membrane Technology, Chung-Yuan Christian University, Chung Li 32023, Taiwan, ROC

Actual query result is 54 distinct place URIs within the knowledge graph that are connected to the same publisher URI, which in this case is publisher:elsevier.

@jpmccu
Member

jpmccu commented Jul 15, 2020 via email

@mdeagen
Author

mdeagen commented Jul 15, 2020

You are correct: the affiliations within a DOI should be linked to the respective authors. However, the affiliation(s) for an author should also be resolvable for a specific DOI, since author affiliations can change over time.

Should we bypass the XML and use an intelligent agent on the KG in this case? The DOI alone should be sufficient to curate the authors+affiliation information using an external DB (like SemanticScholar), or alternatively scraped from the DOI's URL.

@jpmccu
Member

jpmccu commented Jul 15, 2020 via email

@mdeagen
Author

mdeagen commented Jul 15, 2020

If there is no freely available DOI metadata API that meets our purposes, we may be able to adapt the doi-crawler that Bingyin developed (web-scraper with configurations for several journal web pages). Instead of XML output, it would be configured to generate RDF directly.

How should we best model the DOI --> AuthorURI --> AffiliationURI relationship?

Here is a recommendation from DublinCore's citation guidelines:
image

However, this approach would not resolve individual author affiliations for a multi-author, multi-institution work. What would be the preferred predicate for AuthorURI-->AffiliationURI triples (purple dashed arrows below)?

image

@jpmccu
Member

jpmccu commented Jul 16, 2020 via email

@mdeagen
Author

mdeagen commented Jul 16, 2020

Thanks for the tip! I wonder if we could import citation information into the KG using this method rather than converting from XML? (We would still need to do some federation of author URIs, but matching on first and last name, ignoring middle initials, could work as a first approximation.)
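
The name-matching approximation mentioned above might be sketched as follows in Python. This is only an illustration: the function name `author_key` is made up, and treating any single-letter token (with or without a period) as an initial to skip is an assumption about how "ignoring middle initial" would be implemented. It will produce false matches for common names, so it is a first pass at best.

```python
import re

# A token such as "S." or "J" is treated as an initial (assumption for this sketch).
_INITIAL = re.compile(r"^[A-Za-z]\.?$")


def author_key(given_name, family_name):
    """Return a (first, last) matching key that ignores initials and case."""
    tokens = given_name.split()
    # Prefer the first spelled-out given name; skip initials such as "S." or "J.".
    spelled_out = [t for t in tokens if not _INITIAL.match(t)]
    if spelled_out:
        first = spelled_out[0]
    elif tokens:
        # Only initials are available, so fall back to the first one.
        first = tokens[0]
    else:
        # No given name at all (some records carry the full name in familyName).
        first = ""
    return (first.rstrip(".").casefold(), family_name.strip().casefold())


# Two spellings of the same person produce the same key.
same = author_key("Linda S.", "Schadler") == author_key("Linda", "Schadler")
```

Records with no given name yield an empty first component and would probably need manual review rather than automatic merging.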

Do you know of a service that provides author institution/affiliation with a similar request? Looks like institutions are not part of CrossRef.

For example, the following request:
curl -LH "Accept: text/turtle;q=1.0" http://dx.doi.org/10.1109/TDEI.2014.004415 -o output.txt

returns this output:

<http://id.crossref.org/contributor/linda-s-schadler-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Schadler" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Linda S." ;
      <http://xmlns.com/foaf/0.1/name>
              "Linda S. Schadler" .

<http://id.crossref.org/contributor/brian-benicewicz-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Benicewicz" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Brian" ;
      <http://xmlns.com/foaf/0.1/name>
              "Brian Benicewicz" .

<http://id.crossref.org/issn/1070-9878>
      a       <http://purl.org/ontology/bibo/Journal> ;
      <http://prismstandard.org/namespaces/basic/2.1/issn>
              "1070-9878" ;
      <http://purl.org/dc/terms/title>
              "IEEE Transactions on Dielectrics and Electrical Insulation" ;
      <http://purl.org/ontology/bibo/issn>
              "1070-9878" ;
      <http://www.w3.org/2002/07/owl#sameAs>
              "urn:issn:1070-9878" .

<http://id.crossref.org/contributor/henrik-hillborg-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Hillborg" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Henrik" ;
      <http://xmlns.com/foaf/0.1/name>
              "Henrik Hillborg" .

<http://id.crossref.org/contributor/suvi-virtanen-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Virtanen" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Suvi" ;
      <http://xmlns.com/foaf/0.1/name>
              "Suvi Virtanen" .

<http://id.crossref.org/contributor/su-zhao-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Su Zhao" ;
      <http://xmlns.com/foaf/0.1/name>
              " Su Zhao" .

<http://id.crossref.org/contributor/timothy-m-krentz-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Krentz" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Timothy M." ;
      <http://xmlns.com/foaf/0.1/name>
              "Timothy M. Krentz" .

<http://id.crossref.org/contributor/j-keith-nelson-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Nelson" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "J. Keith" ;
      <http://xmlns.com/foaf/0.1/name>
              "J. Keith Nelson" .

<http://dx.doi.org/10.1109/TDEI.2014.004415>
      <http://prismstandard.org/namespaces/basic/2.1/doi>
              "10.1109/tdei.2014.004415" ;
      <http://prismstandard.org/namespaces/basic/2.1/endingPage>
              "570" ;
      <http://prismstandard.org/namespaces/basic/2.1/startingPage>
              "563" ;
      <http://prismstandard.org/namespaces/basic/2.1/volume>
              "21" ;
      <http://purl.org/dc/terms/creator>
              <http://id.crossref.org/contributor/brian-benicewicz-3u43yacan7302> , <http://id.crossref.org/contributor/linda-s-schadler-3u43yacan7302> , <http://id.crossref.org/contributor/henrik-hillborg-3u43yacan7302> , <http://id.crossref.org/contributor/suvi-virtanen-3u43yacan7302> , <http://id.crossref.org/contributor/j-keith-nelson-3u43yacan7302> , <http://id.crossref.org/contributor/timothy-m-krentz-3u43yacan7302> , <http://id.crossref.org/contributor/su-zhao-3u43yacan7302> , <http://id.crossref.org/contributor/michael-bell-3u43yacan7302> ;
      <http://purl.org/dc/terms/date>
              "2014-04"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> ;
      <http://purl.org/dc/terms/identifier>
              "10.1109/tdei.2014.004415" ;
      <http://purl.org/dc/terms/isPartOf>
              <http://id.crossref.org/issn/1070-9878> ;
      <http://purl.org/dc/terms/publisher>
              "Institute of Electrical and Electronics Engineers (IEEE)" ;
      <http://purl.org/dc/terms/title>
              "Dielectric breakdown strength of epoxy bimodal-polymer-brush-grafted core functionalized silica nanocomposites" ;
      <http://purl.org/ontology/bibo/doi>
              "10.1109/tdei.2014.004415" ;
      <http://purl.org/ontology/bibo/pageEnd>
              "570" ;
      <http://purl.org/ontology/bibo/pageStart>
              "563" ;
      <http://purl.org/ontology/bibo/volume>
              "21" ;
      <http://www.w3.org/2002/07/owl#sameAs>
              <doi:10.1109/tdei.2014.004415> , <info:doi/10.1109/tdei.2014.004415> , <http://dx.doi.org/10.1109/tdei.2014.004415> .

<http://id.crossref.org/contributor/michael-bell-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Bell" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Michael" ;
      <http://xmlns.com/foaf/0.1/name>
              "Michael Bell" .
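
For reference, the curl request above could also be issued from Python. This is a minimal sketch using only the standard library; the function names `build_turtle_request` and `fetch_turtle` are made up for illustration, and whether the resolver keeps returning Turtle for this Accept header is an assumption based on the output shown above.

```python
import urllib.request


def build_turtle_request(doi):
    """Build a request that asks the DOI resolver for Turtle via the Accept header."""
    url = "http://dx.doi.org/" + doi
    return urllib.request.Request(url, headers={"Accept": "text/turtle;q=1.0"})


def fetch_turtle(doi, timeout=30):
    """Fetch the Turtle document; urlopen follows redirects, like curl -L."""
    with urllib.request.urlopen(build_turtle_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Building the request does not touch the network; fetch_turtle() does.
request = build_turtle_request("10.1109/TDEI.2014.004415")
```

Calling `fetch_turtle("10.1109/TDEI.2014.004415")` would perform the actual download, and the returned string could then be parsed into the KG instead of being written to output.txt.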

@mdeagen
Author

mdeagen commented Jul 16, 2020

Follow-up on the concept map above... would prov:actedOnBehalfOf suffice for linking author plus reported affiliation?

Keeping author URIs from CrossRef could be beneficial as they are unique to the person and time of publication. If we only had global IDs such as ORCID, we would not be able to resolve author affiliation for a given DOI if, for example, the author had later moved to another institution they had collaborated with in an earlier publication.

As an example, returning a list of Authors and Affiliations for a given DOI:

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  <http://dx.doi.org/10.1001/12345> dct:creator ?crossrefAuthURI ;
                                    dct:contributor ?affiliation .
  ?crossrefAuthURI prov:actedOnBehalfOf ?affiliation .
}

Where possible, CrossRef author URIs could be linked to their ORCIDs (using dct:identifier?). If no ORCID exists, we would revert to the NanoMine author URI.

Another example, returning list of DOIs and Affiliations for a given Author based on their ORCID:

PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?doi ?affiliation WHERE {
  ?crossrefAuthURI dct:identifier <https://orcid.org/0000-12345> .
  ?doi dct:creator ?crossrefAuthURI ;
       dct:contributor ?affiliation .
  ?crossrefAuthURI prov:actedOnBehalfOf ?affiliation .
}
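
To make the intent of the two queries concrete, here is a toy Python sketch over an in-memory set of triples. Everything in it is hypothetical: the identifiers (`doi:A`, `author:jane-doe-A`, `org:university-x`, `orcid:jane`) and the helper names are invented for illustration, and plain strings stand in for URIs. It only shows that publication-scoped author URIs keep the affiliation tied to each DOI even when the ORCID is shared.

```python
CREATOR = "dct:creator"
CONTRIBUTOR = "dct:contributor"
IDENTIFIER = "dct:identifier"
ACTED_FOR = "prov:actedOnBehalfOf"

# One person (same ORCID) with a different author URI and affiliation per DOI.
TRIPLES = {
    ("doi:A", CREATOR, "author:jane-doe-A"),
    ("doi:A", CONTRIBUTOR, "org:university-x"),
    ("author:jane-doe-A", ACTED_FOR, "org:university-x"),
    ("author:jane-doe-A", IDENTIFIER, "orcid:jane"),
    ("doi:B", CREATOR, "author:jane-doe-B"),
    ("doi:B", CONTRIBUTOR, "org:university-y"),
    ("author:jane-doe-B", ACTED_FOR, "org:university-y"),
    ("author:jane-doe-B", IDENTIFIER, "orcid:jane"),
}


def objects(subject, predicate):
    return {o for (s, p, o) in TRIPLES if s == subject and p == predicate}


def subjects(predicate, obj):
    return {s for (s, p, o) in TRIPLES if p == predicate and o == obj}


def affiliations_for_doi(doi):
    """Mirror of the first query: (author, affiliation) pairs for one DOI."""
    orgs = objects(doi, CONTRIBUTOR)
    return sorted(
        (author, org)
        for author in objects(doi, CREATOR)
        for org in objects(author, ACTED_FOR)
        if org in orgs
    )


def affiliations_for_orcid(orcid):
    """Mirror of the second query: (doi, affiliation) pairs for one ORCID."""
    pairs = []
    for author in subjects(IDENTIFIER, orcid):
        for doi in subjects(CREATOR, author):
            orgs = objects(doi, CONTRIBUTOR)
            pairs.extend((doi, org) for org in objects(author, ACTED_FOR) if org in orgs)
    return sorted(pairs)
```

With a single global author URI per person, both affiliations would attach to the same node and the per-DOI distinction would be lost.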

@mdeagen
Author

mdeagen commented Sep 1, 2021

Following up on this issue, here is an example SPARQL query that shows the problem.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  ?doi a dct:BibliographicResource ;
       dct:isPartOf [ dct:title ?Journal; 
                      dct:publisher [ rdfs:label ?Publisher;
                                      prov:atLocation [ rdfs:label ?Location ] ] ] .
} VALUES ?doi { <http://dx.doi.org/10.1016/j.jeurceramsoc.2007.02.082> }

Because prov:atLocation stems from the node of a publisher URI, we lose the link between a ?doi and its ?Location (since multiple DOIs and/or journals can have the same publishing house).

PROPOSED SOLUTION:
Move the "prov:atLocation" clause in xml_ingest.setl.ttl up two levels, such that prov:atLocation extends directly from the dct:BibliographicResource.

image

VERIFICATION:
Use the following SPARQL query to verify:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  ?doi a dct:BibliographicResource ;
       dct:isPartOf [ dct:title ?Journal; 
                      dct:publisher [ rdfs:label ?Publisher ] ] ;
       prov:atLocation [ rdfs:label ?Location ]  .
} VALUES ?doi { <http://dx.doi.org/10.1016/j.jeurceramsoc.2007.02.082> }

The binding to ?Location should be the string "Microelectronics and Materials Physics Laboratories, EMPART Research Group of Infotech Oulu, P.O. Box 4500, FIN-90014 University of Oulu, Finland" to match the corresponding XML file.
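
To illustrate why the move matters, here is a toy Python sketch of the two graph shapes. The identifiers and addresses (`doi:1`, `Address A`, and so on) are placeholders rather than values from the knowledge graph, and the journal hop (dct:isPartOf) is collapsed into a single lookup for brevity.

```python
# Two DOIs that share one publisher (placeholder identifiers).
PUBLISHER_OF = {"doi:1": "publisher:elsevier", "doi:2": "publisher:elsevier"}

# Current shape: locations hang off the publisher node, so they pool together.
LOCATIONS_OF_PUBLISHER = {"publisher:elsevier": {"Address A", "Address B"}}

# Proposed shape: each location hangs off the bibliographic resource itself.
LOCATIONS_OF_DOI = {"doi:1": {"Address A"}, "doi:2": {"Address B"}}


def locations_current(doi):
    """Walk DOI -> publisher -> location; returns every pooled address."""
    return LOCATIONS_OF_PUBLISHER[PUBLISHER_OF[doi]]


def locations_proposed(doi):
    """Walk DOI -> location directly; returns only that DOI's address."""
    return LOCATIONS_OF_DOI[doi]
```

In the current shape, `locations_current("doi:1")` returns both addresses, which is the same effect as the 54 place URIs reported at the top of this issue.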

@jpmccu
Member

jpmccu commented Sep 1, 2021 via email

@mdeagen
Author

mdeagen commented Sep 1, 2021

The location being stored in the XML is not the city of the publisher. The xpath //Citation/CommonFields/Location is the affiliated author address populated into the XML. In principle there can be more than one (in which case we would need a for loop), but the scraper that populates the XML appears to grab only one, so the proposed fix should suffice for the current state of the XML representations.
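
If multiple Location elements do show up later, the loop could look roughly like the Python sketch below. The sample XML is invented: only the Citation/CommonFields/Location path comes from the real files, while the `Record` root element and the two addresses are placeholders. The actual fix would live in xml_ingest.setl.ttl rather than in standalone Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; only the Citation/CommonFields/Location path is real.
SAMPLE = """<Record>
  <Citation>
    <CommonFields>
      <Location>Department A, University X</Location>
      <Location>Department B, University Y</Location>
    </CommonFields>
  </Citation>
</Record>"""


def author_locations(xml_text):
    """Return every non-empty Location string under Citation/CommonFields."""
    root = ET.fromstring(xml_text)
    found = []
    # ElementTree's limited XPath needs the leading "." for a descendant search.
    for node in root.findall(".//Citation/CommonFields/Location"):
        text = (node.text or "").strip()
        if text:
            found.append(text)
    return found


locations = author_locations(SAMPLE)
```

Each returned string would then become its own prov:atLocation value on the dct:BibliographicResource.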

@jpmccu
Member

jpmccu commented Sep 1, 2021 via email
