Retrieving author affiliations for a given DOI? #26

mdeagen · 2020-07-15T14:58:44Z

Author affiliations for a DOI appear to be connected to the publisher, rather than the DOI itself.

Example SPARQL query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT * WHERE {
  <http://dx.doi.org/10.1016/j.eurpolymj.2008.06.015> dct:isPartOf [ dct:publisher [ prov:atLocation ?place ]]
}

Desired query result for this DOI is:

Department of Chemistry, Center for Nanotechnology at CYCU and R&D Center for Membrane Technology, Chung-Yuan Christian University, Chung Li 32023, Taiwan, ROC

Actual query result is 54 distinct place URIs within the knowledge graph that are connected to the same publisher URI, which in this case is publisher:elsevier.


jpmccu · 2020-07-15T15:44:17Z

I guess the XML structure implied that the location was for the publisher (to me), not the author(s). Usually an affiliation is associated per-author, not per paper. We (at least in our work) often have multi-institution papers (Nanomine being a perfect example).


-- Jim McCusker, Director, Data Operations, Tetherless World Constellation, Rensselaer Polytechnic Institute, http://tw.rpi.edu

mdeagen · 2020-07-15T17:00:35Z

You are correct, the affiliations within a DOI should be linked to the respective authors. However, the affiliation(s) for an author should be resolvable to a specific DOI (since author affiliations can change over time).

Should we bypass the XML and use an intelligent agent on the KG in this case? The DOI alone should be sufficient to curate the authors+affiliation information using an external DB (like SemanticScholar), or alternatively scraped from the DOI's URL.

jpmccu · 2020-07-15T17:54:29Z

I think we should be grabbing the metadata directly from the DOI linked data instead of using the XML data. It's actually got real identifiers for most authors, including orcids when available.


mdeagen · 2020-07-15T20:19:45Z

If there is no freely available DOI metadata API that meets our purposes, we may be able to adapt the doi-crawler that Bingyin developed (https://github.com/bingyinh/doi-crawler), a web scraper with configurations for several journal web pages. Instead of XML output, it would be configured to generate RDF directly.

How should we model the DOI --> AuthorURI --> AffiliationURI relationship?

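Here is a recommendation from DublinCore's citation guidelines:

[image: https://user-images.githubusercontent.com/43749866/87589407-304cb300-c6b3-11ea-97e8-4a68fb6cca1f.png]

However, this approach would not resolve individual author affiliations for a multi-author, multi-institution work. What would be the preferred predicate for AuthorURI --> AffiliationURI triples (purple dashed arrows below)?

[image: https://user-images.githubusercontent.com/43749866/87591721-bb7b7800-c6b6-11ea-9802-d4d6491812cc.png]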

jpmccu · 2020-07-16T00:28:37Z

It's simpler than that. Content negotiate text/turtle against http://dx.doi.org/{{doi}} and you'll get all of that. Jim
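
The content-negotiation step described above can be sketched in Python using only the standard library. This is a minimal sketch, not a tested ingest step: the function names `build_turtle_request` and `fetch_turtle` are made up for illustration, and the only things taken from the thread are the `http://dx.doi.org/{{doi}}` URL pattern and the `text/turtle` media type.

```python
import urllib.request


def build_turtle_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request asking the DOI resolver for Turtle."""
    url = f"http://dx.doi.org/{doi}"
    # The Accept header is what asks the resolver for RDF instead of a landing page.
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})


def fetch_turtle(doi: str, timeout: float = 30.0) -> str:
    """Send the request and return the body as text (needs network access).

    urlopen follows HTTP redirects by default, which matters here because
    the DOI resolver redirects to the metadata provider.
    """
    with urllib.request.urlopen(build_turtle_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    req = build_turtle_request("10.1109/TDEI.2014.004415")
    print(req.full_url)
    print(req.get_header("Accept"))
```

Calling `fetch_turtle("10.1109/TDEI.2014.004415")` would then return the Turtle document, assuming the resolver and metadata provider still honor this media type.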


mdeagen · 2020-07-16T13:55:37Z

Thanks for the tip! I wonder if we could import citation information into the KG using this method rather than converting from XML? (We would still need to do some federation of author URIs, but matching on first/last name (ignoring middle initial) could work as a first approximation...)
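
The first/last-name matching mentioned above could look roughly like the following. This is only a sketch of one possible normalization: the helper names `name_key` and `group_authors` and the `urn:example:` URIs are hypothetical, not part of any existing pipeline.

```python
from collections import defaultdict


def name_key(given: str, family: str) -> tuple:
    """Return a (first name, family name) key that ignores middle names and initials."""
    tokens = given.replace(".", " ").split()
    first = tokens[0].casefold() if tokens else ""
    return (first, family.strip().casefold())


def group_authors(records):
    """Group (uri, given_name, family_name) records that share the same name key."""
    groups = defaultdict(list)
    for uri, given, family in records:
        groups[name_key(given, family)].append(uri)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical author URIs, for illustration only.
    records = [
        ("urn:example:author-1", "Linda S.", "Schadler"),
        ("urn:example:author-2", "Linda", "Schadler"),
        ("urn:example:author-3", "Brian", "Benicewicz"),
    ]
    for key, uris in group_authors(records).items():
        print(key, uris)
```

As a first approximation this has obvious gaps: an author recorded as "J. Keith" is keyed on the initial "j", a record with the whole name in the family-name field gets an empty first name, and distinct people who share a name are merged.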

Do you know of a service that provides author institution/affiliation with a similar request? Looks like institutions are not part of CrossRef.

For example, the following request:
curl -LH "Accept: text/turtle;q=1.0" http://dx.doi.org/10.1109/TDEI.2014.004415 -o output.txt

returns this output:

<http://id.crossref.org/contributor/linda-s-schadler-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Schadler" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Linda S." ;
      <http://xmlns.com/foaf/0.1/name>
              "Linda S. Schadler" .

<http://id.crossref.org/contributor/brian-benicewicz-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Benicewicz" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Brian" ;
      <http://xmlns.com/foaf/0.1/name>
              "Brian Benicewicz" .

<http://id.crossref.org/issn/1070-9878>
      a       <http://purl.org/ontology/bibo/Journal> ;
      <http://prismstandard.org/namespaces/basic/2.1/issn>
              "1070-9878" ;
      <http://purl.org/dc/terms/title>
              "IEEE Transactions on Dielectrics and Electrical Insulation" ;
      <http://purl.org/ontology/bibo/issn>
              "1070-9878" ;
      <http://www.w3.org/2002/07/owl#sameAs>
              "urn:issn:1070-9878" .

<http://id.crossref.org/contributor/henrik-hillborg-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Hillborg" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Henrik" ;
      <http://xmlns.com/foaf/0.1/name>
              "Henrik Hillborg" .

<http://id.crossref.org/contributor/suvi-virtanen-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Virtanen" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Suvi" ;
      <http://xmlns.com/foaf/0.1/name>
              "Suvi Virtanen" .

<http://id.crossref.org/contributor/su-zhao-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Su Zhao" ;
      <http://xmlns.com/foaf/0.1/name>
              " Su Zhao" .

<http://id.crossref.org/contributor/timothy-m-krentz-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Krentz" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Timothy M." ;
      <http://xmlns.com/foaf/0.1/name>
              "Timothy M. Krentz" .

<http://id.crossref.org/contributor/j-keith-nelson-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Nelson" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "J. Keith" ;
      <http://xmlns.com/foaf/0.1/name>
              "J. Keith Nelson" .

<http://dx.doi.org/10.1109/TDEI.2014.004415>
      <http://prismstandard.org/namespaces/basic/2.1/doi>
              "10.1109/tdei.2014.004415" ;
      <http://prismstandard.org/namespaces/basic/2.1/endingPage>
              "570" ;
      <http://prismstandard.org/namespaces/basic/2.1/startingPage>
              "563" ;
      <http://prismstandard.org/namespaces/basic/2.1/volume>
              "21" ;
      <http://purl.org/dc/terms/creator>
              <http://id.crossref.org/contributor/brian-benicewicz-3u43yacan7302> , <http://id.crossref.org/contributor/linda-s-schadler-3u43yacan7302> , <http://id.crossref.org/contributor/henrik-hillborg-3u43yacan7302> , <http://id.crossref.org/contributor/suvi-virtanen-3u43yacan7302> , <http://id.crossref.org/contributor/j-keith-nelson-3u43yacan7302> , <http://id.crossref.org/contributor/timothy-m-krentz-3u43yacan7302> , <http://id.crossref.org/contributor/su-zhao-3u43yacan7302> , <http://id.crossref.org/contributor/michael-bell-3u43yacan7302> ;
      <http://purl.org/dc/terms/date>
              "2014-04"^^<http://www.w3.org/2001/XMLSchema#gYearMonth> ;
      <http://purl.org/dc/terms/identifier>
              "10.1109/tdei.2014.004415" ;
      <http://purl.org/dc/terms/isPartOf>
              <http://id.crossref.org/issn/1070-9878> ;
      <http://purl.org/dc/terms/publisher>
              "Institute of Electrical and Electronics Engineers (IEEE)" ;
      <http://purl.org/dc/terms/title>
              "Dielectric breakdown strength of epoxy bimodal-polymer-brush-grafted core functionalized silica nanocomposites" ;
      <http://purl.org/ontology/bibo/doi>
              "10.1109/tdei.2014.004415" ;
      <http://purl.org/ontology/bibo/pageEnd>
              "570" ;
      <http://purl.org/ontology/bibo/pageStart>
              "563" ;
      <http://purl.org/ontology/bibo/volume>
              "21" ;
      <http://www.w3.org/2002/07/owl#sameAs>
              <doi:10.1109/tdei.2014.004415> , <info:doi/10.1109/tdei.2014.004415> , <http://dx.doi.org/10.1109/tdei.2014.004415> .

<http://id.crossref.org/contributor/michael-bell-3u43yacan7302>
      a       <http://xmlns.com/foaf/0.1/Person> ;
      <http://xmlns.com/foaf/0.1/familyName>
              "Bell" ;
      <http://xmlns.com/foaf/0.1/givenName>
              "Michael" ;
      <http://xmlns.com/foaf/0.1/name>
              "Michael Bell" .

mdeagen · 2020-07-16T15:25:31Z

Follow-up on the concept map above... would prov:actedOnBehalfOf suffice for linking author plus reported affiliation?

Keeping author URIs from CrossRef could be beneficial as they are unique to the person and time of publication. If we only had global IDs such as ORCID, we would not be able to resolve author affiliation for a given DOI if, for example, the author had later moved to another institution they had collaborated with in an earlier publication.

As an example, returning a list of Authors and Affiliations for a given DOI:

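PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  # placeholder DOI, written as a full IRI so that it can match a node in the KG
  <http://dx.doi.org/10.1001/12345> dct:creator ?crossrefAuthURI ;
                                    dct:contributor ?affiliation .
  ?crossrefAuthURI prov:actedOnBehalfOf ?affiliation .
}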

Where possible, CrossRef author URIs could be linked to their ORCIDs (using dct:identifier?). If no ORCID exists, we would revert to the NanoMine author URI.

Another example, returning list of DOIs and Affiliations for a given Author based on their ORCID:

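PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?doi ?affiliation WHERE {
  # placeholder ORCID, written as a full IRI
  ?crossrefAuthURI dct:identifier <https://orcid.org/0000-12345> .
  ?doi dct:creator ?crossrefAuthURI ;
       dct:contributor ?affiliation .
  ?crossrefAuthURI prov:actedOnBehalfOf ?affiliation .
}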
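
For reference, the triples that both queries above expect could be generated along these lines. This is a sketch of the proposed model only: the function name `affiliation_triples` and all `example.org` URIs are hypothetical, and it emits plain N-Triples strings rather than using an RDF library.

```python
DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"


def affiliation_triples(doi_uri, author_affiliations):
    """Return N-Triples lines for DOI -> author -> affiliation links.

    author_affiliations is a list of (author_uri, affiliation_uri) pairs,
    where each author URI is specific to this publication.
    """
    lines = []
    for author_uri, affiliation_uri in author_affiliations:
        lines.append(f"<{doi_uri}> <{DCT}creator> <{author_uri}> .")
        lines.append(f"<{doi_uri}> <{DCT}contributor> <{affiliation_uri}> .")
        lines.append(f"<{author_uri}> <{PROV}actedOnBehalfOf> <{affiliation_uri}> .")
    # Drop repeats (e.g. one author with two affiliations) while keeping order.
    return list(dict.fromkeys(lines))


if __name__ == "__main__":
    # Hypothetical URIs, for illustration only.
    for line in affiliation_triples(
        "http://dx.doi.org/10.1001/12345",
        [
            ("http://example.org/author/a1", "http://example.org/org/o1"),
            ("http://example.org/author/a1", "http://example.org/org/o2"),
        ],
    ):
        print(line)
```

Because the author URI is scoped to one publication, the prov:actedOnBehalfOf link records the affiliation as reported on that paper, which is the property the queries above rely on.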

mdeagen · 2021-09-01T14:40:14Z

Following up on this issue, here is an example SPARQL query that shows the problem.

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  ?doi a dct:BibliographicResource ;
       dct:isPartOf [ dct:title ?Journal; 
                      dct:publisher [ rdfs:label ?Publisher;
                                      prov:atLocation [ rdfs:label ?Location ] ] ] .
} VALUES ?doi { <http://dx.doi.org/10.1016/j.jeurceramsoc.2007.02.082> }

Because prov:atLocation stems from the node of a publisher URI, we lose the link between a ?doi and its ?Location (since multiple DOIs and/or journals can have the same publishing house).

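PROPOSED SOLUTION:
Move the "prov:atLocation" clause in xml_ingest.setl.ttl (https://github.com/tetherless-world/nanomine-graph/blob/master/setl/xml_ingest.setl.ttl) up two levels, so that prov:atLocation extends directly from the dct:BibliographicResource.

[image: https://user-images.githubusercontent.com/43749866/131689291-94bff541-dbdf-4646-8434-b689330c1abc.png]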

VERIFICATION:
Use the following SPARQL query to verify:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  ?doi a dct:BibliographicResource ;
       dct:isPartOf [ dct:title ?Journal; 
                      dct:publisher [ rdfs:label ?Publisher ] ] ;
       prov:atLocation [ rdfs:label ?Location ]  .
} VALUES ?doi { <http://dx.doi.org/10.1016/j.jeurceramsoc.2007.02.082> }

The binding to ?Location should be the string "Microelectronics and Materials Physics Laboratories, EMPART Research Group of Infotech Oulu, P.O. Box 4500, FIN-90014 University of Oulu, Finland" to match the corresponding XML file (https://materialsmine.org/nmr/xml/L102_S6_Hu_2007?format=xml).
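
The problem and the proposed fix described above can be illustrated with a toy graph in plain Python. All node names here (`doi:A`, `publisher:P`, `place:lab-A`, and so on) are invented for the illustration; the sketch only mimics the shape of the two query paths and does not touch the real KG.

```python
def objects(triples, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def locations_via_publisher(triples, doi):
    """Follow doi -> journal -> publisher -> place, as in the current modeling."""
    found = set()
    for journal in objects(triples, doi, "dct:isPartOf"):
        for publisher in objects(triples, journal, "dct:publisher"):
            found |= objects(triples, publisher, "prov:atLocation")
    return found


def locations_direct(triples, doi):
    """Follow doi -> place, as in the proposed modeling."""
    return objects(triples, doi, "prov:atLocation")


# Two papers in different journals that share one publisher node.
SHARED = {
    ("doi:A", "dct:isPartOf", "journal:1"),
    ("doi:B", "dct:isPartOf", "journal:2"),
    ("journal:1", "dct:publisher", "publisher:P"),
    ("journal:2", "dct:publisher", "publisher:P"),
}
# Current modeling: every location hangs off the shared publisher.
CURRENT = SHARED | {
    ("publisher:P", "prov:atLocation", "place:lab-A"),
    ("publisher:P", "prov:atLocation", "place:lab-B"),
}
# Proposed modeling: each location hangs off its own DOI.
PROPOSED = SHARED | {
    ("doi:A", "prov:atLocation", "place:lab-A"),
    ("doi:B", "prov:atLocation", "place:lab-B"),
}

if __name__ == "__main__":
    print(sorted(locations_via_publisher(CURRENT, "doi:A")))  # ['place:lab-A', 'place:lab-B']
    print(sorted(locations_direct(PROPOSED, "doi:A")))        # ['place:lab-A']
```

With the locations attached to the shared publisher, asking for the location of one DOI returns the locations of every DOI from that publisher, which is the same effect as the 54 place URIs reported at the top of this issue.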

jpmccu · 2021-09-01T15:18:20Z

If the location is the city of the publisher, wouldn't it be weird to say that a paper has a location though?


-- Jamie McCusker (she/they), Director, Data Operations, Tetherless World Constellation, Rensselaer Polytechnic Institute, http://tw.rpi.edu

mdeagen · 2021-09-01T15:37:38Z

The location being stored in the XML is not the city of the publisher. The xpath //Citation/CommonFields/Location holds the affiliated author address that was populated into the XML. In principle there can be more than one (in which case we would need a for loop), but the scraper that populates the XML appears to grab only one, so the proposed fix should suffice for the current state of the XML representations.
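
If the XML ever carries more than one Location element, the loop mentioned above could be sketched as follows with the standard library. The sample document, its `Record` root element, the two addresses, and the function name `extract_locations` are all hypothetical; only the Citation/CommonFields/Location path comes from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root element name and the addresses are invented.
SAMPLE_XML = """
<Record>
  <Citation>
    <CommonFields>
      <Location>Hypothetical Lab A, University X</Location>
      <Location>Hypothetical Lab B, University Y</Location>
    </CommonFields>
  </Citation>
</Record>
"""


def extract_locations(xml_text):
    """Return the text of every Citation/CommonFields/Location element, in order."""
    root = ET.fromstring(xml_text)
    locations = []
    # ".//" searches descendants of the root, so Citation must sit below the root.
    for element in root.findall(".//Citation/CommonFields/Location"):
        text = (element.text or "").strip()
        if text:
            locations.append(text)
    return locations


if __name__ == "__main__":
    for location in extract_locations(SAMPLE_XML):
        print(location)
```

Each returned string would then become its own prov:atLocation value on the dct:BibliographicResource, so the single-location case is just a list of length one.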

jpmccu · 2021-09-01T16:15:05Z

Ah, then yes, moving it up makes sense. Things were ambiguous there.

