Retrieving author affiliations for a given DOI? #26
I guess the XML structure implied that the location was for the
publisher (to me), not the author(s). Usually an affiliation is associated
per-author, not per paper. We (at least in our work) often have
multi-institution papers (Nanomine being a perfect example).
On Wed, Jul 15, 2020 at 11:08 AM mdeagen wrote:
Author affiliations for a DOI appear to be connected to the *publisher*,
rather than the DOI itself.
Example SPARQL query:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT DISTINCT * WHERE {
  <http://dx.doi.org/10.1016/j.eurpolymj.2008.06.015>
      dct:isPartOf [ dct:publisher [ prov:atLocation ?place ] ]
}
Desired query result for this DOI is:
Department of Chemistry, Center for Nanotechnology at CYCU and R&D Center
for Membrane Technology, Chung-Yuan Christian University, Chung Li 32023,
Taiwan, ROC
Actual query result is *54 distinct place URIs* within the knowledge
graph that are connected to the same publisher URI, which in this case is
publisher:elsevier.
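To illustrate why the query above fans out to every place attached to the publisher, here is a minimal sketch over a toy in-memory triple list. The identifiers (doi:A, journal:1, place:1, and so on) are hypothetical placeholders, not URIs from the knowledge graph; only the predicate names come from the query.

```python
# Toy triples. All subjects/objects are hypothetical placeholders;
# the predicates mirror the ones used in the SPARQL query above.
TRIPLES = [
    ("doi:A", "dct:isPartOf", "journal:1"),
    ("doi:B", "dct:isPartOf", "journal:2"),
    ("journal:1", "dct:publisher", "publisher:elsevier"),
    ("journal:2", "dct:publisher", "publisher:elsevier"),
    # Affiliations from two different papers both end up on the publisher node.
    ("publisher:elsevier", "prov:atLocation", "place:1"),
    ("publisher:elsevier", "prov:atLocation", "place:2"),
]


def objects(subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def places_via_publisher(doi):
    """Follow doi -> journal -> publisher -> place, as the query does."""
    places = set()
    for journal in objects(doi, "dct:isPartOf"):
        for publisher in objects(journal, "dct:publisher"):
            places.update(objects(publisher, "prov:atLocation"))
    return places


if __name__ == "__main__":
    # Both DOIs resolve to the same set of places, so the per-paper link is lost.
    print(places_via_publisher("doi:A"))
    print(places_via_publisher("doi:B"))
```

Because both toy DOIs share a publisher node, the traversal cannot tell which place belongs to which paper, which is the same effect as the 54 place URIs reported above.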
--
Jim McCusker
Director, Data Operations
Tetherless World Constellation
Rensselaer Polytechnic Institute
[email protected] <[email protected]>
http://tw.rpi.edu
|
You are correct, the affiliations within a DOI should be linked to the respective authors. However, the affiliation(s) for an author should be resolvable to a specific DOI (since author affiliations can change over time). Should we bypass the XML and use an intelligent agent on the KG in this case? The DOI alone should be sufficient to curate the authors+affiliation information using an external DB (like SemanticScholar), or alternatively scraped from the DOI's URL. |
I think we should be grabbing the metadata directly from the DOI linked
data instead of using the XML data. It's actually got real identifiers for
most authors, including orcids when available.
|
If there is no freely available DOI metadata API that meets our purposes, we may be able to adapt the doi-crawler that Bingyin developed <https://github.com/bingyinh/doi-crawler> (a web scraper with configurations for several journal web pages). Instead of XML output, it would be configured to generate RDF directly.
How best to model the DOI-->AuthorURI-->AffiliationURI relationship? Here is a recommendation from DublinCore's citation guidelines:
[image: image] <https://user-images.githubusercontent.com/43749866/87589407-304cb300-c6b3-11ea-97e8-4a68fb6cca1f.png>
However, this approach would not resolve individual author affiliations for a multi-author, multi-institution work. What would be the preferred predicate for AuthorURI-->AffiliationURI triples (purple dashed arrows below)?
[image: image] <https://user-images.githubusercontent.com/43749866/87591721-bb7b7800-c6b6-11ea-9802-d4d6491812cc.png>
|
It's simpler than that. Content negotiate text/turtle against
http://dx.doi.org/{{doi}} and you'll get all of that.
Jim
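For anyone following along, the content negotiation suggested here could look roughly like the sketch below, using only the Python standard library. The function names are made up for illustration, and whether the resolver returns Turtle for a particular DOI depends on the registration agency, so treat this as an assumption to check rather than a guarantee.

```python
import urllib.request


def build_turtle_request(doi):
    """Build a request for http://dx.doi.org/{doi} that asks for text/turtle.

    The DOI is appended as-is; DOIs with unusual characters may need
    percent-encoding, which this sketch does not attempt.
    """
    url = "http://dx.doi.org/" + doi
    return urllib.request.Request(url, headers={"Accept": "text/turtle"})


def fetch_turtle(doi, timeout=30):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_turtle_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # DOI taken from the original issue report.
    print(fetch_turtle("10.1016/j.eurpolymj.2008.06.015"))
```

Splitting the request construction from the network call keeps the header logic easy to check offline.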
|
Thanks for the tip! I wonder if we could import citation information into the KG using this method rather than converting from XML? (We would still need to do some federation of author URIs, but matching on first/last name (ignoring middle initial) could work as a first approximation...)
Do you know of a service that provides author institution/affiliation with a similar request? It looks like institutions are not part of CrossRef. For example, the following request: returns this output:
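Returning to the author-URI federation point earlier in this comment: a first-approximation match on first/last name, ignoring middle initials, might be sketched as follows. The helper name is hypothetical, and the sketch assumes names arrive in "First [Middle] Last" order; it will also merge distinct people who share a first and last name, so it is only a starting point.

```python
def name_key(full_name):
    """Return a (first, last) matching key, ignoring middle names and initials.

    Assumes "First [Middle ...] Last" ordering; periods are treated as spaces
    and comparison is case-insensitive.
    """
    parts = full_name.replace(".", " ").split()
    if not parts:
        return ("", "")
    return (parts[0].casefold(), parts[-1].casefold())


def same_author(name_a, name_b):
    """True when two name strings share the same first/last key."""
    return name_key(name_a) == name_key(name_b)


if __name__ == "__main__":
    # Placeholder names, not authors from the knowledge graph.
    print(same_author("Jane Q. Doe", "jane doe"))
    print(same_author("Jane Doe", "John Doe"))
```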
|
Follow-up on the concept map above... Keeping author URIs from CrossRef could be beneficial, as they are unique to the person and time of publication. If we only had global IDs such as ORCID, we would not be able to resolve author affiliation for a given DOI if, for example, the author had later moved to another institution they had collaborated with in an earlier publication.
As an example, returning a list of Authors and Affiliations for a given DOI:
Where possible, CrossRef author URIs could be linked to their ORCIDs (using
Another example, returning a list of DOIs and Affiliations for a given Author based on their ORCID:
|
If the location is the city of the publisher, wouldn't it be weird to say
that a paper has a location though?
On Wed, Sep 1, 2021 at 10:40 AM mdeagen wrote:
Following up on this issue, here is an example SPARQL query that shows the
problem.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  ?doi a dct:BibliographicResource ;
       dct:isPartOf [ dct:title ?Journal ;
                      dct:publisher [ rdfs:label ?Publisher ;
                                      prov:atLocation [ rdfs:label ?Location ] ] ] .
}
VALUES ?doi { <http://dx.doi.org/10.1016/j.jeurceramsoc.2007.02.082> }
Because prov:atLocation stems from the node of a publisher URI, we lose
the link between a ?doi and its ?Location (since multiple DOIs and/or
journals can have the same publishing house).
*PROPOSED SOLUTION:*
Move the "prov:atLocation" clause in xml_ingest.setl.ttl
<https://github.com/tetherless-world/nanomine-graph/blob/master/setl/xml_ingest.setl.ttl>
up *two* levels, such that prov:atLocation extends directly from the
dct:BibliographicResource.
[image: image]
<https://user-images.githubusercontent.com/43749866/131689291-94bff541-dbdf-4646-8434-b689330c1abc.png>
*VERIFICATION:*
Use the following SPARQL query to verify:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT * WHERE {
  ?doi a dct:BibliographicResource ;
       dct:isPartOf [ dct:title ?Journal ;
                      dct:publisher [ rdfs:label ?Publisher ] ] ;
       prov:atLocation [ rdfs:label ?Location ] .
}
VALUES ?doi { <http://dx.doi.org/10.1016/j.jeurceramsoc.2007.02.082> }
The binding to ?Location should be the string "Microelectronics and
Materials Physics Laboratories, EMPART Research Group of Infotech Oulu,
P.O. Box 4500, FIN-90014 University of Oulu, Finland" to match the
corresponding XML file
<https://materialsmine.org/nmr/xml/L102_S6_Hu_2007?format=xml>.
--
Jamie McCusker (she/they)
Director, Data Operations
Tetherless World Constellation
Rensselaer Polytechnic Institute
http://tw.rpi.edu
|
Ah, then yes, moving it up makes sense. Things were ambiguous there.
On Wed, Sep 1, 2021 at 11:37 AM mdeagen wrote:
The location being stored in the XML is not the city of the publisher. The
xpath //Citation/CommonFields/Location is the affiliated author address
populated into the XML. Theoretically there should be more than one (if so,
we would need a for loop), but the scraper that populates the XML appears
to only grab one, so the proposed fix should suffice for the current state
of the XML representations.
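If the XML ever does carry more than one Location, the loop mentioned above might look something like this sketch with the standard-library ElementTree. The sample document, its Root element, and the affiliation strings are invented for illustration; only the Citation/CommonFields/Location path comes from the comment above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root element name and text values are placeholders.
SAMPLE_XML = """<Root>
  <Citation>
    <CommonFields>
      <Location>Affiliation A</Location>
      <Location>Affiliation B</Location>
    </CommonFields>
  </Citation>
</Root>"""


def author_locations(xml_text):
    """Return every non-empty //Citation/CommonFields/Location value, in order.

    ElementTree's ".//" searches descendants of the root, so this assumes
    Citation is not itself the root element.
    """
    root = ET.fromstring(xml_text)
    locations = []
    for element in root.findall(".//Citation/CommonFields/Location"):
        text = (element.text or "").strip()
        if text:
            locations.append(text)
    return locations


if __name__ == "__main__":
    for location in author_locations(SAMPLE_XML):
        print(location)
```

With the current single-Location files this returns a one-element list, so the same code path would cover both cases.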