Analysis on Reconciling Agent URIs During Discogs to BF Conversion
The goal of this analysis is to better understand the feasibility of reconciling Discogs Agent descriptions to id.loc.gov RWO URIs when converting from Discogs json to BIBFRAME RDF. Are there existing connections between Discogs and id.loc.gov (direct or indirect)? How many? With what frequency? Etc.
This analysis does not take into account any performance concerns during the conversion process (which should be investigated separately); it strictly outlines from a metadata point of view how we might map Discogs Agent identifiers to existing id.loc.gov URIs and/or Wikidata URIs.
There is little evidence that we could easily query id.loc.gov data for Discogs references to obtain id.loc.gov URIs.
- MARC 024 fields are unevenly populated and contain few (if any) Discogs identifiers; even when present in the MARC record, they are not reflected in the id.loc.gov RDF.
- Authority record note fields occasionally reference Discogs, but the information is structured in a way that would not allow a simple one-to-one mapping and would require conditional logic and parsing. More sophisticated entity-matching methods could perhaps be explored in the future.
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?s ?source ?o
WHERE {
  ?s <http://www.loc.gov/mads/rdf/v1#hasSource> ?source .
  ?source <http://www.loc.gov/mads/rdf/v1#citation-note> ?o .
  # keep only citation notes that mention Discogs (case-insensitive)
  FILTER regex(str(?o), "discog", "i")
}
LIMIT 10
A more promising strategy would be to query Wikidata with Discogs identifiers to find Wikidata and id.loc.gov URI equivalents.
As of 2018-12-13 there are 97324 Wikidata entities with Discogs identifiers.
SELECT (COUNT(?item) AS ?totalDiscogIDs)
WHERE {
  ?item wdt:P1953 ?discogsID .
}
As of 2018-12-11 there are 33216 Wikidata entities that have both wdt:P1953 (Discogs artist IDs) and wdt:P244 (Library of Congress Authority IDs) identifiers.
SELECT (COUNT(?item) AS ?hasBothIDs)
WHERE {
  ?item wdt:P1953 ?o1 .
  ?item wdt:P244 ?o2 .
}
This means when converting, we could search agent identifiers in the Discogs json against Wikidata, and possibly find a Wikidata URI and/or an equivalent Library of Congress identifier.
For example http://www.wikidata.org/entity/Q40912 (Frank Sinatra) includes both the Discogs identifier "52833" and the Library of Congress identifier "n50026395". With these equivalencies, the converter could write the id.loc.gov RWO URI in the RDF output using the pattern:
http://id.loc.gov/rwo/agents/[Library of Congress identifier]
e.g. http://id.loc.gov/rwo/agents/n50026395
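The lookup described above could be sketched roughly as follows. This is only an illustration, not the converter's actual code: the function names (`build_lookup_query`, `rwo_uri`, `extract_uris`) and the `?lccn` variable are hypothetical, Discogs identifiers are assumed to be numeric strings as in the examples here, and the response is assumed to be in the standard SPARQL JSON results format. Sending the query to Wikidata is left out so the sketch stays self-contained.

```python
from typing import Dict, List, Optional

# URI pattern taken from the text above.
RWO_BASE = "http://id.loc.gov/rwo/agents/"


def build_lookup_query(discogs_id: str) -> str:
    """Build a SPARQL query for one Discogs artist ID (hypothetical helper)."""
    # Assumption: Discogs artist IDs are purely numeric strings.
    if not discogs_id.isdigit():
        raise ValueError(f"unexpected Discogs identifier: {discogs_id!r}")
    return (
        "SELECT ?item ?lccn WHERE { "
        f'?item wdt:P1953 "{discogs_id}" . '
        "OPTIONAL { ?item wdt:P244 ?lccn . } }"
    )


def rwo_uri(lc_identifier: str) -> str:
    """Apply the id.loc.gov RWO pattern to a Library of Congress identifier."""
    return RWO_BASE + lc_identifier.strip()


def extract_uris(sparql_json: Dict) -> List[Dict[str, Optional[str]]]:
    """Pull the Wikidata URI and, when present, the RWO URI from each binding."""
    found = []
    for binding in sparql_json["results"]["bindings"]:
        lccn = binding.get("lccn", {}).get("value")
        found.append({
            "wikidata": binding["item"]["value"],
            # No P244 statement means there is no id.loc.gov URI to write.
            "rwo": rwo_uri(lccn) if lccn else None,
        })
    return found


# A hand-written response shaped like the Sinatra example in the text.
sample_response = {
    "results": {"bindings": [{
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q40912"},
        "lccn": {"type": "literal", "value": "n50026395"},
    }]}
}
print(build_lookup_query("52833"))
print(extract_uris(sample_response))
```

When an entity has a Discogs ID but no P244 statement, the sketch still returns the Wikidata URI and leaves the RWO URI empty, which mirrors the "Wikidata URI and/or Library of Congress identifier" outcome described above.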
Using the isolated Discogs json for the Sinatra project (https://github.com/LD4P/ld4p2-cornell/blob/master/Sinatra/Discogs/annotated_sinatra.json), the first 40 unique agent identifiers were queried against Wikidata.
SELECT DISTINCT *
WHERE {
  ?item wdt:P1953 ?discogsID .
  VALUES ?discogsID {
    "52833" "902493" "93330" "859570" "902491" "1866" "253375" "255801"
    "299962" "377045" "313097" "1899411" "859122" "931702" "327625" "312531"
    "265635" "900310" "330706" "1206013" "1855839" "95564" "3854560" "280072"
    "1206001" "370713" "2527870" "688672" "309989" "636380" "636374" "898406"
    "651411" "408668" "922250" "710656" "837676" "706105" "803935" "713805"
  }
  OPTIONAL { ?item wdt:P244 ?idURI . }
}
The results:
- Wikidata URIs found: 13 (32.5%)
- Wikidata URIs with LCNAF IDs found: 10 (25%)
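A list of unique identifiers like the one queried above could be assembled along these lines. This is a minimal sketch: `unique_in_order` and `values_clause` are hypothetical helper names, and the input is assumed to be a flat list of Discogs agent ID strings already pulled out of the json (how they are extracted depends on the file's structure, which is not covered here).

```python
from typing import Iterable, List


def unique_in_order(ids: Iterable[str]) -> List[str]:
    """Drop repeated identifiers while keeping first-seen order."""
    # dict preserves insertion order, so this keeps the first occurrence of each ID.
    return list(dict.fromkeys(ids))


def values_clause(ids: Iterable[str], var: str = "discogsID") -> str:
    """Render identifiers as a SPARQL VALUES clause of quoted string literals."""
    quoted = " ".join(f'"{i}"' for i in ids)
    return f"VALUES ?{var} {{ {quoted} }}"


# The first few agent identifiers as they appear, with repeats, in the Sinatra data.
occurrences = ["859570", "52833", "902493", "902491", "93330", "859570", "1866", "52833"]
print(values_clause(unique_in_order(occurrences)))
```

Skipping the `unique_in_order` step and passing the raw occurrences straight to `values_clause` would give the duplicate-preserving form of the query instead.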
When we allow duplicate identifiers to remain (which is a better indication of how often we can find an existing URI) and run the query over the same span of json descriptions (totaling 66 identifiers), we get the following results:
- Wikidata URIs found: 39 (59%)
- Wikidata URIs with LCNAF IDs found: 29 (44%)
SELECT *
WHERE {
  ?item wdt:P1953 ?discogsID .
  VALUES ?discogsID {
    "859570" "52833" "902493" "902491" "93330" "859570" "1866" "52833"
    "1866" "52833" "299962" "253375" "255801" "377045" "1866" "52833"
    "299962" "1866" "52833" "1866" "52833" "313097" "335521" "1899411"
    "859122" "931702" "313097" "327625" "312531" "265635" "900310" "330706"
    "1206013" "1855839" "95564" "3854560" "280072" "1206001" "313097" "370713"
    "2527870" "688672" "52833" "309989" "636380" "636374" "52833" "313097"
    "898406" "651411" "313097" "52833" "1866" "52833" "1866" "7183841"
    "52833" "299962" "408668" "922250" "710656" "837676" "706105" "803935"
    "1866" "52833" "693653" "713805"
  }
  OPTIONAL { ?item wdt:P244 ?idURI . }
}
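The difference between the two sets of percentages comes down to whether each identifier is counted once or once per occurrence. The sketch below illustrates that arithmetic; `match_rates` is a hypothetical helper, and the identifiers in the toy example (including which ones count as matched) are invented for illustration, since the analysis does not list which specific IDs were found.

```python
from typing import Iterable, Set, Tuple


def match_rates(occurrences: Iterable[str], matched_ids: Set[str]) -> Tuple[float, float]:
    """Return (rate over unique IDs, rate over all occurrences)."""
    occurrences = list(occurrences)
    unique = set(occurrences)
    unique_hits = sum(1 for i in unique if i in matched_ids)
    occurrence_hits = sum(1 for i in occurrences if i in matched_ids)
    return unique_hits / len(unique), occurrence_hits / len(occurrences)


# Toy data (made up): one frequently repeated agent is matched, two others are not.
toy_occurrences = ["52833", "1866", "52833", "999"]
toy_matched = {"52833"}
unique_rate, occurrence_rate = match_rates(toy_occurrences, toy_matched)
print(f"unique: {unique_rate:.1%}, occurrences: {occurrence_rate:.1%}")

# The reported figures follow the same arithmetic, for example:
print(f"{13 / 40:.1%}")  # unique identifiers with a Wikidata URI
print(f"{39 / 66:.0%}")  # occurrences with a Wikidata URI
```

Agents that recur often (such as the main performer) weigh more heavily in the per-occurrence rate, which is why that figure is the better estimate of how often the converter would actually find a URI.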