Use cases for institutional Wikibase instances

May 2020

Developed informally by library staff at Columbia University, Harvard University, New York University, and the University of Pennsylvania

Primary contributors: Charlene Chou, Christine Fernsebner Eslao, Jim Hahn, Esther Jackson, Brian Luna Lucero, Timothy Ryan Mendenhall, Honor Moody, Alex Provo, Melanie Wacker, Alex Whelan. Additional input from: Nicky Agate, Diane Shaw, Marc McGee, and Jackie Shieh.

Note: this version of the use cases is closed. Please fork or download and contribute additional use cases!

SUMMARY

The past 15 years have witnessed the development of a number of initiatives, projects, and standards to move library, archives, and museum metadata and collections out of siloed, flat-file database structures and into a linked open data implementation more accessible to open web search. Despite the great deal of interest within the cultural heritage sphere, roadblocks ranging from vast sets of legacy data and a highly complex, inconsistent metadata ecosystem to steep learning curves and a lack of adequate systems and tools have prevented many cultural heritage institutions from moving fully into linked data production.

While this thumbnail assessment of linked data within the cultural heritage sphere may sound discouraging, over the past 8 years the Wikimedia community has developed Wikidata, a vast trove of linked data derived in part from Wikipedia, and the underlying Wikibase platform, which offers a viable interface for creating and publishing linked data that even linked data novices can easily learn. Unsurprisingly, many major players in cultural heritage have taken note, ranging from OCLC’s Wikibase explorations [1] to adoption of Wikibase by the Bibliothèque Nationale de France [2] and the Deutsche Nationalbibliothek [3] for their authority datasets. Other cultural organizations like Rhizome, which supports born-digital arts, and projects like Europeana EAGLE [4] are building their databases entirely in Wikibase [5]. In turn, the Wikimedia community is responding by developing a model for federated Wikibase instances, which can serve local needs while easily tapping into the power of the vast Wikimedia metadata ecosystem. Google has taken note--the community-curated linked open data of Wikidata forms the bedrock of the Google Knowledge Graph. [6]

Although the Wikibase platform can easily be installed on a laptop or in a cloud environment like Amazon Web Services [7], many metadata and cultural heritage professionals have found it difficult to get institutional support for such a local instance because IT resources, including the staff time needed to support a local instance of an open source platform, are lacking. Often, institutional structures have made it nearly impossible to bridge these technological gaps. This document seeks to set out a clearly articulated set of use cases demonstrating the benefits of setting up an institutional or consortial Wikibase instance, both to meet urgent current needs of academic collections and to help library staff start tackling gaps in data that were formerly unresolvable.

This document formulates a set of use cases for deploying an institutional Wikibase instance that are common to many GLAM institutions (galleries, libraries, archives, and museums), with the goal of helping these institutions secure the resources and institutional buy-in required to set up such a Wikibase instance locally.

BACKGROUND

Authority records are essentially records about a concept or an entity. While each authority record has an identifier assigned to it, the actual matching of a concept or entity in a bibliographic record to its authority record is very much string-based. As a result, libraries pay large sums to authority vendors with very mixed results. Identity management, on the other hand, relies on URIs to identify and manage an entity instead of a text string formulated according to very precise guidelines. Traditional authority control is still very important in current library systems, but it is not sustainable as the only model going forward. Recognizing this, the Program for Cooperative Cataloging (PCC) states in Strategic Direction 4 of its Strategic Directions document for 2018-2021: “Accelerate the movement toward ubiquitous identifier creation and identity management at the network level.” Following through, the PCC Task Group on Identity Management in NACO was charged in 2018 with facilitating this move.

As stated above, in traditional library systems like ILSs and OPACs, a large number of essential tasks for metadata creation and maintenance rely on imprecise methods, chiefly the matching of text strings. Linked data systems rely on underlying unambiguous identifiers to facilitate both more precise data entry and more dynamic displays--if an underlying authority heading is changed, the change propagates automatically to all records that use that heading--substantially improving the accuracy and reliability of library metadata. In this linked data scenario, authority control and database maintenance processes, whereby headings are controlled to authorized versions of a text string, are largely automated, saving substantial staff time.
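
To make the contrast with string matching concrete, the minimal sketch below (a Python illustration with invented records and labels, not any particular system's data model) shows how records that store an identifier rather than a literal heading pick up a label change automatically.

```python
# Illustrative sketch: records reference an identifier, not a text string.
# Entity labels live in one place; display resolves the identifier at render time.

authorities = {
    "ex:E101": {"label": "Clemens, Samuel, 1835-1910"},
}

bib_records = [
    {"title": "Adventures of Huckleberry Finn", "creator": "ex:E101"},
    {"title": "The Innocents Abroad", "creator": "ex:E101"},
]

def display(record):
    # The heading shown to users is looked up from the authority at display time.
    return f'{record["title"]} / {authorities[record["creator"]]["label"]}'

# A single label change propagates to every record that uses the identifier,
# with no batch heading-flip project or string matching required.
authorities["ex:E101"]["label"] = "Twain, Mark, 1835-1910"
for rec in bib_records:
    print(display(rec))
```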

With the advent of digital formats and online collections, libraries are now collecting and publishing metadata about a much broader pool of materials than in the past, ranging from institutional repository descriptions of articles and datasets to metadata for digitized archaeological objects. To keep up with the ever-growing streams of digital library collections, metadata practitioners in libraries, archives, and museums no longer work exclusively with vocabularies managed by the Library of Congress and the Program for Cooperative Cataloging, and increasingly use non-library data sources like the Getty Vocabularies, Wikidata, ORCiD, ISNI, Discogs, and a host of others to expedite metadata processes--if a researcher is in the ORCiD database, is it worth the effort to establish them in the Library of Congress-NACO Authority File for the sake of adding a single access point to an article in the institutional repository? In addition, discovery layers now pull in metadata from a wide variety of sources, for example, article metadata from subscription databases, some of which library staff have very little control over. The resulting proliferation of terms describing the same persons, places, and things within a single discovery interface has led to the emergence of identity management as a distinct sector of metadata practice.

Almost all GLAM institutions also maintain some kind of local controlled vocabulary. Wikibase provides an easy platform for maintaining and accessing vocabularies, whether they are of interest to audiences outside the institution, as with NYU’s performance art and indigenous film thesauri, or strictly of internal interest, as with ILS codes for location, item type, and item status, or names of specific rooms and campus locations.

In this increasingly complex metadata ecosystem, projects like VIAF, ISNI, and Wikidata have emerged as identity hubs that bring together identifiers from disparate vocabulary sources. However, barriers to contributing and actively participating in efforts like these prevent even resource-rich libraries from using them meaningfully for identity management within their library collections. Meanwhile, the proliferation of local systems for archival collections management, digital asset management, online exhibitions, scholarly communications / institutional repositories, and more exacerbates the proliferation of terms describing a single entity,[8] as described in the previous paragraph. At a local level, many institutions are experiencing an acute need to improve local capacity for identity management to overcome serious problems with search and retrieval across various platforms, managed both locally and by vendors. Functionality in discovery layers like Primo and Blacklight is not sufficient to address all of these needs, and this failure of existing tools to meet authority control and identity management needs at an institutional level carries high opportunity costs. These opportunity costs affect both researchers, who may have to spend more time than expected searching across various library systems for research materials on a single topic, and staff, who may inadvertently create multiple authority records in different systems or who may need to spend additional time on reference and instruction to assist users with searching across systems. A local instance of Wikibase promises to help overcome many of the problems inherent in the current library systems environment.

Improving authority control and identity management is just the tip of the iceberg. Increasingly, both user expectations and university mandates require libraries to provide multiple ways of accessing collections, for example by allowing for different displays of metadata based on user preferences associated with language, gender, ethnic, and racial identities. As an example, many libraries in Canada and the United States are investigating options to allow for different displays of metadata, curated by the relevant communities, for collections related to First Nations and Indigenous groups, while still allowing for the powerful access that international, but problematic, vocabularies like LCSH can offer. [9] As amply demonstrated by Wikidata, the Wikibase platform can readily support this scenario. A local Wikibase platform can also support many additional use cases, for instance broader participation in metadata curation, knowledge graph creation, and digital humanities, as well as metadata publication, sharing, and reuse. Extrapolating from efforts within Wikidata and the broader Wikimedia community, Wikibase also holds promise for facilitating meaningful crowdsourcing projects and the application of artificial intelligence and machine learning to our metadata and collections. [10] The following section focuses at length on the various use cases for a local instance of Wikibase. [11]

USE CASES

As a technical note, for the sake of consistency this document will use the terminology item-property-value, which reflects usage within the Wikidata and Wikibase community. This model is for all intents and purposes aligned with the RDF data model of subject-predicate-object, also known as triples. Wikibase can publish data in many formats, including various RDF serializations and JSON. [12] We developed only a very minimal set of personas to guide the development of the use cases, limited to 1) staff users, 2) researchers, and 3) machine consumers. Staff users may include staff engaged in descriptive work like archivists, catalogers, and metadata librarians, public services staff like reference librarians, research-oriented staff like special collections curators, and staff engaged in developing and managing various library systems. The category Researchers includes any non-staff persons using library collections. Finally, the category Machine consumers refers to any program, application, or service that could read, write, add, or delete data from the Wikibase instance, with or without human mediation.
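
As a concrete illustration of the item-property-value model and the machine-readable formats mentioned above, the short Python sketch below reads an item’s JSON through Wikibase’s standard Special:EntityData route and prints its statements as simple triples. Wikidata is used here only because its endpoint is public; a local instance exposes the same route (and RDF serializations such as .ttl) at its own hostname.

```python
# A minimal sketch of reading a Wikibase item as item-property-value statements.
import requests

ITEM = "Q42"  # any item ID; Q42 is Douglas Adams on Wikidata
url = f"https://www.wikidata.org/wiki/Special:EntityData/{ITEM}.json"
entity = requests.get(url, timeout=30).json()["entities"][ITEM]

print("Label:", entity["labels"]["en"]["value"])
for prop, statements in entity["claims"].items():
    for st in statements:
        snak = st["mainsnak"]
        if snak["snaktype"] != "value":
            continue  # skip "no value" / "unknown value" snaks
        # item (subject), property (predicate), value (object)
        print(ITEM, prop, snak["datavalue"]["value"])
```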

  • Better control over non-public metadata and locally defined vocabularies for intra-library systems and workflows. In many systems, this information is currently expressed in an unstructured form, and is not related to external vocabularies in a meaningful and actionable way.
    • User group: staff user, researcher
    • Requirements: The Wikibase instance must support different levels of user authentication and metadata publication. Additional requirements may depend on the vocabulary. Internal vocabularies are likely to be subject to stricter control and may need protection via user authentication. Other vocabularies of wider interest could be open to editing by a larger user base (see Metadata publishing and transparency below).
    • Priority: High
    • User story: A special collections curator would like to enter structured data for a rights assessment. The curator needs to enter the life dates of the collection donor for this assessment, because of copyright concerns surrounding the donation. The donor is also the chief subject of the collection, which consists of her personal papers and manuscripts. This life date information could be reused by an archivist or metadata librarian when constructing an access point for the finding aid or digital library collection. If the donor does not wish for her life dates to be known, this information could be suppressed from public view but remain viewable by staff to confirm that her name is not already established in an existing vocabulary like LC/NAF.
    • User story: Metadata and curatorial staff maintain an extensive vocabulary related to performance art to assist with describing a large corpus of locally held collections, one of the largest on earth. Rather than using a spreadsheet or the ILS authority management module, staff use Wikibase to store local authorities and headings and to prepare them either for maintenance and reuse in library systems (such as the discovery layer) or for entry into a workflow for contributing the terms to external thesauri such as AAT or LCGFT.
  • Entities that do not meet criteria for notability and identification: Some entities may not be considered notable beyond a local context, such as institutional projects, initiatives, events, event series, and student organizations. In some cases, entities and content of chiefly local interest may become interesting to a broader audience, especially over time. In other cases, it may not even be possible to identify an entity represented in a digital object based on available information, but the entity may still have some characteristics that could be recorded. Recording these characteristics could aid in identifying or disambiguating this entity at a later date. In both of these scenarios--entities of chiefly local interest, and entities that cannot be sufficiently disambiguated--there may not be justification to add descriptions of these entities to external vocabularies like LC/NAF, Wikidata, or ISNI, because of notability and disambiguation requirements. Even vocabularies like ISNI, which do not require a unique text string for the primary label, still require a certain threshold of disambiguation to be met before a description can be added.
    • User group: staff users, researchers
    • Requirements:
      • Some localized properties beyond standardized metadata vocabularies may be needed to represent content with local interest. It may be necessary to support some type of local authentication for certain local entities, analogous to log-in and IP authentication for other types of library resources and collections, although specific use cases may need to be identified. It is possible that these vocabularies could be opened to editing by researchers.
      • Wikibase supports different “ranks” for statements, but the default rank system may need to be adjusted to accommodate GLAM needs, or some custom properties may need to be created. Custom displays or tools may need to be generated to enable staff users to effectively browse through non-unique access points, merge them, and update them. Tools for these purposes exist in the Wikidata ecosystem and could potentially be reused in a local Wikibase instance.
    • Priority: High
    • User story: Scholarly communications, archives, and metadata staff create descriptions of various events and event series of note on campus, including organized student demonstrations, lecture series, and others, as information on such events may be ephemeral and best captured as events happen, rather than years down the line when materials related to these events may start entering the university collections. Using a local Wikibase instance empowers a broader cohort of stakeholders, including faculty and student content experts, to create descriptions of such content, as they are not hampered by authentication requirements for contributing to national vocabularies like LCSH, or for local digital asset management systems. If events are later deemed notable, metadata about them from the local Wikibase instance can easily be shared by creating Wikidata items, NAF/LCSH records, and entries in other national and international vocabularies.
    • User story: A metadata librarian is processing metadata for a selection of digitized portrait photographs. Although the sitters are identified, many of the names are not unique, and often women are only identified by their husband’s names, such as “Mrs. James Smith.” The metadata librarian creates Wikibase items for each unique sitter with minimal metadata (e.g. date and location associated with the photograph), and forwards the items to a curator with instructions on how they can help enrich the items with greater context to aid researchers in distinguishing between these individuals and others with similar names. Although the curator is not able to enrich every item, some connections are made to existing Wikidata items and LC/NAF records, and for the remaining items, the research work is not lost.
    • User story: A metadata librarian is remediating a metadata spreadsheet of images digitized quickly for a researcher with minimal description supplied. A portrait photograph on this list has a caption “Mrs. John Brown” with the device of a studio located in Denver, and the librarian can infer from the image that it was likely shot in the late 19th century. Because the surname of the subject is common, it is not possible to identify whether the name refers to a “John Brown” already established in a vocabulary like Wikidata or LC/NAF, and therefore it is all but impossible to identify even the given name of the sitter. The metadata librarian nonetheless records the caption, location, time period, and implied gender of the sitter in the local Wikibase instance, along with references to the photograph’s parent location and location in the digital library collections. A few years later, selections from a large collection of personal papers from a prominent Denver family are digitized, including a folder of photographs of “Jane Brown.” While preparing these materials for digitization, curators, archivists, preservation librarians, and metadata librarians are able to infer that this Jane Brown is the same person as the “Mrs. John Brown” digitized a few years earlier. The identity of Jane Brown is thus established, and disparate groups of photographs and papers relating to her can be found easily by researchers. In other instances, the metadata librarian may not be able to make a positive identification, but even recording basic research such as “the portrait subject of this photograph is not the same as X, Y, and Z” is very useful information that could benefit staff and researchers in the future.
  • Granular, curated tracking of organizational name changes, including predecessor/successor relationships, mergers and splits, and relationships to external identifier systems.
    • User group: staff users, researchers, machine consumers (discovery layer)
    • Requirements: stable modeling for organizations and relationships between organizations; control over editing permissions. While data could be contributed to Wikidata directly, or linked to Wikidata entities, curation of entities and relationships from selected sources and consistent modeling matter.
    • Priority: High
    • User story: A scholarly communications librarian would like to improve access to materials in the institutional repository produced by the Michael Bloomberg Ecological Studies Institute, which has undergone 5 name changes in 5 years, in part because of a complex series of reorganizations in the School of Arts and Sciences. The bulk of the Institute’s output is white papers, so catalogers in the unit that creates LC/NAF records are unlikely to create the needed access points for the foreseeable future. Years later, the Institute publishes a series of books, and the cataloging staff are able to export metadata from the local Wikibase to semi-automatically create LC/NAF records, without redoing work already completed by scholarly communications staff.
  • Authority control and identity management across silos: Staff may not be able, because of permissions issues, to search different systems (e.g. Digital Asset Management System, ArchivesSpace, Institutional Repository) to identify existing controlled vocabulary items used within a single institution. Creating and updating multiple descriptions for the same entity in different systems and in different vocabularies is also inefficient. A local instance of Wikibase could help unify and streamline workflows for creating and maintaining entity descriptions across different user groups (librarians, archivists, curators), systems, and vocabularies.
    • User group: staff users, machine consumers
    • Requirements: The local Wikibase’s metadata application profile would need to accommodate the needs of different user communities and vocabularies, ideally while maintaining interoperability with Wikidata. In conjunction with Wikibase’s SPARQL endpoint, API, and third-party tools like OpenRefine, custom export and transformation processes could be developed to enable a Wikibase instance to generate and update headings in different systems (see the SPARQL sketch following this list for an illustration of the underlying lookup).
    • Priority: High
    • User story: An archivist creates an EAC-CPF-compliant item in ArchivesSpace for Person X represented in an archives collection, and publishes this item to an institutional Wikibase. Metadata and cataloging staff, who are not able to access ArchivesSpace, are able to reuse this Wikibase item when creating access points for a digitized selection of materials from this collection, and for an LC/NAF authority record needed for an access point for a book published about Person X based on the archives collection, without having to manually create three separate authority records. The Wikibase item retains the three different identifiers (ArchivesSpace, local digital collections, LC/NAF) in a single description, simplifying indexing and discovery.
  • Enable the use of locally preferred labels while pointing to authorized forms of terms: For numerous reasons, an institution may wish to display a locally defined label for an item while maintaining a link to the item’s URI in an external controlled vocabulary. See also a related use case below regarding multi-lingual access.
    • User group: staff users, researchers
    • Requirements: Some custom properties and policies may need to be developed, but Wikibase supports this functionality out-of-the-box.
    • Priority: High
    • User story: The manager of the digital asset management system is contacted by the curator of art and architectural collections, who would like to build an online exhibit based on a digitized collection of architectural drawings created by Person X (see previous use case). The curator would prefer to use the Getty Union List of Artist Names form of Person X’s name for the online exhibit. The manager adds the Getty label and URI to the existing Wikibase item for Person X, so that the curator can use the Getty label without losing the connection to other library collections relevant to Person X.
    • User story: The name facet on the search results of the digital library collections at Example University displays many headings for departments that start with the phrase “Example University,” such as “Example University. Department of Architecture,” since this is the established form in the LC/NAF. A usability study recommends removing the phrase “Example University” from the name facets to improve the user experience. However, these forms of the names conflict with RDA and PCC instructions for constructing name access points. The local Wikibase instance makes it possible to use the locally required form of the name without losing connection to the LC/NAF and other external vocabularies.
    • User story: Student and faculty groups are protesting the use of controversial LCSH headings like “Illegal aliens” in the various library search interfaces and catalog records. The metadata librarian updates the preferred label in the local instance of Wikibase to “Undocumented immigrants,” as recommended by the student and faculty groups, and runs an update job to propagate the change to the library’s discovery layer display. In doing so, the librarian is able to maintain a connection to the LCSH heading while also meeting local needs to create an inclusive environment.
  • Enable granularity in mapping between vocabularies: Most library utilities that support authority control do not implement mapping properties such as skos:closeMatch that enable a more nuanced relationship between entities represented in different vocabularies.
    • User group: staff users, researchers
    • Requirements: Development of an application profile and/or an ontology to define the types of relationships that can be made to an external vocabulary, or basing relationship properties on an external schema like SKOS.
    • Priority: Medium
    • User story: A researcher is looking for resources representing a certain format/genre. Depending on domain preference, the staff members describing the objects may have used terminology originating from different source vocabularies with slightly different labels and definitions, hence the researcher will need to perform multiple searches to retrieve the desired result set. A system employing mapping properties would eliminate this need.
    • User story: A systems librarian, working with a developer, leverages the vocabulary mappings in the local Wikibase to improve indexing and search functionality in the discovery layer.
  • Cross-platform discovery within a single institution: building on the use case Authority control and identity management across silos, a local instance of Wikibase acting as an entity hub has potential to improve discoverability across disparate local platforms and collections, e.g. ILS, Digital Collections Platform, Institutional Repository, Archives management system, etc.
    • User group: Staff users, researchers
    • Requirements: Integration with discovery layer
    • Priority: High
    • User story: After reading an obituary for an affiliated faculty member, a researcher finds several articles by the faculty member in the institutional repository. Curious if the university library has books by this faculty member, or perhaps if there are photographs of her in the digital library collections, the researcher searches on the form of the faculty member’s name from the institutional repository, which is taken from the ORCiD database. Because metadata staff had reconciled the various forms of the faculty member’s name used in different library systems in the local Wikibase instance, and shared this with the systems unit that manages the discovery layer index, the researcher is easily able to find several books by the faculty member, as well as a finding aid for her personal papers, a portrait of her in the university’s art collection, and links to articles by and about her in subscription databases.
  • Multilingual discovery: With native support for multilingual labels for items and properties, a local Wikibase could support other library systems in providing multilingual search and discovery of library collections.
    • User group: staff users, researchers
    • Requirements: Multilingual label support is built into Wikibase (see the label-fallback sketch following this list).
    • Priority: Medium
    • User story: At a conference, a curator hears from researchers and students in other countries that they have trouble searching the institution’s online collections because proficiency in English is needed to use the interface. The curator contacts the metadata staff with this complaint. By leveraging Wikibase’s support for multilingual items and properties, metadata staff, IT staff, and the curator are able to develop a proof-of-concept multilingual search and discovery interface built on the local Wikibase instance.
    • User story: A librarian in East Asian studies works with a metadata librarian to provide detailed descriptive metadata for non-English audiovisual materials. The East Asian librarian expresses the desire to display titles, names and subjects in the language and script of the content in addition to the romanized titles and names and the English-language subject labels which are typically provided in the digital library collections. Whereas previously the metadata librarian might have only been able to include non-English metadata efforts in unstructured abstract / note fields, the local Wikibase’s support for multilingual labels would support this request to provide access to digital library collections in multiple languages and scripts.
    • User story: A partner in a different unit on campus collaborates on a digital library project with librarians and programmers, and wishes to provide trilingual description of both streaming video assets and artists. They are given access to the Wikibase to provide translations.
    • User story: A large digital library project of scanned Arabic-language books involves ingest and conversion of MARC from print to e-resource description. While the metadata is published to cooperative library systems, it is also transformed to MODS and used to generate a bilingual website for the project. Some metadata is not always available in Arabic script due to library cataloging rules regarding transliteration, so selected metadata elements are modeled and ingested in Wikibase so that translations can be created and stored.
  • Provide greater context: Acting as an identity hub aggregating local and external identifiers, a local Wikibase instance could enable seamless integration of contextual information from external sources like Wikipedia into search and discovery pages, and supply data to support filtering and browsing search results based on entity relationships.
    • User group: Researchers
    • Requirements: development of functionality to link out to external sources, or to generate a knowledge panel / InfoBox within the search and discovery interface
    • Priority: Medium
    • User story: A student browsing the library’s digital collections for a class assignment comes across a fascinating letter from an activist to a politician. The student is able to use functions built on the metadata in a local Wikibase instance that make it easy to link to contextual information in Wikipedia on the politician, and to find materials in various library collections (print, archives, subscription databases) related to the politician and the activist.
  • Metadata publishing and transparency: As noted in the Background section, almost all GLAM institutions maintain local controlled vocabularies, whether of interest to audiences outside the institution, like NYU’s performance art and indigenous film thesauri, or strictly internal, like ILS codes for location, item type, and item status. Some local vocabularies might be appropriate for reuse; part of reaching the fifth star in five-star linked open data is publishing the data openly, using protocols like HTTP and SPARQL. As noted in the ARL White Paper on Wikidata, local Wikibase instances instead of direct contributions to Wikidata might make sense “in cases where data and data models are highly specialized or there are considerations that require greater control over the data” (38). Sub-cases appropriate for a local Wikibase:
  • Publishing of local thesauri and authorities for external use: Providing dereferenceable URIs and API/SPARQL access for local vocabulary terms so that others outside of the institution can use them in their own projects (for example, a performance genre thesaurus at NYU). For added context, see the ARL White Paper on Wikidata (2019), page 10, which recommends the use of Wikibase as a “LOD store for local identifiers and authority-like data.”
    • User group: staff users, researchers, machine consumers
    • Requirements: stable URIs for items, an API and/or a SPARQL endpoint, all of which are built into Wikibase
    • Priority: High
    • User story: Metadata staff maintain a local thesaurus, such as the NYU performance genre thesaurus mentioned above. Wikibase provides a convenient mechanism for generating stable, reusable URIs for the vocabulary items. Publishing the vocabulary via a local Wikibase instance enables staff to share it while retaining editorial control over it. In turn, publishing the vocabulary enables more widespread adoption outside the local institutional context and provides infrastructure for linkages to related items in the Art and Architecture Thesaurus (AAT), Wikidata, LCSH, and other vocabularies, as well as linkages to related materials in other institutional collections.
  • Publishing of local thesauri and authorities for intra-University use:
    • User group: staff users
    • Requirements: user authentication, API access or data dumps
    • Priority: Medium
    • User story: NYU Hemispheric Institute curators would like to maintain control over the content of the artist bios, have the Library care for and maintain the description long-term, access the data for remixing/use on the Institute’s website, and potentially make the description available more widely.
  • Pipeline to Wikidata and broader web discovery: Using the Wikibase platform should enable easier sharing of metadata with Wikidata, thus engaging with a global platform and greatly increasing the visibility of metadata and collections in open web search and in the open knowledge ecosystem.
    • User group: All potential users
    • Requirements: Adaptation of existing tools like QuickStatements, OpenRefine, or PyWikibot, or development of new tools to port data from a local Wikibase into Wikidata (see the QuickStatements sketch following this list). Wikimedia’s development of federated properties, which allow a local Wikibase to reuse Wikidata’s properties, is a step in the right direction here.
    • Priority: High
    • User story: An agent mentioned in one of the institution's archival collections is established in the local Wikibase instance since that person does not appear to reach the notability level required for inclusion in the public Wikidata instance. In later work, it is discovered that this person has published several articles, and the metadata is migrated to the public Wikidata platform for others to use.
  • Digital humanities, database, metadata, and exploratory projects
    • Description: Many digital humanists assemble and curate databases as part of their scholarship; these may become standalone digital humanities projects or form the basis of interpretive digital publications (such as narrative projects, websites, visualizations, or some combination thereof). Recent and emerging examples that focus on networks and interconnected entities include Enslaved (enslaved.org), Project Cornelia (https://www.projectcornelia.be/index.html), and Connect Vermeer (http://www.connectvermeer.org/). Overlapping to an extent with digital humanities are a variety of initiatives to use techniques like crowdsourcing and artificial intelligence to assist with metadata creation.
    • User group: researchers, staff (metadata, digital scholarship)
    • Requirements: Wikibase instances must be easy to install and well documented.
    • Priority: Low-Medium
    • User story: A researcher approaches library staff for help with a new digital humanities project involving collecting and describing documents about a disputed geographic area. The researcher wants to capture granular document attributes specific to the project, and also wants to describe related entities such as places and government officials. The project would benefit from vocabulary control and wiki editing capabilities so that project team members in different countries can collaborate. The researcher also wants to make the data reusable and queryable by others, and they want to build maps and other applications on top of the data. Librarians advise the researcher to use Wikibase, and provide guidance in setting up the instance and in data modeling.
    • User story: A large corpus of negatives from a prominent photographer is digitized. Because of the size of the collection, bulk description had been applied using More Product, Less Process principles, but online researchers would greatly benefit from more granular description of the people and events captured in the photographs. Because of rights issues, the photos can’t be shared widely using Wikimedia platforms, but library staff are able to adapt the Wikidata Mix N’ Match tool for their local Wikibase instance to enable researchers to assist with identifying the content of the photographs.
    • User story: A large corpus of photographs of New York City landmarks is digitized. As many of these are iconic structures, a curator and a faculty member propose running a computer vision project on the collection. Adapting tools deployed for Wikimedia projects to the local Wikibase instance, the professor, curator, and students are able to develop a crowdsourcing interface for human users to confirm matches made by the computer vision project. These confirmed matches can then be exported to update metadata in the digital library collections database.
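
The following sketches are illustrative only. The first, referenced in the Authority control and identity management across silos use case, shows the kind of identity-hub lookup a local Wikibase’s SPARQL endpoint enables: given one identifier, retrieve the sibling identifiers stored on the same item. The hostname and the property IDs (P1, P2, P3) are hypothetical placeholders, since a local instance defines its own properties.

```python
# A sketch of an identity-hub lookup against a local Wikibase query service.
# Endpoint URL and property IDs are placeholders:
#   P1 = LC/NAF identifier, P2 = ArchivesSpace agent ID, P3 = digital collections ID
import requests

SPARQL_ENDPOINT = "https://wikibase.example.edu/query/sparql"  # hypothetical

QUERY = """
PREFIX wdt: <https://wikibase.example.edu/prop/direct/>
SELECT ?item ?aspaceId ?digitalId WHERE {
  ?item wdt:P1 "n82011496" .           # item carrying this LC/NAF identifier
  OPTIONAL { ?item wdt:P2 ?aspaceId }  # ArchivesSpace agent ID, if present
  OPTIONAL { ?item wdt:P3 ?digitalId } # digital collections ID, if present
}
"""

resp = requests.get(
    SPARQL_ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"Accept": "application/sparql-results+json"},
    timeout=30,
)
for row in resp.json()["results"]["bindings"]:
    print({k: v["value"] for k, v in row.items()})
```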
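
The next sketch, referenced in the Multilingual discovery use case, shows the label-fallback behavior that Wikibase’s built-in multilingual labels make possible: a display layer can pick the best available label for a user’s preferred languages. Wikidata’s public endpoint stands in for a local instance here.

```python
# A sketch of choosing a display label from a Wikibase item's multilingual labels.
import requests

def best_label(entity, preferred_langs=("ar", "en")):
    """Return the first available label in the user's preferred languages."""
    labels = entity["labels"]
    for lang in preferred_langs:
        if lang in labels:
            return lang, labels[lang]["value"]
    # fall back to any label at all
    lang, label = next(iter(labels.items()))
    return lang, label["value"]

item = "Q42"
entity = requests.get(
    f"https://www.wikidata.org/wiki/Special:EntityData/{item}.json", timeout=30
).json()["entities"][item]

print(best_label(entity, preferred_langs=("ar", "en")))  # Arabic-preferring user
print(best_label(entity, preferred_langs=("zh", "en")))  # Chinese-preferring user
```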
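
Finally, as referenced in the Pipeline to Wikidata use case, one low-tech route from a local Wikibase to Wikidata is generating QuickStatements batches. The sketch below builds a V1 (tab-separated) batch for a single invented local record; in practice the input would come from the local SPARQL endpoint or an API export, and statements would be emitted only once notability and identity are established.

```python
# A sketch of exporting a locally described entity as a QuickStatements V1 batch
# that staff could paste into the QuickStatements tool to create a Wikidata item.
# The local record below is invented for illustration.
local_item = {
    "label_en": "Jane Brown",
    "description_en": "subject of photographs in Example University collections",
    "instance_of": "Q5",  # human
    "viaf_id": None,      # no external identifiers known yet
}

lines = ["CREATE"]
lines.append(f'LAST\tLen\t"{local_item["label_en"]}"')        # English label
lines.append(f'LAST\tDen\t"{local_item["description_en"]}"')  # English description
lines.append(f'LAST\tP31\t{local_item["instance_of"]}')       # P31 = instance of
if local_item["viaf_id"]:
    lines.append(f'LAST\tP214\t"{local_item["viaf_id"]}"')    # P214 = VIAF ID

print("\n".join(lines))
```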

ALIGNMENT WITH PROFESSIONAL AND INSTITUTIONAL PRIORITIES AND STRATEGIC DIRECTIONS

Each institution should fork this repository and fill in this section based on its own strategic directions, mission statement, or other guiding principles. The group compiling the use cases did not draft any text on how local Wikibase instances could align with the principles of the broader library community or the library technical services community.

RESOURCES NEEDED

This section could be filled in at a later date, especially after basic proof-of-concept projects are established and evaluated.

  • Hosting / server space
    • WbStack is a viable sandbox option for developing a proof-of-concept
    • Tutorials by OCLC and Matt Miller (Library of Congress) demonstrate how to get an instance running on a laptop and on AWS
  • Authentication (in addition to Wikibase's built-in authentication support?)
  • Developer time, staff time
  • Training

NOTES AND REFERENCES

  1. See Project Passage and the Shared Entity Management Infrastructure, both of which are built using Wikibase.
  2. French National Entities File (FNE) website https://www.transition-bibliographique.fr/fne/french-national-entities-file/ accessed 8 May 2020.
  3. Wikimedia Deutschland blog post, 9 May 2019. “New testing ground for Wikibase: A federal agency goes on an expedition in the Wiki universe.” https://blog.wikimedia.de/2019/05/09/new-testing-ground-for-wikibase-a-federal-agency-goes-on-an-expedition-in-the-wiki-universe/ accessed 8 May 2020.
  4. Europeana EAGLE “aims to build a multi-lingual online collection of millions of digitised items from European museums, libraries, archives and multi-media collections, which deal with inscriptions from the Greek and Roman World.” https://wiki.eagle-network.eu/wiki/Main_Page accessed 8 May 2020.
  5. See the Wikibase Registry for more information on Wikibase implementers.
  6. Edward, Tony. “Leveraging Wikidata to Gain a Google Knowledge Graph Result.” Search Engine Land, May 1, 2015. https://searchengineland.com/leveraging-wikidata-gain-google-knowledge-graph-result-219706 accessed 8 May 2020.
  7. Matt Miller (Library of Congress, Pratt Institute Semantic Lab) has run a series of demonstrations illustrating how to set up a Wikibase instance on Amazon Web Services and other cloud platforms, most recently for the LD4P Wikidata Affinity Group on May 18, 2020. Recordings of some of these demonstrations are available.
  8. For example, consider the many forms for Anthony Fauci. In the Library of Congress Name Authority File, he is “Fauci, Anthony, 1940-” [n82011496]; in Wikidata, “Anthony Fauci” [Q573246]; in ORCiD, “Anthony Fauci” [0000-0002-7865-7235]; in ISNI, he has a number of preferred labels, including the English “Anthony Fauci (American immunologist)” [0000000122825159].
  9. See Bone, Christine & Lougheed, Brett. “Library of Congress Subject Headings Related to Indigenous Peoples: Changing LCSH for Use in a Canadian Archival Context.” Cataloging & Classification Quarterly, v. 56, no. 1. https://doi.org/10.1080/01639374.2017.1382641 See also several chapters in Sandberg, Jane (ed.). Ethical Questions in Name Authority Control. Sacramento, CA: Library Juice Press, 2019.
  10. For example, see the Mix N’ Match project within Wikidata, and the use of computer vision on images in Wikimedia Commons to assist with the application of structured metadata statements, as described in Alex Stinson’s presentation at WikiConference North America 2019 and in Andrew Lih’s round table discussion at WikiConference North America.
  11. See also related discussions on pages 8-10 of the ARL White Paper on Wikidata (2019), https://www.arl.org/wp-content/uploads/2019/04/2019.04.18-ARL-white-paper-on-Wikidata.pdf accessed 25 May 2020.
  12. See https://wikiba.se/