Evaluate Adding a Match Key to Each Record in Catalog #250

kevinreiss · 2024-11-14T14:16:23Z

Introduction

The catalog contains numerous duplicate records, which produce noisy search results that are difficult for users to browse. As resource sharing efforts such as POD/ReShare advance and more records become available to users in our discovery system, this problem will continue to grow.

Problem Statement

Many resource sharing and collection evaluation efforts use a match key to identify duplicate records. Gold Rush is a well-known example. Can we identify an open and reliable formula that we can use to add a match key to each bibliographic record in our discovery system? We also want to be able to share the algorithm we select with peer institutions, library staff members, and end users who want to know how the system handles this question.
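To make the idea of a match key concrete, here is a minimal sketch in Python. It is not the Gold Rush formula. The choice of fields (title, author, year), the normalization steps, and the names `normalize` and `simple_match_key` are all assumptions made purely for illustration.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    alnum_only = re.sub(r"[^a-z0-9 ]", " ", without_marks.lower())
    return " ".join(alnum_only.split())


def simple_match_key(title: str, author: str, year: str) -> str:
    """Build a hypothetical match key from three bibliographic fields."""
    title_part = normalize(title).replace(" ", "_")[:40]
    author_part = normalize(author).replace(" ", "_")[:20]
    # Pull the first four-digit run out of strings such as "c1987."
    year_match = re.search(r"\d{4}", year)
    year_part = year_match.group(0) if year_match else "0000"
    return f"{title_part}|{author_part}|{year_part}"
```

Two records that differ only in punctuation, case, or diacritics would receive the same key under this sketch. Note that it discards non-Latin characters entirely, so a real formula would need a deliberate approach to other scripts.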

Initial Goals

Identify a test set of records containing known duplicates and near matches.
Select a test algorithm and run it on a set of test records.
Develop a plan for scaling this up to the entire set of bibliographic records we make available in Orangelight.
Develop a plan that uses the key to de-duplicate records for end-user discovery and answers the following questions: (1) How do we let the user know about the holdings of de-duplicated records? (2) How do we handle electronic-only materials that may be logical duplicates of print-only materials?
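One possible shape for the de-duplication step is sketched below in Python. It assumes each record already carries a precomputed `match_key` that does not encode format, plus hypothetical `id`, `format`, and `holdings` fields; none of these names come from Orangelight or from any existing implementation. The sketch keeps every holding in the cluster and flags clusters that mix print and electronic records, so that both questions above can be examined on test data.

```python
from collections import defaultdict


def cluster_by_match_key(records):
    """Group records that share a match key, keeping every holding."""
    clusters = defaultdict(
        lambda: {"record_ids": [], "holdings": [], "formats": set()}
    )
    for record in records:
        cluster = clusters[record["match_key"]]
        cluster["record_ids"].append(record["id"])
        cluster["holdings"].extend(record["holdings"])
        cluster["formats"].add(record["format"])
    return dict(clusters)


def mixed_format_keys(clusters):
    """Return the keys whose cluster contains both print and electronic records."""
    return sorted(
        key
        for key, cluster in clusters.items()
        if {"print", "electronic"} <= cluster["formats"]
    )


# Hypothetical test records; the values are made up for illustration.
sample_records = [
    {"id": "a", "match_key": "k1", "format": "print", "holdings": ["Firestone"]},
    {"id": "b", "match_key": "k1", "format": "electronic", "holdings": ["Online"]},
    {"id": "c", "match_key": "k2", "format": "print", "holdings": ["Annex"]},
]
sample_clusters = cluster_by_match_key(sample_records)
```

Whether a mixed print and electronic cluster should be collapsed into one search result or shown as separate results is a policy decision that this sketch leaves open.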

Acceptance criteria

Update an existing markdown document or add a new one in the research directory.
It has an introduction: explains the goals and purpose of this research work.
It lists methods: describes what was done to research the question.
It has a conclusion: includes a summary of what was discovered in the research process.
It has a step-by-step list of potential next steps that build upon the research.
It includes references: cites any related resources that assisted in the research process (links to other tickets, online articles, etc.).
It includes any artifacts (charts, notes, code samples, etc.) that were produced during the work.

maxkadel · 2024-11-19T20:01:12Z

FYI - cataloging guidelines for identifying duplicates (the purpose of these guidelines is identifying them for weeding)

maxkadel · 2024-11-21T14:33:35Z

Mark's implementation - https://github.com/PrincetonUniversityLibrary/lib_reports/blob/main/ruby/lib/ils_sql/goldrush_algorithm.rb
Link to document on algorithm - https://coalliance.org/sites/default/files/GoldRush-Match_KeyJanuary2024_0.doc

kevinreiss added the Research label Nov 14, 2024

maxkadel assigned kevinreiss Nov 18, 2024
