Evaluate Adding a Match Key to Each Record in Catalog #250

Open
11 tasks
kevinreiss opened this issue Nov 14, 2024 · 2 comments

Comments

@kevinreiss
Member

kevinreiss commented Nov 14, 2024

Introduction

There are numerous duplicate records in the catalog, and they produce noisy search results that can be difficult for users to browse. As resource sharing efforts (such as POD/ReShare) advance and more records are made available to users in our discovery system, this problem will continue to grow.

Problem Statement

Many resource sharing and collection evaluation efforts use the concept of a match key to identify duplicate records: a normalized string built from selected bibliographic fields (for example title, author, and publication date), so that records describing the same resource produce the same key. Gold Rush is a well-known example. Can we identify an open and reliable formula that we can use to add a match key to each bibliographic record in our discovery system? We also want to be able to share the algorithm we select with peer institutions, library staff members, and end users who want to know how the system identifies duplicates.
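A minimal Python sketch of what a match key could look like, for illustration only. It is not Gold Rush's published specification: the chosen fields (title, author surname, year, publisher), their fixed widths, the English-only article stripping, the "Surname, Forename" author assumption, and the function names are all hypothetical. Each field is normalized (diacritics and punctuation removed, case folded), truncated or padded to a fixed width, and concatenated, so small cataloging differences collapse into the same key.

```python
import re
import unicodedata

# Hypothetical field widths; a real algorithm (e.g. Gold Rush) defines its own.
TITLE_WIDTH = 40
AUTHOR_WIDTH = 10
PUBLISHER_WIDTH = 5


def normalize(text, width):
    """Fold a field to uppercase ASCII words joined by underscores, at a fixed width."""
    # Decompose accented characters ("é" -> "e" + accent) and drop the non-ASCII marks.
    ascii_text = unicodedata.normalize("NFKD", text or "").encode("ascii", "ignore").decode("ascii")
    # Keep only letters, digits, and whitespace; uppercase for case-insensitive matching.
    cleaned = re.sub(r"[^A-Za-z0-9\s]+", "", ascii_text).upper()
    # Collapse any run of whitespace into a single underscore.
    cleaned = "_".join(cleaned.split())
    # Truncate long values and pad short ones so every component has a fixed position.
    return cleaned[:width].ljust(width, "_")


def strip_leading_article(title):
    """Drop an initial English article so "The Hobbit" and "Hobbit" match (assumption)."""
    return re.sub(r"^(the|a|an)\s+", "", (title or "").strip(), flags=re.IGNORECASE)


def surname(author):
    """Assume a "Surname, Forename" heading and keep only the surname."""
    return (author or "").split(",")[0]


def pub_year(date):
    """Take the first four-digit run (e.g. "c1937." -> "1937"); placeholder if absent."""
    match = re.search(r"\d{4}", date or "")
    return match.group(0) if match else "____"


def match_key(title, author, date, publisher):
    """Concatenate the normalized components into one fixed-width key (hypothetical layout)."""
    return (
        normalize(strip_leading_article(title), TITLE_WIDTH)
        + normalize(surname(author), AUTHOR_WIDTH)
        + pub_year(date)
        + normalize(publisher, PUBLISHER_WIDTH)
    )


key_a = match_key("The Hobbit, or, There and Back Again", "Tolkien, J. R. R.", "1937", "George Allen & Unwin")
key_b = match_key("Hobbit or there and back again.", "TOLKIEN, J.R.R.", "c1937.", "George Allen and Unwin")
print(key_a == key_b)  # True: both records reduce to the same key
```

Fixed widths keep each component in a predictable position, which makes keys easy to compare and to explain to staff and users. The trade-off is that truncation can make distinct works with long shared title prefixes collide, which is one thing a test set of known duplicates and near matches would need to check.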

Initial Goals

  • Identify a test set of records containing known duplicates and near matches.
  • Select a test algorithm and run it on the test set.
  • Develop a plan for scaling this up to the entire set of bibliographic records we make available in Orangelight.
  • Develop a plan that uses the key to de-duplicate records for end-user discovery and that answers the following questions: (1) How do we let users know about the holdings of de-duplicated records? (2) How do we handle electronic-only materials that may be logical duplicates of print-only materials?
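One hypothetical way to approach the last goal: treat all records that share a key as one cluster, display a single representative, and roll up holdings from every member so none are hidden from users (question 1). For question 2, the sketch exposes a `separate_formats` switch that either merges print and electronic records with the same key or keeps them apart. The `Record` and `Cluster` shapes, the `"print"`/`"electronic"` format values, and choosing the smallest record ID as the representative are assumptions for illustration, not how Orangelight currently stores or displays records.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical minimal record; real catalog documents carry many more fields.
    record_id: str
    match_key: str
    format: str  # assumed values: "print" or "electronic"
    holdings: list = field(default_factory=list)


@dataclass
class Cluster:
    representative_id: str  # the record shown in search results
    member_ids: list        # every record folded into this cluster
    formats: set            # formats present, e.g. to label "also available online"
    holdings: list          # holdings rolled up from all members


def cluster_records(records, separate_formats=False):
    """Group records by match key; optionally keep print and electronic apart."""
    groups = defaultdict(list)
    for rec in records:
        # Adding the format to the grouping key keeps print and electronic in separate clusters.
        group_key = (rec.match_key, rec.format) if separate_formats else (rec.match_key,)
        groups[group_key].append(rec)

    clusters = []
    for members in groups.values():
        # Arbitrary but deterministic: the smallest record ID (string order) represents the cluster.
        members = sorted(members, key=lambda r: r.record_id)
        clusters.append(
            Cluster(
                representative_id=members[0].record_id,
                member_ids=[r.record_id for r in members],
                formats={r.format for r in members},
                holdings=[h for r in members for h in r.holdings],
            )
        )
    return clusters


records = [
    Record("99001", "KEY_A", "print", ["Library 1"]),
    Record("99002", "KEY_A", "print", ["Library 2"]),
    Record("99003", "KEY_A", "electronic", ["Online"]),
    Record("99004", "KEY_B", "print", ["Library 1"]),
]
print(len(cluster_records(records)))                         # 2 clusters
print(len(cluster_records(records, separate_formats=True)))  # 3 clusters
```

In this sketch de-duplication only changes what is displayed: every member ID is kept, so a cluster could always be expanded back into its original records.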

Acceptance criteria

  • Update an existing markdown document or add a new one in the research directory.

  • It has an introduction: Explains the goals and purpose of this research work.

  • It lists methods: Describes what was done to research the question.

  • It has a conclusion: Includes a summary of what was discovered in the research process.

  • It has a step-by-step list of potential next steps that build upon the research.

  • It includes references: References any related resources that have assisted in the research process (links to other tickets, online articles etc.).

  • It includes any artifacts (charts, notes, code samples etc.) that were produced during the work.

@maxkadel
Contributor

FYI: cataloging guidelines for identifying duplicates (the purpose of these guidelines is to identify duplicates for weeding).

@maxkadel
Contributor
