This repository contains data and code produced in 2022/2023 by Isaac Dunford as part of a Digital Humanities Internship funded by the School of Humanities at the University of Southampton.
It documents a project to investigate and implement different methods for detecting catalogue entries within printed catalogues. Whilst printed catalogues are easy enough to digitise and convert into machine-readable data, dividing that data by catalogue entry requires translating the visual signifiers of divisions between entries - gaps in the printed page, large or upper-case headers, catalogue references - into machine-readable information.
Isaac describes the work in his post on the British Library Digital Scholarship blog.
To test this we worked with XML-formatted data derived from the 13-volume Catalogue of books printed in the 15th century now at the British Museum. The project was undertaken in support of Rossitza Atanassova's AHRC-RLUK Professional Practice Fellowship.
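As a rough illustration of how visual signifiers such as page gaps, upper-case headers and catalogue references might be turned into entry boundaries, here is a minimal Python sketch. It is not the method implemented in this repository and it does not read the project's XML. The `Line` structure, the `gap_threshold` value, the `REFERENCE` pattern and the sample text are all hypothetical, chosen only to show the idea.

```python
import re
from dataclasses import dataclass


@dataclass
class Line:
    """One line of digitised text with its vertical position (hypothetical layout)."""
    text: str
    top: int     # y-coordinate of the top of the line, in pixels
    bottom: int  # y-coordinate of the bottom of the line, in pixels


# Hypothetical reference pattern such as "IB. 1234"; an assumption for illustration only.
REFERENCE = re.compile(r"^[A-Z]{1,3}\.\s?\d+")


def is_header(text):
    """Treat a line as a header if it has at least three letters, all upper-case."""
    if REFERENCE.match(text):
        return False
    letters = [c for c in text if c.isalpha()]
    return len(letters) >= 3 and all(c.isupper() for c in letters)


def split_entries(lines, gap_threshold=30):
    """Group lines into entries using gaps, headers and reference lines."""
    entries, current, previous = [], [], None
    for line in lines:
        gap = line.top - previous.bottom if previous is not None else 0
        # A large vertical gap or an upper-case header starts a new entry.
        if current and (gap > gap_threshold or is_header(line.text)):
            entries.append(current)
            current = []
        current.append(line)
        # Assumption: a reference line closes the entry it belongs to.
        if REFERENCE.match(line.text):
            entries.append(current)
            current = []
        previous = line
    if current:
        entries.append(current)
    return entries


# Invented sample data, not taken from the catalogue.
sample = [
    Line("VENICE", 0, 20),
    Line("Biblia latina. 1476.", 25, 45),
    Line("IB. 1234", 50, 70),
    Line("Another work, undated.", 120, 140),
    Line("Second line of the same entry.", 145, 165),
    Line("ROME", 170, 190),
]
entries = split_entries(sample)
for entry in entries:
    print([line.text for line in entry])
```

In practice the thresholds and patterns would need tuning against the real page layout, and the signals may well conflict, which is part of what makes comparing different detection methods worthwhile.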
This project continues to be maintained at https://github.com/britishlibrary/Incunabula-Catalogue-Entry-Detection.
All data is provided by the British Library: text data under CC0 1.0 Universal Public Domain; images under CC-BY 4.0 International. Code is released under the MIT License.