Home
As the National Library of the Netherlands, the KB collects and stores all publications published in and about the Netherlands. KB cataloguers describe these publications by manually assigning keywords. For centuries the KB stored only physical objects, but for the past few years digital publications have also been collected and catalogued. Because these digital publications are available as full text, the KB wants to explore the possibilities for automating the labelling of publications.

We entered the ICT with Industry Workshop in 2019, where our main research question was: “Can we automatically label scientific texts with relevant keywords?”. For this case study we used a set of dissertations that were manually tagged with keywords from the Brinkman thesaurus (‘Brinkeys’), and a set of metadata scraped from university websites, combined with available abstracts and full text. The data is sparse and heterogeneous, and partly in English and partly in Dutch, which complicates any classification attempt because the Brinkman keyword list is entirely in Dutch. We investigated a range of approaches, for example testing existing tools, matching (translated) keywords in titles, and applying multilingual semantic embeddings. These initial experiments show promising results and possibilities for future support tools for the KB cataloguers. KB employees could use a list of suggested keywords to quickly select the relevant ones, which could improve consistency and save significant time in the cataloguers' day-to-day work.

After the ICT with Industry workshop we wanted to continue developing this tool and make it usable for our own cataloguers. Besides the keywords, we also started to investigate whether a tool could help us with another field: author recognition. For this we combined different approaches, including machine learning.
Together with the catalogue department, we developed a pilot that demonstrates the possibilities of this internally built tool.
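
To give an impression of the simplest approach mentioned above, matching (translated) keywords in titles, here is a minimal Python sketch. It is not the code used in the workshop or in the pilot tool: the mini-thesaurus, its English translations, and the function names `normalize` and `suggest_keywords` are all assumptions made for illustration. The idea is only that a Dutch keyword is suggested when either the keyword itself or one of its translations occurs as a whole phrase in the title.

```python
import re

# Hypothetical mini-thesaurus: Dutch keyword -> English translations.
# These entries are illustrative only and are not taken from the Brinkman thesaurus.
THESAURUS = {
    "klimaatverandering": ["climate change"],
    "taalkunde": ["linguistics"],
    "kunstmatige intelligentie": ["artificial intelligence"],
}


def normalize(text):
    """Lowercase the text and reduce it to word tokens separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))


def suggest_keywords(title, thesaurus=THESAURUS):
    """Return the Dutch keywords whose Dutch or translated form occurs in the title."""
    # Pad with spaces so that only whole words or whole phrases match.
    padded = f" {normalize(title)} "
    suggestions = []
    for keyword, translations in thesaurus.items():
        variants = [keyword, *translations]
        if any(f" {normalize(variant)} " in padded for variant in variants):
            suggestions.append(keyword)
    return suggestions


if __name__ == "__main__":
    print(suggest_keywords("Climate change and linguistics: a corpus study"))
```

Because the suggestions are always returned as Dutch keywords, the same list can be offered to a cataloguer regardless of whether the title is in Dutch or English. Exact phrase matching like this misses inflected forms and synonyms, which is one reason semantic embeddings were explored as well.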