Merge branch 'main' into 476
maehr authored Aug 30, 2024
2 parents 736576d + 9b819f9 commit 2b00e4c
Showing 4 changed files with 95 additions and 0 deletions.
8 changes: 8 additions & 0 deletions submissions/poster/466/_quarto.yml
@@ -0,0 +1,8 @@
project:
  type: manuscript

manuscript:
  article: index.qmd

format:
  html: default
50 changes: 50 additions & 0 deletions submissions/poster/466/index.qmd
@@ -0,0 +1,50 @@
---
submission_id: 466
categories: 'Poster Session'
title: 'Economies of Space: Opening up Historical Finding Aids'
author:
  - name: Lucas Burkart
    orcid: 0000-0002-9011-5113
    email: [email protected]
    affiliations:
      - University of Basel
  - name: Tobias Hodel
    orcid: 0000-0002-2071-6407
    email: [email protected]
    affiliations:
      - University of Bern
  - name: Benjamin Hitz
    orcid: 0000-0002-3208-4881
    email: [email protected]
    affiliations:
      - University of Basel
  - name: Aline Vonwiller
    orcid: 0009-0001-2098-9237
    email: [email protected]
    affiliations:
      - University of Basel
  - name: Ismail Prada Ziegler
    orcid: 0000-0003-4229-8688
    email: [email protected]
    affiliations:
      - University of Bern
  - name: Jonas Aeby
    email: [email protected]
    affiliations:
      - University of Basel
  - name: Katrin Fuchs
    email: [email protected]
    affiliations:
      - University of Basel

date: 08-28-2024
---

Machine learning has transformed historical data processing, enabling the analysis of vast archives and complex finding aids at an unprecedented scale. One case study that illustrates the potential of these techniques is the digitization of the Historical Land Registry of the City of Basel (Historisches Grundbuch Basel, HGB).
The HGB, compiled around the turn of the 20th century, contains a wealth of historical data meticulously recorded on index cards. Each card represents a transaction or entry from a source document, and the structure of the data reflects the conventions and interests of its creators. This complexity calls for a multifaceted approach that combines text recognition, specifically of handwritten material, with information extraction, particularly event extraction.

One of the project's key accomplishments is the successful application of machine learning to recognize the handwritten content, achieving a character error rate of just 4%. This result paves the way for using specialized language models to extract valuable information, such as named entities (persons, places, organizations), the relationships between them, and the events they mention.
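The 4% figure is a character error rate (CER). CER is commonly defined as the character-level edit (Levenshtein) distance between the recognized text and a ground-truth transcription, divided by the number of characters in the ground truth; the abstract does not state which tool or normalization the project used. The following is a minimal Python sketch under that common definition only. The function names `levenshtein` and `character_error_rate` are hypothetical, and the example strings are invented toy data, not HGB transcriptions.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    that turn `hyp` into `ref` (dynamic-programming edit distance)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # ref char missing in hyp (deletion)
                            curr[j - 1] + 1,      # extra char in hyp (insertion)
                            prev[j - 1] + cost))  # substitution, or match if cost == 0
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / length of the ground-truth reference
    (assumed definition; no text normalization is applied here)."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


# Invented toy example: a 25-character reference with one substituted character.
reference = "Der Rat verkauft das Haus"
recognized = "Der Rat verkauft dos Haus"
print(character_error_rate(reference, recognized))
```

Under this definition, a CER of 4% corresponds to roughly one wrong, missing or extra character per 25 characters of ground truth, as the toy pair above illustrates.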

When combined with property information, the extracted data makes it possible to map historical events and transactions in Geographic Information Systems (GIS). This in turn allows normative and semantic shifts in the real estate market to be analyzed over time, shedding light on historical changes in language and practice.

Ultimately, this project marks a milestone in historical data analysis: machine learning techniques have matured to the point where even extensive datasets and intricate finding aids can be processed effectively. As a result, innovative approaches to large-scale historical data analysis are now within reach, offering new perspectives on dynamic urban economies in the pre-modern period. The project shows how technological approaches and humanities scholarship go hand in hand in understanding complex patterns in economic history.
8 changes: 8 additions & 0 deletions submissions/poster/472/_quarto.yml
@@ -0,0 +1,8 @@
project:
  type: manuscript

manuscript:
  article: index.qmd

format:
  html: default
29 changes: 29 additions & 0 deletions submissions/poster/472/index.qmd
@@ -0,0 +1,29 @@
---
submission_id: 472
categories: 'Poster Session'
title: Discuss Data -- an Open Repository for Research and Data Communities
author:
  - name: Torsten Kahlert
    orcid: 0009-0003-3264-5006
    email: [email protected]
    affiliations:
      - Herzog-August-Bibliothek Wolfenbüttel
  - name: Daniel Kurzawe
    orcid: 0000-0001-5027-7313
    email: [email protected]
    affiliations:
      - SUB Göttingen
date: 08-28-2024
---

In this poster, we show how the Discuss Data research data platform is being expanded to include a "community space" for the digital humanities (DH). Discuss Data enables and promotes contextualized discussion of the quality and sustainability of research data directly alongside the data themselves.

Current standards and the digitization of existing processes require structures that support sustainable development models. This applies in particular to the quality of research data, which is becoming increasingly important in academic debate.

Discuss Data offers a platform for this. In addition to managing, archiving and providing data, Discuss Data contextualizes data through curated discussion. The platform addresses individual communities, offering each a subject-specific discussion space and, in the long term, community-specific tools. Communities are not to be equated with disciplines; rather, they are interest groups centered on specific issues or data materials.

Since the first community space, for the research community on Eastern Europe, the South Caucasus and Central Asia, was introduced in 2020, 121 datasets have been published and 141 users have registered (as of 28 November 2023). However, the discussion function provided by Discuss Data has so far been used comparatively little. Despite positive attitudes towards it, this discussion culture, which is common at conferences and in peer review and is highly important from a scholarly perspective, has not yet taken hold on the platform.

Digital method and source criticism has become one of the central challenges of the digital humanities. Until now, research data has generally been published in institutional repositories or on platforms such as Zenodo, but without the kind of quality control that is customary for journal articles. As a result, datasets often go unused for further research because their quality, and the purposes they might be suitable for, remain unclear.

The first funding phase of Discuss Data has made it clear that more effort must go into attracting data curators to ensure that the community spaces are sustained by their communities in the long term. This requires positive examples: integrating discussions as micropublications, for instance, could help demonstrate their individual added value.