paper_jcaa.qmd
+2 -2
@@ -269,7 +269,7 @@ Most radiocarbon datasets we reviewed were compiled with a specific goal in mind
269
269
The fragmentation of the radiocarbon record into regional datasets also hinders analysis at larger scales.
270
270
Although the core elements of a radiocarbon date—laboratory identifier, radiocarbon age, measurement error—are more or less standardised, there is no such consistency in contextual information on the sample or site.
271
271
Such contextual information is important not just for the interpretation of dates, but for 'chronometric hygiene' [filtering out unreliable dates based on sample information, see e.g. @PettittEtAl2003] and for correcting for known systematic errors such as the marine reservoir effect [@AlvesEtAl2018].
272
-
Most published datasets incorporate all or part of earlier compilations, duplicate records are also very common, but deduplicating them is not a trivial problem due to format variations (see @sec-implementation-data).
272
+
Most published datasets incorporate all or part of earlier compilations, meaning duplicate records are also very common, but deduplicating them is not a trivial problem due to format variations (see @sec-implementation-data).
273
273
These issues are by no means impossible to overcome, but they add a significant amount of data-cleaning effort to a process that is otherwise very amenable to standardisation.
274
274
275
275
<!-- TODO: -->
@@ -512,7 +512,7 @@ If the system is not able to standardise a field using the available thesaurus,
512
512
A wide variety of other potential data quality issues (e.g. missing data on what country a site is in) are also flagged for human review by this system [@tbl-issues], which can often be semi-automated (e.g. suggesting close matches in the thesaurus or the country indicated by the record's coordinates).
513
513
514
514
A final critical component of XRONOS' data curation system is duplicate handling.
515
-
We import data from many overlapping resources (many of which incorporate each other either in whole or in part), so duplicate records are common.
515
+
We import data from many overlapping resources (many of which incorporate each other either in whole or in part), so duplicate records are common [as recently discussed by @ReiterEtAl2024].
516
516
The end result of standardising and correcting a record is also often to create a duplicate: e.g. the same sample imported from one source as 'oak' but another as '*Quercus* sp.' will become a duplicate pair as '*Quercus*', and thus be recognised as a single sample.
517
517
Such exact duplicates can be merged automatically, with the oldest record becoming the authoritative version, but detecting fuzzier duplicated information (e.g. differences in the spelling of site names) has proved a more difficult problem.
518
518
As of writing there are therefore still many duplicate records in XRONOS that need to be manually resolved, but we hope to automate much more of this work in the future.
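The duplicate handling described above can be illustrated with a minimal sketch. It is not XRONOS' actual implementation: the `Record` fields, the `THESAURUS` mapping, the function names, and the similarity threshold are all assumptions made for illustration, and Python's standard-library `difflib` stands in for whatever fuzzy-matching approach might eventually be adopted.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher

# Hypothetical thesaurus: variant labels mapped to a canonical term.
THESAURUS = {"oak": "Quercus", "quercus sp.": "Quercus", "quercus": "Quercus"}


@dataclass(frozen=True)
class Record:
    lab_id: str      # laboratory identifier
    age: int         # uncalibrated radiocarbon age (BP)
    error: int       # measurement error
    material: str    # sample material as given by the source
    imported: date   # date the record entered the database


def standardise(material: str) -> str:
    """Return the canonical term, or the input unchanged if it is unknown."""
    key = material.replace("*", "").strip().lower()
    return THESAURUS.get(key, material)


def merge_exact_duplicates(records):
    """Keep only the oldest record for each identical standardised key."""
    kept = {}
    for rec in sorted(records, key=lambda r: r.imported):
        key = (rec.lab_id, rec.age, rec.error, standardise(rec.material))
        kept.setdefault(key, rec)  # first (oldest) record wins
    return list(kept.values())


def looks_like_same_site(a: str, b: str, threshold: float = 0.85) -> bool:
    """Flag two site names as candidate duplicates for human review.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return ratio >= threshold
```

In this sketch, two records for the same sample imported as 'oak' and '*Quercus* sp.' collapse to one after standardisation, with the earlier import retained. The fuzzy check only suggests candidates; deciding whether two differently spelled site names really refer to the same site is left to a human reviewer.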
references.bib
+11
@@ -810,3 +810,14 @@ @article{Crema2024
810
810
langid = {american}
811
811
}
812
812
813
+
@article{ReiterEtAl2024,
814
+
url = {https://doi.org/10.1515/opar-2024-0015},
815
+
title = {The BIAD Standards: Recommendations for Archaeological Data Publication and Insights From the Big Interdisciplinary Archaeological Database},
816
+
author = {Samantha S. Reiter and Robert Staniuk and Jan Kolář and Jelena Bulatović and Helene Agerskov Rose and Natalia E. Ryabogina and Claudia Speciale and Nicoline Schjerven and Bettina Schulz Paulsson and Victor Yan Kin Lee and Elisabetta Canteri and Alice Revill and Fredrik Dahlberg and Serena Sabatini and Karin M. Frei and Fernando Racimo and Maria Ivanova-Bieg and Wolfgang Traylor and Emily J. Kate and Eve Derenne and Lea Frank and Jessie Woodbridge and Ralph Fyfe and Stephen Shennan and Kristian Kristiansen and Mark G. Thomas and Adrian Timpson},