paper.qmd
+18-30
Original file line number
Diff line number
Diff line change
@@ -72,12 +72,13 @@ execute:
72
72
cache: true
73
73
---
74
74
75
-
```{r setup}
75
+
```{r setup, cache=FALSE}
76
76
#| include: false
77
77
library("countrycode")
78
78
library("cowplot")
79
79
library("dplyr", warn.conflicts = FALSE)
80
80
library("dm")
81
+
library("english")
81
82
library("giscoR")
82
83
library("ggplot2")
83
84
library("glue")
@@ -92,6 +93,7 @@ library("RPostgres")
92
93
library("sf")
93
94
library("spatstat")
94
95
library("stars")
96
+
library("stringr")
95
97
library("tidyr")
96
98
library("webshot2")
97
99
@@ -124,7 +126,7 @@ As a necessary prerequisite to understanding the context of any past event or pr
124
126
If archaeology is to be an open science [@Lake2012], it is therefore critical that effective open access to chronological information be placed front and centre.
125
127
126
128
Over the last two decades, archaeologists have answered this call by publishing an increasing number of compilations of dates from archaeological contexts as open data.
127
-
These efforts have facilitated major reevaluations of previously-established chronologies [e.g. @HighamEtAl2014; @LoftusEtAl2019; @PratesEtAl2020; @KatsianisEtAl2020], important new insights into past processes [@RirisEtAl2024], and the development of novel ways of using chronological data [@Crema2022].
129
+
These efforts have facilitated re-evaluations of chronologies themselves [e.g. @HighamEtAl2014; @LoftusEtAl2019; @PratesEtAl2020; @KatsianisEtAl2020] but also the development of novel ways of using chronological data [@MoodyEtAl2021; @Crema2022; @RirisEtAl2024].
128
130
<!-- TODO: Would welcome more citations here -->
129
131
The focus has been overwhelmingly on radiocarbon dating and most compilations focus on a single region and/or period.
130
132
The profusion of open radiocarbon data in particular has prompted several initiatives towards a global synthesis [e.g. @SchmidEtAl2019; @BronkRamseyEtAl2019; @BirdEtAl2022].
@@ -147,7 +149,6 @@ Since we envisage both XRONOS as a dataset and XRONOS as software to be continua
147
149
## Compilations of radiocarbon dates {#sec-c14-compilation}
148
150
149
151
Though an explicit emphasis on 'open data' is a relatively recent phenomenon in archaeology [@Lake2012], the open publication of compiled radiocarbon dates has a substantial prehistory.
150
-
<!-- TODO: check and update this about date lists... -->
151
152
Arnold and Libby [-@ArnoldLibby1951] initiated the tradition of regularly publishing all the dates they had obtained.
152
153
This practice was subsequently continued as radiocarbon laboratories periodically shared and compiled their own 'date lists', published mainly in the journals *Radiocarbon* and *Archaeometry*.
153
154
However, as the number of labs and volume of radiocarbon dates being produced grew, this paper-based format became impractical and mostly disappeared [with exceptions, e.g. @NdeyeEtAl2022] without being replaced by another form of systematic data-sharing or dissemination [@BronkRamseyEtAl2019].
@@ -169,7 +170,7 @@ Our review of the literature identified `r n_c14_datasets` published since 1994.
169
170
This is almost certainly an undercount, because our firsthand knowledge of regional literature was limited to Europe and West Asia and many resources only ever existed in 'grey' formats (e.g. websites that were not indexed and no longer exist).
170
171
We also restricted ourselves to structured datasets disseminated primarily in a digital format;
171
172
'date lists' in printed periodicals and gazetteers were excluded.
172
-
A full list of the datasets we identified is presented in appendix XXXX<!-- TODO -->.
173
+
A full list of the datasets we identified can be found in the supplementary materials.
173
174
174
175
```{r fig-c14-datasets-time}
175
176
#| fig-cap: Cumulative number of radiocarbon compilations published since 1995 according to our survey (see Supplementary Material).
@@ -194,7 +195,7 @@ Notable early examples include ANDES 14C in 1994 [Central Andes, @MichczynskiEtA
194
195
From 2010, coinciding with broader shifts in scientific publishing [@TenopirEtAl2011], it became more common to publish standalone 'open data' products in the form of journal supplements, archives in repositories and/or data papers;
195
196
the *[Journal of Open Archaeology Data](<https://openarchaeologydata.metajnl.com>)*, launched in 2012, has been a prominent venue for this latter category.
196
197
Most recently there has been a trend towards providing version-controlled plain text data via platforms such as [GitHub](https://github.com), reflecting the broader adoption of these tools amongst computational archaeologists over the last decade [@BatistRoe2024].
197
-
The shift from online databases towards more static but more preservable open data products is welcome, given how many databases from the first generation have subsequently ceased to be accessible.<!-- TODO: can this be quantified or visualised? -MH -->
198
+
The shift from online databases towards more static but more preservable open data products is welcome, given how many databases from the first generation have subsequently ceased to be accessible.
198
199
Version-controlled repositories are particularly well-suited to data compilation projects because they allow for continued updates whilst still providing snapshot 'releases' that are citeable and can be archived in long-term repositories.
199
200
200
201
```{r data-basemap}
@@ -251,10 +252,11 @@ Most radiocarbon datasets we reviewed were compiled with a specific goal in mind
251
252
Laboratory databases solve the problem of currency, but tend to have more arbitrary coverage, since the inclusion of data is determined by who submits dates to that lab, not any form of principled curation.
252
253
There are also comparatively few of them – most active labs no longer directly publish dates that they produce (if they ever did).
253
254
254
-
In addition, the temporal and geographic coverage of these resources is uneven [@ChaputGajewski2016; @AlcantaraPedrozainpress], systematically biased [@ClistEtAl2023], and duplicative of each other.
255
-
<!-- TODO: example, X databases for Europe, Y for the rest of the world? -->
256
-
By our count, <!-- TODO: X --> of the databases are not 'open' according to the Open Knowledge Foundation's definition of data openness ["Open data and content can be freely used, modified, and shared by anyone for any purpose", @OpenKnowledgeFoundation], which both limits the access to and reuse potential of these datasets.
257
-
<!-- TODO: X --> are not currently available in readily machine-readable formats (e.g. plain text or database files rather than PDFs or hypertext).
255
+
The temporal and geographic coverage of these resources is uneven [@ChaputGajewski2016; @AlcantaraPedrozainpress], systematically biased [@ClistEtAl2023], and duplicative.
256
+
For example, we identified `r english(sum(str_detect(c14_datasets$m49_region, "Western Europe"), na.rm = TRUE))` different databases covering Western Europe but none covering South Asia.
257
+
The quality and accessibility of published compilations is also variable.
258
+
`r str_to_sentence(english(sum(!c14_datasets$open, na.rm = TRUE)))` of the `r english(nrow(c14_datasets))` resources we reviewed are not 'open' according to the Open Knowledge Foundation's definition of data openness ["Open data and content can be freely used, modified, and shared by anyone for any purpose", @OpenKnowledgeFoundation], which limits both access to and the reuse potential of these datasets.
259
+
And even of these, many are not currently available in readily machine-readable formats (e.g. plain text or database files rather than PDFs or hypertext).
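As a minimal sketch of how inline counts like those above resolve when the document is rendered, the following uses a small invented data frame. The real `c14_datasets` table is not shown here; its `m49_region` and `open` columns are assumed to be a character vector and a logical vector respectively, and every name prefixed `toy_` or `n_` is hypothetical.

```r
library("english")
library("stringr")

# Toy stand-in for c14_datasets (invented values, for illustration only)
toy_datasets <- data.frame(
  m49_region = c("Western Europe", "Western Europe; Northern Europe",
                 "South America", NA),
  open = c(TRUE, FALSE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Number of resources whose region string mentions Western Europe;
# rows with an NA region are dropped by na.rm = TRUE rather than counted
n_western_europe <- sum(
  str_detect(toy_datasets$m49_region, "Western Europe"),
  na.rm = TRUE
)

# sum(x) counts TRUE values and sum(!x) counts FALSE values,
# so open and non-open resources are counted separately
n_open <- sum(toy_datasets$open, na.rm = TRUE)
n_not_open <- sum(!toy_datasets$open, na.rm = TRUE)

# Spell the counts out as words; the second is capitalised for use
# at the start of a sentence
western_europe_words <- as.character(english(n_western_europe))
not_open_words <- str_to_sentence(as.character(english(n_not_open)))
```

Whichever count is reported, the wording of the surrounding sentence ("are open" versus "are not open") needs to match the expression that produced it.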
258
260
259
261
The fragmentation of the radiocarbon record into regional datasets also hinders analysis at larger scales.
260
262
Although the core elements of a radiocarbon date—laboratory identifier, radiocarbon age, measurement error—are more or less standardised, there is no such consistency in contextual information on the sample or site.
@@ -427,7 +429,8 @@ This is also typically present in many other forms of systematic compilation wor
427
429
Aggregated typological information from such sources is often used in aoristic analysis and related methods [@Mischka2004; @Crema2024].
428
430
What is lacking in this presentation of typological dating is metadata on how the determination was made and how exactly it is to be understood.
429
431
Like any archaeological date, a typological date is derived from a physical sample – the object or set of objects from which a chronological estimate was derived.
430
-
Typological dates on one class of object may well clash with other classes of object, or for that matter with scientific dates, but without this kind of metadata such inconsistencies are difficult to resolve. <!-- TODO: clarify - MH -->
432
+
Typological dates on one class of object may well clash with other classes of object, or for that matter with scientific dates – does one trust the date on pottery, the date on architecture, or the radiocarbon date?
433
+
Without additional metadata on e.g. who made the typological determination or what the radiocarbon date was obtained on, such inconsistencies are difficult to resolve.
431
434
Similarly the absolute date range corresponding to a typological determination (e.g. "Late Neolithic") can be interpreted in multiple ways depending on the region and intentions of the expert making the determination.
432
435
PeriodO [@RabinowitzEtAl2016] is a linked open data infrastructure that includes a shared vocabulary of typological periods and corresponding calendar age estimates, and an important step towards addressing the latter problem.
433
436
However, it remains to be systematically linked to actual compilations of typological dates.
@@ -444,22 +447,7 @@ Our overall aims in developing XRONOS is to bring this model, which RADON has op
444
447
## Design goals
445
448
446
449
XRONOS is our answer to Kintigh's call [@Kintigh2006] for digital infrastructures that do not just provide access to chronological data but enable researchers to "archive, access, integrate, and mine disparate data sets".
447
-
It parallels and draws inspiration from several similar initiatives within and outwith archaeology, such as <!-- TODO examples
It complements several similar open data infrastructures within and outwith archaeology, such as SEAD for environmental archaeology [@Buckland2014], IMPACT for mummified human remains [@NelsonWade2015], Neotoma for palaeoecological data [@WilliamsEtAl2018], IsoArcH for stable isotope data [@PlompEtAl2022], and the 'Big Interdisciplinary Archaeological Database' (BIAD), an ambitious new initiative to combine many of these individual domains, including chronology [@ReiterEtAl2024].
463
451
To improve upon existing global syntheses of radiocarbon dates (see @sec-global-compilations), we aimed to develop a living infrastructure that both continually collected data from diverse sources and presented a seamless single database to the user.
464
452
The Global Biodiversity Information Facility [GBIF, <https://gbif.org>, @CanhosEtAl2004]—which provides a single, consistent interface to many sources of global biodiversity data—has served as an exemplar for us in this regard.
465
453
@@ -526,7 +514,7 @@ xronos_dm_svg <- xronos_dm |>
526
514
dm_add_fk("versions", "item_id", "sites") |>
527
515
dm_add_fk("versions", "item_id", "taxons") |>
528
516
dm_add_fk("versions", "item_id", "typos") |>
529
-
# TODO: Self-references (primarily persuadable models), or too messy?
517
+
# Exclude self-references (primarily supersedable models) for readability, e.g.
At the base of the XRONOS data model (@fig-data-model) are sets of spatiotemporal coordinates or, as we call them, *chrons*.
543
531
In an archaeological context, we conceptualise a chron as an assertion linking human activity with a particular point in space and time.
@@ -638,9 +626,9 @@ This basic REST pattern is augmented by seven 'actions' (following the standard
638
626
The 'show' action represents interaction with a single resource, as described above.
639
627
The 'index' action, which lists resources of a given type (e.g. <https://xronos.ch/c14s> for radiocarbon dates), is worth special mention because it is through this that the filtering logic at the core of XRONOS' two interfaces is implemented.
640
628
By passing a query as HTTP GET parameters to the index action of a resource, the list returned to the user is modified to only include records that match that query.
641
-
For example, <https://xronos.ch/sites?site[country_code]=CH> (the part of the URL after the `?` character encodes the SQL WHERE clause `country = 'CH'` as a GET parameter) lists sites in Switzerland<!-- TODO: check this actually works -->.
629
+
For example, <https://xronos.ch/sites?site[country_code]=CH> (the part of the URL after the `?` character encodes the SQL WHERE clause `country_code = 'CH'` as a GET parameter) lists sites in Switzerland.
642
630
More complex queries can be executed using nested parameters.
643
-
For example, <https://xronos.ch/c14s?c14[sample][material][name]=charcoal> (encoding that the `c14` table should be joined to the `material` table via `sample`, followed by the WHERE clause `material.name = 'charcoal'`) lists radiocarbon dates obtained from charcoal samples<!-- TODO: check that this actually works... -->.
631
+
For example, <https://xronos.ch/c14s?sample[material][name]=charcoal> (encoding that the `c14` table should be joined to the `material` table via `sample`, followed by the WHERE clause `material.name = 'charcoal'`) lists radiocarbon dates obtained from charcoal samples.
644
632
Uniquely, index actions can also respond with the result in a tabular data format (i.e. `.csv`).
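
The query pattern described above can be sketched in base R. This is an illustration under stated assumptions, not part of XRONOS: `xronos_index_url()` is a hypothetical helper, and appending `.csv` to the resource path is assumed to be how the tabular format is requested. Only the two example URLs given in the text are taken from the source.

```r
# Hypothetical helper (not part of XRONOS): build the URL of an index action
# from a resource name and a named character vector of filter parameters.
# Parameter names are passed through as written (e.g. "site[country_code]");
# only the values are percent-encoded.
xronos_index_url <- function(resource, params, format = NULL,
                             base = "https://xronos.ch") {
  # Assumption: the tabular format is requested by a file extension
  path <- if (is.null(format)) resource else paste0(resource, ".", format)
  values <- vapply(unname(params), utils::URLencode, character(1),
                   reserved = TRUE)
  query <- paste0(names(params), "=", values, collapse = "&")
  paste0(base, "/", path, "?", query)
}

# Simple filter: sites in Switzerland
sites_url <- xronos_index_url("sites", c("site[country_code]" = "CH"))

# Nested filter: radiocarbon dates on charcoal samples
c14_url <- xronos_index_url("c14s", c("sample[material][name]" = "charcoal"))

# The same nested query with an assumed .csv extension
c14_csv_url <- xronos_index_url("c14s",
                                c("sample[material][name]" = "charcoal"),
                                format = "csv")

# Reading the result requires network access, so it is left commented out:
# charcoal_dates <- read.csv(c14_csv_url)
```

Keeping the bracketed parameter names unencoded makes the generated URLs match the examples in the text character for character, which is convenient when checking them by hand.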
645
633
646
634
## Data ingestion and curation {#sec-implementation-data}