Revise conclusions text
joeroe committed Feb 7, 2024
1 parent 000de70 commit 1e49503
Showing 5 changed files with 623 additions and 504 deletions.
13 changes: 6 additions & 7 deletions analysis/_03-data_methodology.qmd
@@ -12,7 +12,7 @@ oarch <- mutate(
)
```

We present an exploratory analysis of open-archaeo [[open-archaeo.info](open-archaeo.info), @batist2023], a directory of `r nrow(oarch)` pieces of open source archaeological software and other digital resources.
We present an exploratory quantitative analysis of open-archaeo [[open-archaeo.info](open-archaeo.info), @batist2023], a directory of `r nrow(oarch)` pieces of open source archaeological software and other digital resources maintained primarily by one of us (ZB) since 2018.

We compiled the dataset by browsing collaborative software development platforms, relying heavily on their social networking features.
More specifically, we update open-archaeo by manually crawling through archaeologists' profiles on these platforms, as well as on other personal, professional, and institutional websites that describe and host additional archaeological software.
@@ -29,8 +29,7 @@ Open-archaeo is a relatively comprehensive list.
While our initial intention was to only list open source software, its scope has expanded to include all software created by and for archaeologists.
Apart from regular updates by its primary maintainer (ZB), it has been expanded by a wider network of contributors and has benefited from the wider range of domain specialisms this has brought.
However, open-archaeo generally lacks software written before archaeologists started using collaborative software development platforms such as GitHub, and software that is not shared on the web at all.
The dataset is also limited by the experiences of its primary maintainers.
We welcome anyone, especially domain specialists who are familiar with the kinds of tools commonly used in their specific fields, to help fill in these gaps.
The dataset is also limited by the experiences of its primary maintainers.^[We welcome anyone, especially domain specialists who are familiar with the kinds of tools commonly used in their specific fields, to help fill in these gaps. Instructions for contributing to open-archaeo can be found at <https://github.com/zackbatist/open-archaeo>.]

```{r data-github}
# If cached data at `analysis/data/derived_data/oarch.RData` is present, this
@@ -108,16 +107,16 @@ oarch_forges |>
```

Where applicable, we obtained more detailed information about each repository's contents and contribution histories from the GitHub API (application programming interface).
In total, we analysed data on `r n_repos` repositories, comprising `r n_commits` commits, `r n_issues` issues/pull requests, and `r n_comments` from `r n_contributors` distinct users, as well as repository metadata on programming languages used, licensing, stars and forks, and so on.
Our analysis incorporates data on `r n_repos` repositories, comprising `r n_commits` commits, `r n_issues` issues/pull requests, and `r n_comments` comments from `r n_contributors` distinct users, as well as repository metadata on programming languages used, licensing, stars and forks, and so on.

We opted to only collect repository data from GitHub because it is the most popular forge platform used by open-archaeo projects (@tbl-forges).
This means that projects that do not use version control (`r percent(p_no_vc)` of the total), or host it elsewhere (`r percent(p_no_github)` of the total), are excluded from these parts of the analysis, though we were still able to perform an analysis of their contents and authorship from the data included in open-archaeo itself.
We also exclude collaboration through offline or private channels, or forms of collaboration we do not know about.
This means that projects that do not use version control (`r percent(p_no_vc)` of the total), or host it elsewhere (`r percent(p_no_github)` of the total), are excluded from these parts of the analysis, though we were still able to perform an analysis of their contents and authorship from the data compiled in open-archaeo itself.
We also cannot include collaboration through offline or private channels, or forms of collaboration we do not know about.
We did not directly observe or interview archaeological software developers, though our conclusions do draw heavily from our experience as members of that community ourselves.
Our earliest data is from 2005 and our study can say little about collaborative software development in archaeology before this point, though we know there was a significant amount of it [@ducke2013; @whallon1972].
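
The exclusions described above can be illustrated with a minimal sketch. The `projects` table and its `forge` column are hypothetical stand-ins, not the structure of the actual open-archaeo dataset, and the names `p_no_vc` and `p_no_github` are reused from the text only for readability; how they are actually derived is an assumption here.

```r
# Hypothetical stand-in for the open-archaeo table: one row per project,
# with the forge hosting its repository (NA = no version control found).
projects <- data.frame(
  name  = c("a", "b", "c", "d", "e"),
  forge = c("GitHub", "GitHub", "GitLab", NA, "GitHub"),
  stringsAsFactors = FALSE
)

# Share of projects with no version-controlled repository at all
p_no_vc <- mean(is.na(projects$forge))

# Share of projects under version control but hosted somewhere other than GitHub
p_no_github <- mean(!is.na(projects$forge) & projects$forge != "GitHub")

# Projects that remain available for the GitHub API parts of the analysis
github_projects <- projects[!is.na(projects$forge) & projects$forge == "GitHub", ]
```

Projects outside `github_projects` would still appear in any analysis that relies only on the directory's own fields, which mirrors the distinction drawn in the text.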

These caveats notwithstanding, the open-archaeo directory and the supplemental data from the GitHub API provide a rich resource to explore the nature of collaborative software engineering in archaeology.
Here we employ exploratory data analysis [@tukey1977] to identify and characterise overall trends visible in this rich dataset.
Here we employ exploratory data analysis [*sensu* @tukey1977] to identify and describe overall patterns visible in this rich dataset.
In @sec-open-archaeology, our focus is on examining the general state of open source archaeological software and resource development.
In @sec-collaboration, we refine our analysis to examine development processes, with specific focus on collaborative experiences.
Finally, in @sec-network, we apply network analysis methods to investigate the formation of broader collaborative communities.
4 changes: 2 additions & 2 deletions analysis/_06-network.qmd
@@ -209,8 +209,8 @@ The core cluster is characterized by repositories whose contributors commit to p
Clustering also reveals distinct collaborative networks within the user-user graph.
We again see a complementary primary core connected to several more peripheral clusters, which are internally-cohesive and exhibit few connections with other peripheral clusters.
The central core bridges all the peripheral clusters.
The central core is not uniform, and comprises several relatively discrete sub-clusters representing collaborative sub-communities.
While these sub-clusters are internally cohesive, they exhibit enough connections to other members of the central core so as to not be considered as separate or peripheral clusters.
The central core is not uniform, and comprises several relatively discrete clusters representing collaborative sub-communities.
While these clusters are internally cohesive, they exhibit enough connections to other members of the central core so as to not be considered as separate or peripheral clusters.

In both the repository-repository and user-user networks, the peripheral clusters correspond with either the connections surrounding specific projects or the series of repositories created by single individuals and sometimes also their close colleagues.
On the other hand, the central cores exhibit greater internal variety that may correspond with social connections and the formation of a complex software development community.
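
As a rough illustration of how repository-repository and user-user networks of this kind can be derived, the sketch below projects a table of contributions onto each of its two modes. It assumes that ties are defined by shared contributors and shared repositories; the `contributions` data and all names in it are hypothetical, and the text shown here does not specify the construction or clustering method actually used.

```r
# Hypothetical contribution records: one row per (user, repository) pair.
contributions <- data.frame(
  user = c("ana", "ben", "ben", "cho", "dee"),
  repo = c("r1",  "r1",  "r2",  "r2",  "r3"),
  stringsAsFactors = FALSE
)

# User-by-repository incidence matrix (1 = user contributed to repository)
incidence <- unclass(table(contributions$user, contributions$repo)) > 0
incidence <- incidence * 1

# One-mode projections: entry [i, j] counts shared repositories (users)
# or shared contributors (repositories)
user_user <- tcrossprod(incidence)  # users x users
repo_repo <- crossprod(incidence)   # repositories x repositories

# Drop self-ties so that only ties between distinct nodes remain
diag(user_user) <- 0
diag(repo_repo) <- 0
```

In this toy example `ben` bridges `r1` and `r2`, while `dee` and `r3` are isolated. Either matrix could then be passed to a community detection method of choice to look for core and peripheral clusters.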
54 changes: 20 additions & 34 deletions analysis/_07-conclusion.qmd
@@ -1,56 +1,42 @@
# Conclusion

<!--
There is an emerging community of practice around open source in archaeology.
* It is limited and fragmented
* A small set of subfields are represented
* Probably most people still haven't heard it?
* Collaboration is limited
* Most work remains solo & short-lived
* Perhaps because open source norms conflict with academic norms?
* The members of this community face structural challenges in academia
-->

Our goal in this study was to investigate the under-explored research practices involved in research software engineering.
Our goal in this study was to investigate the under-explored research practices involved in research software engineering in archaeology.
We sought to identify not only _what_ kinds of software archaeologists are making, but _how_ archaeologists create these tools as part of a broader community of practice.
Our emphasis on the collaborative experiences involved in open source software development emerged from our experience maintaining open-archaeo, through which we observed that making one's code openly available on the web does not necessarily garner the benefits often touted by open science advocates, namely that source code can be audited, forked, and appropriated for alternative use cases, which are effectively social and collaborative experiences.

To investigate these concerns, we operationalised open-source collaborative experiences as the use of certain features of git and GitHub visible to us in data from the GitHub API.
With this data, we documented that open source software development in archaeology has seen a rapid and sustained rise beginning around 2014.
With this data, we documented that open source software development in archaeology has seen a rapid and sustained rise beginning around 2014 (@fig-github-cumulative).
This is marked by a variety of applications and use cases, including the use of git and GitHub to track and host content other than code.
Moreover, archaeologists are very involved in broader scripting ecosystems, as is evident through the predominant creation of R packages and Python libraries designed to process the rich variety of archaeological information.
At the same time, archaeologists also create standalone software for more intensive tasks that require greater access to system resources or that warrant more complex user interfaces than what R and Python IDEs are capable of providing.

Turning to the thematic distribution of open archaeology projects, we note that, among items listed as software, repositories tend to be focused on
various means of identifying distribution patterns (spatial, temporal, statistical),
calibrating data obtained from various instrumental methods (XRF, luminescence dating),
supporting specialized finds analysis (zooarchaeology, palaeobotany, archaeogenetics),
and supporting the collection and processing of archaeological materials.
These tools tend to be focused on various means of identifying distribution patterns (spatial, temporal, statistical), calibrating data obtained from various instrumental methods (XRF, luminescence dating), supporting specialized finds analysis (zooarchaeology, palaeobotany, archaeogenetics), and supporting the collection and processing of archaeological materials.
These foci signify gaps in the archaeological toolbox that archaeologists recognized, and have attempted to fill, on their own terms.
In future studies, it would be interesting to examine how the purposeful expansion of open source tools corresponds with broader methodological trends apparent in publishing patterns or through other expressions of archaeological interest (e.g. social media posts, conference sessions).

While there is an emerging community of practice around open source in archaeology, we observed that collaboration remains limited.
Most work is performed individually and is short-lived.
There is an emerging community of practice around open source research software in archaeology.
All but a handful of the GitHub repositories we analysed have more than one commit, showing that archaeologists use it for ongoing work rather than merely to upload finished products.
They relatively frequently make use of the 'star' and 'comment' features to engage with others' repositories (@fig-collab-features) and, via these and other shared contributions, we can trace a collaborative network that includes the majority of archaeologists active on GitHub (see @sec-network).

On the other hand, we found that the forms and intensity of collaboration remain limited.
Most work is performed individually (@fig-contributions) and is short-lived (@fig-lifespan; @fig-lifespan-rate).
The vast majority of repositories have 1--3 contributors, with only a few distinguished by an active and diverse developer base.
Our analysis also shows an uneven use of git and GitHub's extended features, beyond their basic usage as a version control system and repository host.
Generally speaking, we believe that this is because people do not want to step on other people's toes by raising issues or intruding on other people's projects.
While GitHub's more passive collaborative features (stars, comments) are commonly used, those that involve direct engagement with repository content (issues, forks, pull requests) are not (@fig-collab-features),
perhaps because people do not want to 'step on toes' or be seen to be intruding on others' projects.
This may relate to the fact that most developers on this list are academics who hold different values relative to the designers of open source development environments, regarding how collaboration should occur, for example, when dealing with how projects and ideas are 'owned' by individuals or communities, and how work should be iteratively improved upon.

Our network analysis similarly draws attention to the real-world collaborative ties that underpin archaeological open source software development.
Our network analysis (@sec-network) similarly draws attention to the real-world collaborative ties that underpin archaeological open source software development.
We identify a core cluster representing a series of collaborative ties among members of an archaeological software engineering community of practice.
This core exhibits complexity that corresponds with social patterns, such as the presence of various sub-clusters representing interconnected interest or affinity groups.
Indeed, we demonstrate that real-world social connections and institutional support structures are strong predictors of centrality, since these sub-clusters are representative of established professional partnerships.
This indicates that archaeological open source is firmly embedded within existing power structures that permeate academic life, both online and offline.

Overall, we found that the vast majority of projects lack any collective effort and activity tends to abruptly end shortly after work is initiated. Projects that are maintained by multiple contributors tend to be backed by funded initiatives who employ locally and socially engaged individuals, or are warranted by participation in a scholarly community exhibiting genuine need for a particular kind of resource. Moreover, we found that the individuals who play critical roles in supporting the archaeological open source community are precariously employed workers. Contrary to popular claims about open source being inherently distributed, resilient, and open-ended, we found that, overall, archaeological open source is actually quite centralized, fragile, and based nearly exclusively on existing professional connections and endeavours.
This core exhibits complexity that corresponds with social patterns, such as the presence of various clusters representing interconnected interest or affinity groups.
Indeed, we have found that 'real-world' social connections and institutional support structures are strong predictors of centrality, since these clusters are representative of established professional partnerships.
This suggests that archaeological open source is firmly embedded within existing power structures that permeate academic life, both online and offline.
Similarly, we found that the individuals who play critical roles in supporting the archaeological open source community are precariously employed workers.
Far from open source being inherently distributed, resilient, and open-ended, this indicates that research software engineering is actually quite centralized, fragile, and based heavily on existing professional connections and endeavours.

There is thus little evidence to support the notion that archaeologists benefit from the positive outcomes that are commonly argued to be the natural results of open source development models -- namely, greater degrees of transparency, extensibility and participatory action.
These findings call into question the notion that archaeologists benefit from the positive outcomes that are commonly argued to be the natural results of open source development models -- namely, greater degrees of extensibility and participatory action.
While opening the source code may facilitate these positive outcomes as necessary preconditional factors, we argue that this only amounts to establishing the _potential_ for people to put these values into practice.
Moreover, we argue that the objectives and circumstances that frame archaeological practice significantly influence how far archaeologists (and academics in general) are willing to push for these values, and limit the ability for archaeologists to do open source in ways that resemble more mainstream open source projects.
For instance, successful open source projects like the Linux kernel, openSSL the Firefox web browser are driven by collective and popular interest in ensuring that code remains functional, and the code base is therefore constantly in flux and bears an accumulating list of contributing members.
This significantly differs from the organizational principles that govern archaeological work, namely that a project's leaders runs the whole show from the top down and operationalize all other actors as instruments of their central directives.
For instance, successful open source projects like the Linux kernel, OpenSSL, or the Firefox web browser are driven by collective and popular interest in ensuring that code remains functional, and the code base is therefore constantly in flux and bears an accumulating list of contributing members.
This differs from the organizational principles that govern much archaeological work, namely where a director or directors (of a field project, research group, etc.) set the goals and orientation of the group and commission and manage other actors accordingly.
Moreover, archaeological projects ultimately seek to produce stable textual outcomes that bear clear delineation of authorship and require no upkeep whatsoever.
Sustaining an open source project is simply not compatible with the factors that currently drive the momentum behind archaeological work.
