Skip to content

Latest commit

 

History

History
53 lines (36 loc) · 2.18 KB

README.md

File metadata and controls

53 lines (36 loc) · 2.18 KB

Taxing Collaborative Software Engineering

GitHub Codacy Badge

Replication package for our work on "Taxing Collaborative Software Engineering"

Requirements

This replication package requires Python 3.10 or higher. Install the dependencies via:

python3 -m pip install -r requirements.txt

For a faster loading, we recommend to optionally install orjson via pip:

python3 -m pip install orjson

How to run

Step 1: Crawl

First, we collect all timelines from all pull requests at a GitHub instance. crawler.py requires an <api_token> for your GitHub instance and an <out_dir> where the results are stored into:

python3 crawl.py <api_token> <out_dir>

crawl.py also provides the following optional command line arguments:

  • --api_url for the GitHub instance URL (default: https://api.github.com)
  • --disable_cache for disable caching (for larger instances not recommended)
  • --num_workers for parallel processes (default: 1)
  • --organization for limiting to one organization (helpful for organizations hosted on github.com)

To list all options in detail, run:

python3 crawl.py --h

Step 2: Model pull requests as cross-border communication channels

For this step, you will need:

  1. The directory of the previously collected data; and,
  2. A mapping of users and countries. This can be either a dict for a static mapping (does not capture changes in the users' location over time) or a dataframe for time-dependent mapping as data frame monthly sampled (captures changes in the users' location over time).

Run notebook.ipynb. Look out for the instructions as inline comments.

License

Copyright © 2023 Michael Dorner.

This work is licensed under MIT license.