Use GHArchive and the GitHub API to get all the contributors to a repo.
The minimum permissions this service requires:
- Cloud Run Invoker
- BigQuery Job User
Create a service account with only these roles and attach it to the service, so the service runs with limited permissions.
Add ?raw=true to the URL to return the raw data from the result.
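As a minimal sketch of adding the raw flag to a request URL: the helper name `with_raw` and the example host are hypothetical and not part of this repo; only the `raw=true` query parameter comes from the text above.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_raw(url: str) -> str:
    """Return the URL with raw=true set, keeping any existing query parameters."""
    parts = urlparse(url)
    # Parse the existing query string so current parameters are preserved.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query["raw"] = "true"
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical service URL; substitute the URL of your own deployment.
print(with_raw("https://example.com/"))
```

Rebuilding the query string rather than appending text avoids producing a second `?` when the URL already carries parameters.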
The cli.py script supports file-based searching for larger searches and returns raw data to stdout:
virtualenv venv
source venv/bin/activate
pip install -r requirements.txt
python cli.py --repo-list list-of-repos.txt
If your repo has been renamed, provide both the new and old names, because the new name will not appear in pre-rename GHArchive data. Note that some duplication may occur between API and file records when using raw.
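If the duplication matters downstream, the raw output can be filtered afterwards. The sketch below assumes one record per line, which is an assumption about the raw format rather than something this README specifies; `dedupe_lines` and the sample records are hypothetical.

```python
def dedupe_lines(lines):
    """Yield each non-empty record once, in first-seen order.

    Assumes one record per line (an assumption about the raw output format).
    """
    seen = set()
    for line in lines:
        record = line.strip()
        if record and record not in seen:
            seen.add(record)
            yield record


# Hypothetical sample records standing in for raw output lines.
sample = ["alice\n", "bob\n", "alice\n", "\n", "carol\n"]
unique = list(dedupe_lines(sample))
print(unique)
```

The same function accepts any iterable of strings, so it could be fed an open file or `sys.stdin` instead of the sample list.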
For continuous deployment, deploy the service with Cloud Buildpacks.
For testing, change the YEARMONTH value in app.py to a shorter range rather than processing the entire archive.