Skip to content

Latest commit

 

History

History
35 lines (22 loc) · 1.31 KB

README.md

File metadata and controls

35 lines (22 loc) · 1.31 KB

GithubLicenseCrawler

This script is intended to crawl license information of repositories through the GitHub API. Taking a csv file with requirements.txt format the script will return a csv with the associated license information.

Input

Input file is expected to be a requirements.txt Expected format looks like this, for two exemplary repositories:

HeartSeg-Dataset==0.0.1
DeepDive==0.0.1

Output

Output file will be generated on the fly, named licenses.csv and the columns depict:

| Repo name | Repo Url | License name | License Url |

image

Running the script should look like this:

image

Contact and Contribute

[email protected] Obviously the Github API is way more powerful than what has been done here. Feel free to extend this code or preferably directly contribute here.

Future work can include..

.. a function to input for what purpose you want to use your own project, which then highlights packages with conflicting licenses.

.. a function that recurrently walks through the licenses of the packages you included, and the ones they included, and the ones they included, and so on.