This script is intended to crawl license information of repositories through the GitHub API.

schutera/GithubLicenseCrawler

GithubLicenseCrawler

This script crawls license information for repositories through the GitHub API. Given an input file in requirements.txt format, it returns a CSV file with the associated license information.
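As a rough illustration of the lookup, the sketch below queries the GitHub REST endpoint `GET /repos/{owner}/{repo}/license` and reads the license name and URL from the response. The function names are hypothetical and not taken from this repository. How the script maps a package name to an owner and repository is not described here, so `owner` and `repo` are assumed inputs.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def license_api_url(owner, repo):
    """Build the URL of the REST endpoint that reports a repository's license."""
    return f"{API_ROOT}/repos/{owner}/{repo}/license"


def extract_license(payload):
    """Pull the license name and URL out of a decoded API response.

    Returns (None, None) when the response carries no license object.
    """
    info = payload.get("license") or {}
    return info.get("name"), info.get("url")


def fetch_license(owner, repo):
    """Query the API for one repository (requires network access).

    Assumption: unauthenticated access, which GitHub rate-limits.
    A repository without a detected license raises urllib.error.HTTPError.
    """
    request = urllib.request.Request(
        license_api_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_license(json.load(response))
```

Keeping the response parsing in `extract_license` separate from the network call makes the parsing easy to check offline.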

Input

The input file is expected to be a requirements.txt file. The expected format looks like this, for two example repositories:

HeartSeg-Dataset==0.0.1
DeepDive==0.0.1
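A minimal sketch of how lines in this format could be split into name and version. `parse_requirements` is a hypothetical helper, not necessarily how the script does it. It assumes only the `name==version` form shown above, and skips blank lines and comments.

```python
def parse_requirements(lines):
    """Return (name, version) pairs from lines in 'name==version' form."""
    entries = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        # A line without '==' yields a version of None.
        entries.append((name.strip(), version.strip() or None))
    return entries


sample = ["HeartSeg-Dataset==0.0.1", "DeepDive==0.0.1"]
print(parse_requirements(sample))
# [('HeartSeg-Dataset', '0.0.1'), ('DeepDive', '0.0.1')]
```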

Output

The output file, named licenses.csv, is generated on the fly. Its columns are:

| Repo name | Repo Url | License name | License Url |
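One way to produce a file with these columns is Python's standard `csv` module. This is a sketch only: `write_licenses` is a hypothetical helper, and the sample row is a placeholder rather than a real crawl result.

```python
import csv
import io

COLUMNS = ["Repo name", "Repo Url", "License name", "License Url"]


def write_licenses(rows, handle):
    """Write the header and one row per repository to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# Placeholder row; these values are illustrative, not real crawl results.
rows = [
    (
        "example-repo",
        "https://github.com/example/example-repo",
        "MIT License",
        "https://api.github.com/licenses/mit",
    )
]

# An in-memory buffer stands in for the output file here.
buffer = io.StringIO()
write_licenses(rows, buffer)
print(buffer.getvalue())
```

To write the actual file, pass a handle opened with `open("licenses.csv", "w", newline="")` instead of the in-memory buffer.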

Running the script should look like this:

[image: screenshot of a script run]

Contact and Contribute

[email protected]

The GitHub API is far more powerful than what is used here. Feel free to extend this code or, preferably, contribute directly here.

Future work can include..

.. a function that takes the intended use of your own project as input and highlights packages with conflicting licenses.

.. a function that recursively walks through the licenses of the packages you included, the ones they included, and so on.
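The walk through nested dependencies could be sketched as below. This feature does not exist in the script. `walk_dependencies` and the dependency map are hypothetical, and the sketch assumes the dependencies of each package are already known. In practice they would have to be collected from each package's own requirements.

```python
def walk_dependencies(package, dependencies, seen=None):
    """Return every package reachable from `package`, depth-first.

    `dependencies` maps a package name to the names it depends on.
    The `seen` set prevents infinite loops on circular dependencies.
    """
    if seen is None:
        seen = {package}
    ordered = []
    for child in dependencies.get(package, []):
        if child in seen:
            continue
        seen.add(child)
        ordered.append(child)
        ordered.extend(walk_dependencies(child, dependencies, seen))
    return ordered


# Hypothetical dependency map, for illustration only.
graph = {
    "my-project": ["pkg-a", "pkg-b"],
    "pkg-a": ["pkg-c"],
    "pkg-b": ["pkg-c", "pkg-a"],
    "pkg-c": ["pkg-a"],
}
print(walk_dependencies("my-project", graph))
# ['pkg-a', 'pkg-c', 'pkg-b']
```

Each name in the result could then be looked up for its license in the same way as the directly listed packages.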
