Extract various information from the GitHub API.

GitHubToolbox/github-extractor-package

Overview

The GitHub Extractor package is a Python library designed to facilitate the extraction of data from GitHub.

This package provides functions to fetch information about repositories, including languages used, releases, contributors, topics, workflows, and more, with robust error handling and configuration support.

Features

  • List organizations for a user from GitHub.
  • List repositories for a user from GitHub.
  • List repositories for a specified organization from GitHub.
  • Support for authentication using GitHub API tokens.
  • Filtering of organizations and repositories based on given patterns.
  • Pagination handling for API requests.
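The README does not describe how pagination is handled internally. As background, the GitHub REST API signals further pages through a `Link` response header containing an entry marked `rel="next"`. The sketch below shows that idea only; it is not the package's code, and `next_page_url` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches one entry of a Link header: <url>; rel="name"
_LINK_ENTRY = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def next_page_url(link_header: Optional[str]) -> Optional[str]:
    """Return the URL marked rel="next", or None when there are no more pages."""
    if not link_header:
        return None
    for url, rel in _LINK_ENTRY.findall(link_header):
        if rel == "next":
            return url
    return None


# Example header in the shape GitHub returns for a paginated listing.
example_header = (
    '<https://api.github.com/user/repos?page=2>; rel="next", '
    '<https://api.github.com/user/repos?page=5>; rel="last"'
)
print(next_page_url(example_header))
```

A client would keep requesting the returned URL until the helper yields `None`.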

Installation

You can install GitHub Extractor via pip:

pip install wolfsoftware.github-extractor

Usage

Getting Token information

You can get basic information relating to the given token.

There is also a dedicated command-line tool for this: GitHub Token Validator.

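from wolfsoftware.github_extractor import get_token_information

config = {
    "token": "your_github_token",
}

# Assumed call: the other functions in this README all take the config
# dictionary as their only argument, so the same pattern is used here.
token_information = get_token_information(config)

Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |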
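Hard-coding a token in source is easy to leak, so one option is to build the config dictionary from the environment. This is a minimal sketch, not part of the package: `build_config` is a hypothetical helper, the `GITHUB_TOKEN` variable name is an assumption, and the `slugs` value is assumed to be a boolean. Only the `token`, `timeout` and `slugs` keys come from the table above.

```python
import os
from typing import Any, Dict


def build_config(env_var: str = "GITHUB_TOKEN", timeout: int = 10, slugs: bool = False) -> Dict[str, Any]:
    """Build a config dictionary using a token read from an environment variable."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token.
        raise ValueError(f"Environment variable {env_var} is not set")
    return {"token": token, "timeout": timeout, "slugs": slugs}
```

The resulting dictionary can then be passed to any of the functions in this README in place of a hand-written `config`.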

Getting User Information

You can get basic information relating to the authenticated user (the owner of the token). The information will be limited by the scope of the token.

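from wolfsoftware.github_extractor import get_authenticated_user

config = {
    "token": "your_github_token",
}

# Assumed call: the other functions in this README all take the config
# dictionary as their only argument, so the same pattern is used here.
user = get_authenticated_user(config)

Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |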

Listing Organizations

You can list the organizations that you are a member of. The function is available under both British and American English spellings.

from wolfsoftware.github_extractor import list_organisations, list_organizations

config = {
    "token": "your_github_token",
    "ignore_orgs": ["Test*"]
}

# Using British English spelling
organisations = list_organisations(config)

# Using American English spelling
organisations_us = list_organizations(config)

Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |

Filtering Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| include_orgs | No | A list of organisation names to include in the results. |
| ignore_orgs | No | A list of organisation names to exclude from the results. |
| get_members | No | Include organisation members in the results. |
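The example config uses a pattern such as `"Test*"`, which suggests shell-style wildcards. The sketch below illustrates how include and ignore lists could interact under that assumption. It is not the package's implementation: `filter_names` is a hypothetical helper, and the case-sensitive matching and "ignore wins over include" rule are assumptions made for this illustration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Sequence


def filter_names(names: Iterable[str], include: Sequence[str] = (), ignore: Sequence[str] = ()) -> List[str]:
    """Keep names matching any include pattern (if given) and no ignore pattern."""
    kept = []
    for name in names:
        # An empty include list means "include everything".
        if include and not any(fnmatchcase(name, pattern) for pattern in include):
            continue
        # Ignore patterns are applied last, so they override include patterns.
        if any(fnmatchcase(name, pattern) for pattern in ignore):
            continue
        kept.append(name)
    return kept


# Hypothetical organisation names, used only for illustration.
orgs = ["TestOrg", "GitHubToolbox", "Testing123", "WolfSoftware"]
print(filter_names(orgs, ignore=["Test*"]))
```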

Listing User Repositories

You can list repositories for a user with optional filters:

from wolfsoftware.github_extractor import list_user_repositories

config = {
    "token": "your_github_token",
    "ignore_repos": ["Test*"],
    "include_repos": ["Project*"]
}

repositories = list_user_repositories(config)

Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| token | No | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |
| username | No | The GitHub username to list repositories for. (The authenticated user will be used if this is not supplied.) |

Additional Data Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages for each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |

Filtering Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of organisation names/repository names to include in the results. |
| ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
| skip_private | No | Do not include private repositories. This applies to the authenticated user only. |

The include_repos and ignore_repos filters use the full name of the repository, which is the organisation name/repository name, e.g. GitHubToolbox/github-extractor-package.
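The additional data options are all spelled `get_<something>`, so a small helper can switch several of them on without retyping the keys. This is a sketch under assumptions: `with_extras` is a hypothetical helper that is not part of the package, and the flags are assumed to be booleans, which the README does not state explicitly.

```python
from typing import Any, Dict

# Suffixes of the documented get_* options.
EXTRAS = ("branches", "contributors", "languages", "releases", "tags", "topics", "workflows")


def with_extras(config: Dict[str, Any], *extras: str) -> Dict[str, Any]:
    """Return a copy of config with the requested get_* flags set to True."""
    unknown = [extra for extra in extras if extra not in EXTRAS]
    if unknown:
        raise ValueError(f"Unknown additional data option(s): {', '.join(unknown)}")
    enriched = dict(config)  # copy, so the caller's dictionary is left untouched
    for extra in extras:
        enriched[f"get_{extra}"] = True
    return enriched


base = {"token": "your_github_token"}
print(with_extras(base, "languages", "topics"))
```

Each extra option presumably costs further API requests per repository, so enabling only what is needed may help with rate limits.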

Listing Repositories by Organization

You can list repositories for a specific organization with optional filters:

from wolfsoftware.github_extractor import list_repositories_by_org

config = {
    "token": "your_github_token",
    "org_name": "your_organization",
    "ignore_repos": ["Test*"],
    "include_repos": ["Project*"]
}

repositories = list_repositories_by_org(config)

Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| token | No | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |
| org_name | No | The GitHub organisation to list repositories for. |

Additional Data Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages for each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |

Filtering Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of organisation names/repository names to include in the results. |
| ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
| skip_private | No | Do not include private repositories. This applies to the authenticated user only. |

The include_repos and ignore_repos filters use the full name of the repository, which is the organisation name/repository name, e.g. GitHubToolbox/github-extractor-package.
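Because some filters are compared against the full organisation/repository name, a pattern that begins with a repository name may not match as expected. The sketch below demonstrates this with the standard library's `fnmatch` module. It assumes shell-style wildcard matching, which the README does not confirm, and the repository names other than GitHubToolbox/github-extractor-package are made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def matching(full_names: Sequence[str], pattern: str) -> List[str]:
    """Return the full names that match a shell-style wildcard pattern."""
    return [name for name in full_names if fnmatchcase(name, pattern)]


full_names = [
    "GitHubToolbox/github-extractor-package",
    "GitHubToolbox/Test-sandbox",   # hypothetical repository
    "ExampleOrg/Project-alpha",     # hypothetical repository
]

# Anchored at the start of the full name, so the organisation prefix gets in the way.
print(matching(full_names, "Test*"))
# A leading wildcard for the organisation part lets the repository name match.
print(matching(full_names, "*/Test*"))
# Selecting everything that belongs to one organisation.
print(matching(full_names, "GitHubToolbox/*"))
```

If the package matches patterns this way, the organisation prefix needs to be covered by the pattern whenever a full-name filter is used.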

Listing all Organisation Repositories

You can list all repositories for all organisations you're a member of.

from wolfsoftware.github_extractor import list_all_org_repositories

config = {
    "token": "your_github_token",
    "ignore_repos": ["Test*"],
    "include_repos": ["Project*"]
}

repositories = list_all_org_repositories(config)

Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |

Additional Data Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages for each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |

Filtering Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of organisation names/repository names to include in the results. |
| ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
| skip_private | No | Do not include private repositories. This applies to the authenticated user only. |

The include_repos and ignore_repos filters use the full name of the repository, which is the organisation name/repository name, e.g. GitHubToolbox/github-extractor-package.

Listing all Visible Repositories

You can list repositories that you are able to access.

from wolfsoftware.github_extractor import list_all_visible_repositories

config = {
    "token": "your_github_token",
    "ignore_repos": ["Test*"],
    "include_repos": ["Project*"]
}

repositories = list_all_visible_repositories(config)

Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| token | Yes | Authentication for the GitHub API. |
| timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
| slugs | No | Return the results as slugs (a list of names and nothing else). |

Additional Data Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| get_branches | No | Add details about all branches to each repository. |
| get_contributors | No | Add details about all contributors to each repository. |
| get_languages | No | Add the list of identified languages for each repository. |
| get_releases | No | Add details about all releases to each repository. |
| get_tags | No | Add details about all tags to each repository. |
| get_topics | No | Add the list of defined topics to each repository. |
| get_workflows | No | Add details about all workflows to each repository. |

Filtering Parameters

| Name | Required | Purpose |
| --- | --- | --- |
| include_names | No | A list of repository names to include in the results. |
| ignore_names | No | A list of repository names to exclude from the results. |
| include_repos | No | A list of organisation names/repository names to include in the results. |
| ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
| skip_private | No | Do not include private repositories. This applies to the authenticated user only. |

The include_repos and ignore_repos filters use the full name of the repository, which is the organisation name/repository name, e.g. GitHubToolbox/github-extractor-package.

Exceptions

The following custom exceptions are used:

| Name | Purpose |
| --- | --- |
| AuthenticationError | Raised when authentication fails. This is caused by an invalid token. |
| MissingOrgNameError | Raised when the organization name is missing. |
| MissingTokenError | Raised when the GitHub API token is missing but is required. |
| NotFoundError | Raised when a requested resource is not found. This is caused by incorrect scope of the token. |
| RateLimitExceededError | Raised when the GitHub API rate limit is exceeded. |
| RequestError | Raised for general request errors. |
| RequestTimeoutError | Raised when a request times out. |
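Some of these errors, such as a rate limit or a timeout, may be transient and worth retrying, while an invalid token is not. The sketch below shows one possible retry wrapper. It is not part of the package: `call_with_retry` is a hypothetical helper, and because the README does not show where the exceptions are imported from, a local stand-in class is defined here in place of the package's `RateLimitExceededError`.

```python
import time
from typing import Any, Callable, Dict, Tuple, Type


class RateLimitExceededError(Exception):
    """Local stand-in for the package's exception of the same name."""


def call_with_retry(
    func: Callable[[Dict[str, Any]], Any],
    config: Dict[str, Any],
    retry_on: Tuple[Type[BaseException], ...] = (RateLimitExceededError,),
    attempts: int = 3,
    delay: float = 1.0,
) -> Any:
    """Call func(config), retrying on the given exception types."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func(config)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            # Wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

In real use, `func` would be one of the listing functions from this README and `retry_on` would hold the package's own exception classes.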