The GitHub Extractor package is a Python library designed to facilitate the extraction of data from GitHub.
This package provides functions to fetch information about repositories, including languages used, releases, contributors, topics, workflows, and more with robust error handling and configuration support.
- List organizations for a user from GitHub.
- List repositories for a user from GitHub.
- List repositories for a specified organization from GitHub.
- Support for authentication using GitHub API tokens.
- Filtering of organizations and repositories based on given patterns.
- Pagination handling for API requests.
You can install GitHub Extractor via pip:
pip install wolfsoftware.github-extractor
You an get basic information relating to the given token.
There is also a specific command line tool for this Github Token Validator.
from wolfsoftware.github_extractor import get_token_information
config = {
"token": "your_github_token",
}
Parameters
Name | Required | Purpose |
---|---|---|
token | Yes | Authentication for the GitHub API. |
timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
slugs | No | Should we return the results as slugs. (List of names and nothing else). |
You an get basic information relating to the authenticated user (owner of the token). The information will be limited by the scope of the token.
from wolfsoftware.github_extractor import get_authenticated_user
config = {
"token": "your_github_token",
}
Parameters
Name | Required | Purpose |
---|---|---|
token | Yes | Authentication for the GitHub API. |
timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
slugs | No | Should we return the results as slugs. (List of names and nothing else). |
You can list organizations that you are a member of using British or American English spelling.
from wolfsoftware.github_extractor import list_organisations, list_organizations
config = {
"token": "your_github_token",
"ignore_orgs": ["Test*"]
}
# Using British English spelling
organisations = list_organisations(config)
# Using American English spelling
organisations_us = list_organizations(config)
Parameters
Name | Required | Purpose |
---|---|---|
token | Yes | Authentication for the GitHub API. |
timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
slugs | No | Should we return the results as slugs. (List of names and nothing else). |
Filtering Parameters
Name | Required | Purpose |
---|---|---|
include_orgs | No | A list of organisation names to include in the results. |
ignore_orgs | No | A list of organisation names to exclude from the results. |
get_members | No | Should we include organisation members in the results. |
You can list repositories for a user with optional filters:
from wolfsoftware.github_extractor import list_user_repositories
config = {
"token": "your_github_token",
"ignore_repos": ["Test*"],
"include_repos": ["Project*"]
}
repositories = list_user_repositories(config)
Parameters
Name | Required | Purpose |
---|---|---|
token | No | Authentication for the GitHub API. |
timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
slugs | No | Should we return the results as slugs. (List of names and nothing else). |
username | No | The GitHub username to list repositories for. (Authenticated user will be used is this is not supplied). |
Additional Data Parameter
Name | Required | Purpose |
---|---|---|
get_branches | No | Add details about all branches to each repository. |
get_contributors | No | Add details about all contributors to each repository. |
get_languages | No | Add the list of identified languages for each repository. |
get_releases | No | Add details about all releases to each repository. |
get_tags | No | Add details about all tags to each repository. |
get_topics | No | Add the list of defined topics to each repository. |
get_workflows | No | Add details about all workflows to each repository. |
Filtering Parameter
Name | Required | Purpose |
---|---|---|
include_names | No | A list of repository names to include in the results. |
ignore_names | No | A list of repository names to exclude from the results. |
include_repos | No | A list of organisation names/repository names to include in the results. |
ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
skip_private | No | Do not include private repositories, this is for the authenticated user only. |
ignore and include names use the full name of the repository, which is the organisation name / repository name E.g. GitHubToolbox/github-extractor-package
You can list repositories for a specific organization with optional filters:
from wolfsoftware.github_extractor import list_repositories_by_org
config = {
"token": "your_github_token",
"org_name": "your_organization",
"ignore_repos": ["Test*"],
"include_repos": ["Project*"]
}
repositories = list_repositories_by_org(config)
Parameters
Name | Required | Purpose |
---|---|---|
token | No | Authentication for the GitHub API. |
timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
slugs | No | Should we return the results as slugs. (List of names and nothing else). |
org_name | No | The GitHub organisation to list repositories for. |
Additional Data Parameter
Name | Required | Purpose |
---|---|---|
get_branches | No | Add details about all branches to each repository. |
get_contributors | No | Add details about all contributors to each repository. |
get_languages | No | Add the list of identified languages for each repository. |
get_releases | No | Add details about all releases to each repository. |
get_tags | No | Add details about all tags to each repository. |
get_topics | No | Add the list of defined topics to each repository. |
get_workflows | No | Add details about all workflows to each repository. |
Filtering Parameter
Name | Required | Purpose |
---|---|---|
include_names | No | A list of repository names to include in the results. |
ignore_names | No | A list of repository names to exclude from the results. |
include_repos | No | A list of organisation names/repository names to include in the results. |
ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
skip_private | No | Do not include private repositories, this is for the authenticated user only. |
ignore and include names use the full name of the repository, which is the organisation name / repository name E.g. GitHubToolbox/github-extractor-package
You can list all repositories for all organisations you're a member of.
from wolfsoftware.github_extractor import list_all_org_repositories
config = {
"token": "your_github_token",
"ignore_repos": ["Test*"],
"include_repos": ["Project*"]
}
repositories = list_all_org_repositories(config)
Parameters
Name | Required | Purpose |
---|---|---|
token | Yes | Authentication for the GitHub API. |
timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
slugs | No | Should we return the results as slugs. (List of names and nothing else). |
Additional Data Parameter
Name | Required | Purpose |
---|---|---|
get_branches | No | Add details about all branches to each repository. |
get_contributors | No | Add details about all contributors to each repository. |
get_languages | No | Add the list of identified languages for each repository. |
get_releases | No | Add details about all releases to each repository. |
get_tags | No | Add details about all tags to each repository. |
get_topics | No | Add the list of defined topics to each repository. |
get_workflows | No | Add details about all workflows to each repository. |
Filtering Parameter
Name | Required | Purpose |
---|---|---|
include_names | No | A list of repository names to include in the results. |
ignore_names | No | A list of repository names to exclude from the results. |
include_repos | No | A list of organisation names/repository names to include in the results. |
ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
skip_private | No | Do not include private repositories, this is for the authenticated user only. |
ignore and include names use the full name of the repository, which is the organisation name / repository name E.g. GitHubToolbox/github-extractor-package
You can list repositories that you are able to access.
from wolfsoftware.github_extractor import list_all_visible_repositories
config = {
"token": "your_github_token",
"ignore_repos": ["Test*"],
"include_repos": ["Project*"]
}
repositories = list_all_visible_repositories(config)
Parameters
Name | Required | Purpose |
---|---|---|
token | Yes | Authentication for the GitHub API. |
timeout | No | The timeout to use when talking to the GitHub API (default is 10 seconds). |
slugs | No | Should we return the results as slugs. (List of names and nothing else). |
Additional Data Parameter
Name | Required | Purpose |
---|---|---|
get_branches | No | Add details about all branches to each repository. |
get_contributors | No | Add details about all contributors to each repository. |
get_languages | No | Add the list of identified languages for each repository. |
get_releases | No | Add details about all releases to each repository. |
get_tags | No | Add details about all tags to each repository. |
get_topics | No | Add the list of defined topics to each repository. |
get_workflows | No | Add details about all workflows to each repository. |
Filtering Parameter
Name | Required | Purpose |
---|---|---|
include_names | No | A list of repository names to include in the results. |
ignore_names | No | A list of repository names to exclude from the results. |
include_repos | No | A list of organisation names/repository names to include in the results. |
ignore_repos | No | A list of organisation names/repository names to exclude from the results. |
skip_private | No | Do not include private repositories, this is for the authenticated user only. |
ignore and include names use the full name of the repository, which is the organisation name / repository name E.g. GitHubToolbox/github-extractor-package
The following custom exceptions are used:
Name | Purpose |
---|---|
AuthenticationError | Raised when authentication fails. This is caused by an invalid token. |
MissingOrgNameError | Raised when the organization name is missing. |
MissingTokenError | Raised when the GitHub API token is missing but is required. |
NotFoundError | Raised when a requested resource is not found. This is caused by incorrect scope of the token. |
RateLimitExceededError | Raised when the GitHub API rate limit is exceeded. |
RequestError | Raised for general request errors. |
RequestTimeoutError | Raised when a request times out. |