GitLabInstancesDataset.py Unlicensed work

wheel (GitLab) wheel (GHA via nightly.link) GitLab Build Status GitLab Coverage GitHub Actions N∅ hard dependencies Libraries.io Status Code style: antiflash

A dataset of standalone GitLab instances, used to determine whether a URI is hosted on GitLab without probing it.

Although this is a Python package, the current version of the dataset in txt format can be downloaded from https://raw.githubusercontent.com/prebuilder/GitLabInstancesDataset/master/GitLabInstancesDataset/KnownGitLabInstances.txt and used from any language you like.
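As a rough sketch of using the plain-text file directly, the Python below downloads it with the standard library and loads it into a set. It assumes the file holds one domain name per line; the helper names `parse_instances` and `fetch_instances` are hypothetical and not part of the package.

```python
from urllib.request import urlopen

DATASET_URI = (
    "https://raw.githubusercontent.com/prebuilder/GitLabInstancesDataset/"
    "master/GitLabInstancesDataset/KnownGitLabInstances.txt"
)


def parse_instances(text):
    """Turn the dataset text into a set of domain names (assumes one per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def fetch_instances(uri=DATASET_URI):
    """Download the dataset and parse it (hypothetical helper, needs network access)."""
    with urlopen(uri) as response:
        return parse_instances(response.read().decode("utf-8"))
```

A set gives constant-time membership checks, which is all the lookup step needs.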

How to use

  1. Parse the URI and extract its domain.
  2. Check whether the service can be recognized as GitLab from its domain name or URI alone. If it can, the domain is not to be included in the dataset.
     a. Check if the domain contains the substring gitlab.
     b. Check if the path contains the substring gitlab.
  3. Check the domain name against the dataset. In Python this can be done using the isGitLab(domainName) function.
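The steps above could be sketched in Python roughly as follows. This is not the package's own code: `is_gitlab_uri` and `KNOWN_INSTANCES` are hypothetical names, the two domains are placeholders standing in for the real dataset, and matching the path case-insensitively is an assumption.

```python
from urllib.parse import urlsplit

# Hypothetical stand-in for the contents of KnownGitLabInstances.txt.
KNOWN_INSTANCES = frozenset({"code.example.org", "git.example.net"})


def is_gitlab_uri(uri, known_instances=KNOWN_INSTANCES):
    """Return True if the URI appears to point at a GitLab instance."""
    parts = urlsplit(uri)
    # Step 1: extract the domain (urlsplit already lower-cases the hostname).
    domain = parts.hostname or ""
    # Step 2: the domain or the path itself gives the service away.
    if "gitlab" in domain or "gitlab" in parts.path.lower():
        return True
    # Step 3: otherwise fall back to the dataset lookup.
    return domain in known_instances
```

With the placeholder set, `is_gitlab_uri("https://gitlab.com/foo/bar")` is true because of step 2, while `is_gitlab_uri("https://code.example.org/foo")` is true only because of the lookup in step 3.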

Inclusion criteria

  • Neither the domain name nor the path to the actual service contains the substring gitlab.

  • The dataset contains domain names only. Don't send URIs here.

  • The domain names in the list must be

    • normalized to lower case
    • unique
    • sorted
    • free of empty components
  • LF line endings

  • no IDNs currently
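A contributor might check a candidate list against these criteria with something like the sketch below. It is only an illustration: `check_dataset` is a hypothetical helper, "sorted" is assumed to mean plain lexicographic order, and an IDN is assumed to be any name that is non-ASCII or has a punycode (`xn--`) label.

```python
def check_dataset(text):
    """Return a list of human-readable problems found in the dataset text."""
    problems = []
    if "\r" in text:
        problems.append("non-LF line ending found")
    names = text.split("\n")
    if names and names[-1] == "":
        names.pop()  # tolerate a single trailing newline
    for name in names:
        labels = name.split(".")
        if name != name.lower():
            problems.append(f"not lower case: {name}")
        if "gitlab" in name.lower():
            problems.append(f"contains the substring gitlab: {name}")
        if "" in labels:
            problems.append(f"empty component: {name}")
        if not name.isascii() or any(label.startswith("xn--") for label in labels):
            problems.append(f"IDN: {name}")
        if "/" in name or ":" in name:
            problems.append(f"looks like a URI, not a domain name: {name}")
    if len(set(names)) != len(names):
        problems.append("duplicate entries")
    if names != sorted(names):  # assumption: plain lexicographic order
        problems.append("not sorted")
    return problems
```

An empty result means the text passed every check in this sketch; it does not guarantee that the listed domains actually run GitLab.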