This repository was archived by the owner on Dec 31, 2024. It is now read-only.

Analyze Github api #2

Closed
Max-Levitskiy opened this issue Dec 19, 2019 · 7 comments

Comments

@Max-Levitskiy
Member

No description provided.

@Max-Levitskiy
Member Author

Query to collect all repositories of a GitHub user by login (v4 GraphQL API):


query user_repositories {
  user(login: "Max-Levitskiy") {
    id
    organizations(first: 100) {
      nodes {
        repositories(first: 100) {
          totalCount
          nodes {
            nameWithOwner
            forkCount
            stargazers {
              totalCount
            }
            watchers {
              totalCount
            }
            isFork
            parent {
              nameWithOwner
            }
          }
        }
      }
    }
    repositories(first: 100) {
      totalCount
      nodes {
        nameWithOwner
        forkCount
        stargazers {
          totalCount
        }
        watchers {
          totalCount
        }
        isFork
        parent {
          nameWithOwner
        }
      }
    }
  }
}
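The query above can be sent and its result flattened with nothing but the Python standard library. This is a sketch, not the project's client: `build_request`, `run_query` and `flatten_repositories` are hypothetical names, and the caller is assumed to supply a personal access token. The endpoint `https://api.github.com/graphql`, the POST body `{"query": ...}` and the `Authorization: bearer <token>` header follow the GitHub GraphQL API conventions.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.github.com/graphql"


def build_request(query: str, token: str) -> urllib.request.Request:
    """Build (without sending) a POST request for the GitHub v4 GraphQL API."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        headers={"Authorization": f"bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def run_query(query: str, token: str) -> dict:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, token)) as resp:
        return json.load(resp)


def flatten_repositories(response: dict) -> list:
    """Merge the user's own and organization repositories into flat records.

    A repository that appears more than once is kept only once (keyed by nameWithOwner).
    """
    user = response["data"]["user"]
    nodes = list(user["repositories"]["nodes"])
    for org in user["organizations"]["nodes"]:
        nodes.extend(org["repositories"]["nodes"])

    records = {}
    for repo in nodes:
        name = repo["nameWithOwner"]
        if name in records:
            continue
        parent = repo.get("parent")
        records[name] = {
            "name": name,
            "stars": repo["stargazers"]["totalCount"],
            "watchers": repo["watchers"]["totalCount"],
            "forks": repo.get("forkCount"),  # None if the query did not request forkCount
            "fork_of": parent["nameWithOwner"] if repo["isFork"] and parent else None,
        }
    return list(records.values())
```

With the query text held in a string (say a hypothetical `QUERY`), `flatten_repositories(run_query(QUERY, token))` yields one record per repository. Note that the query fetches at most 100 organizations and 100 repositories per connection; larger accounts would need cursor pagination.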

@Max-Levitskiy
Member Author

API for getting contributor statistics for a repository (v3 REST API):
https://developer.github.com/v3/repos/statistics/
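Per the statistics documentation, GitHub may answer `202 Accepted` while it is still computing the statistics, and the client is expected to repeat the request a little later. A minimal retry loop is sketched below; the function names, the retry count, the delay and the injectable `fetch` callable are assumptions made for illustration and testing.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

API_ROOT = "https://api.github.com"

# A fetch function takes a URL and returns (HTTP status, parsed JSON body or None).
Fetch = Callable[[str], tuple]


def make_urllib_fetch(token: Optional[str] = None) -> Fetch:
    """Build a fetch function backed by urllib; the token, if given, is sent as a header."""
    def fetch(url: str) -> tuple:
        headers = {"Accept": "application/vnd.github.v3+json"}
        if token:
            headers["Authorization"] = f"token {token}"
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return resp.status, (json.loads(raw) if raw else None)
    return fetch


def get_contributor_stats(owner: str, repo: str, fetch: Fetch,
                          retries: int = 5, delay: float = 2.0,
                          sleep: Callable[[float], None] = time.sleep) -> Optional[list]:
    """Return the contributor list, retrying while GitHub answers 202 (stats being computed).

    Returns None if the statistics are still not ready after `retries` attempts.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/stats/contributors"
    for _ in range(retries):
        status, body = fetch(url)
        if status == 200:
            return body
        if status == 202:
            sleep(delay)  # give GitHub's background job time to finish
            continue
        raise RuntimeError(f"unexpected status {status} for {url}")
    return None
```

In real use, `get_contributor_stats("owner", "repo", make_urllib_fetch(token))` performs the network calls; passing a fake `fetch` and a no-op `sleep` keeps the retry logic testable offline.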

@Max-Levitskiy
Member Author

Collecting statistics about a user's commits from git itself (without any API):
Per commit, with the number of changed files and lines added/removed:
git log --author="<authorname>" --oneline --shortstat
Per changed file:
git log --author="<authorname>" --pretty=tformat: --numstat

https://stackoverflow.com/questions/1265040/how-to-count-total-lines-changed-by-a-specific-author-in-a-git-repository
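The `--numstat` output has one `added<TAB>deleted<TAB>path` line per changed file (binary files show `-` for both counts), so per-author totals can be summed in a few lines. A sketch in Python; `summarize_numstat` and `author_totals` are hypothetical helper names, and `author_totals` assumes `git` is on the PATH and `repo_path` is a git working tree. Note that git treats the `--author` value as a regular expression.

```python
import subprocess


def summarize_numstat(output: str) -> dict:
    """Sum the per-file lines of `git log --pretty=tformat: --numstat` output.

    Each non-empty line is "added<TAB>deleted<TAB>path". Binary files report "-"
    for both counts; they are counted as file changes but add no lines.
    A file changed in several commits is counted once per commit.
    """
    file_changes = added = deleted = 0
    for line in output.splitlines():
        parts = line.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank lines between commits
        file_changes += 1
        if parts[0] == "-" or parts[1] == "-":
            continue  # binary file: no line counts
        added += int(parts[0])
        deleted += int(parts[1])
    return {"file_changes": file_changes, "added": added, "deleted": deleted}


def author_totals(author: str, repo_path: str = ".") -> dict:
    """Run git log for one author inside repo_path and summarize the result."""
    result = subprocess.run(
        ["git", "log", f"--author={author}", "--pretty=tformat:", "--numstat"],
        cwd=repo_path, capture_output=True, text=True, check=True,
    )
    return summarize_numstat(result.stdout)
```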

@Max-Levitskiy
Member Author

We need two entry points for GitHub: a repository or a user.
For a repository, we store stars, watchers, forks, the contributor list, and a timestamp of when this information was last fetched from the GitHub API.
For a user, we store the repositories they contribute to and their stats: number of commits and number of lines added and deleted.
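The two entry points could map onto two records like the ones below. This is only a sketch: the class and field names are hypothetical, storing the fetch time as a UTC `datetime` is an assumption, and the `is_stale` refresh rule is an illustrative addition rather than something decided in this thread.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RepositoryRecord:
    """Repository entry point: counters plus when they were last fetched."""
    name_with_owner: str  # e.g. "owner/repo"
    stars: int = 0
    watchers: int = 0
    forks: int = 0
    contributors: list = field(default_factory=list)  # contributor logins
    last_fetched: Optional[datetime] = None  # time of the last GitHub API fetch (UTC)

    def is_stale(self, max_age_seconds: float, now: Optional[datetime] = None) -> bool:
        """True if never fetched or fetched more than max_age_seconds ago (hypothetical rule)."""
        if self.last_fetched is None:
            return True
        now = now or datetime.now(timezone.utc)
        return (now - self.last_fetched).total_seconds() > max_age_seconds


@dataclass
class ContributionStats:
    commits: int = 0
    lines_added: int = 0
    lines_deleted: int = 0


@dataclass
class UserRecord:
    """User entry point: per-repository stats keyed by "owner/repo"."""
    login: str
    contributions: dict = field(default_factory=dict)

    def totals(self) -> ContributionStats:
        """Sum commits and line counts over every repository the user contributes to."""
        total = ContributionStats()
        for stats in self.contributions.values():
            total.commits += stats.commits
            total.lines_added += stats.lines_added
            total.lines_deleted += stats.lines_deleted
        return total
```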

@Max-Levitskiy
Member Author

Max-Levitskiy commented Dec 21, 2019

The algorithm for scanning repositories:

  • We receive a request for the imprint of a repository's users.
  • If we already have calculated data, we return it sorted by imprint, with pagination.
  • If we have no information about the repository, we respond with an empty user list, 0% scan progress, and 0 as the date the last scan finished.
  • If a scan is in progress, we return the percentage completed.

Our scanning process starts with a request to GET /repos/:owner/:repo/stats/contributors from the GitHub v3 API.
This returns a list of users, each with a summary of their contributions grouped by week.
We add each user from this list to a queue.
The total number of users for the repository is stored as well.
When we process a user, we sum their commits, lines added, and lines deleted over all weeks, save this information, and decrement the count of records still to be processed.
While processing is ongoing, we answer API requests with the users processed so far, together with the percentage of users processed.
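The scanning flow and the request handling described in this comment can be sketched as an in-memory scan job. The `/stats/contributors` response shape used here (entries with `author.login` and `weeks` items holding `a` additions, `d` deletions and `c` commits) follows the GitHub v3 statistics documentation. The `RepoScan` class, the `answer` function, the response keys, the single-process queue, and ranking by commit count as a stand-in for the imprint score are all assumptions for illustration.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional

Stats = dict  # {"commits": int, "added": int, "deleted": int}


@dataclass
class RepoScan:
    total_users: int = 0
    queue: deque = field(default_factory=deque)
    results: dict = field(default_factory=dict)  # login -> Stats
    last_finished: Optional[float] = None  # Unix time the last scan completed

    def start(self, contributors: list) -> None:
        """Queue every entry returned by GET /repos/:owner/:repo/stats/contributors."""
        self.queue.extend(contributors)
        self.total_users = len(contributors)

    def process_one(self) -> bool:
        """Sum one user's weekly stats; return False once nothing is left to process."""
        if not self.queue:
            return False
        entry = self.queue.popleft()
        weeks = entry["weeks"]
        self.results[entry["author"]["login"]] = {
            "commits": sum(w["c"] for w in weeks),
            "added": sum(w["a"] for w in weeks),
            "deleted": sum(w["d"] for w in weeks),
        }
        if not self.queue:
            self.last_finished = time.time()
        return True

    def progress(self) -> float:
        """Percentage of queued users already processed."""
        if self.total_users == 0:
            return 0.0
        return 100.0 * len(self.results) / self.total_users


def answer(scans: dict, repo: str, page: int = 0, per_page: int = 30,
           score: Callable[[Stats], float] = lambda s: s["commits"]) -> dict:
    """Build the API response for a repository; `score` stands in for the imprint value."""
    scan = scans.get(repo)
    if scan is None:
        # Unknown repository: empty list, 0% progress, last scan finished at 0.
        return {"users": [], "progress": 0.0, "last_finished": 0}
    ranked = sorted(scan.results.items(), key=lambda kv: score(kv[1]), reverse=True)
    start = page * per_page
    return {
        "users": [login for login, _ in ranked[start:start + per_page]],
        "progress": scan.progress(),
        "last_finished": scan.last_finished or 0,
    }
```

Because `answer` only reads `results`, it can be called between `process_one` steps and naturally returns the partial user list with the current percentage.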

@Max-Levitskiy
Member Author

The initial formulas for imprint:
uc/rc + ua/ra + (ud/rd)*0.5 = rui
ri = stars + watchers + forks
ri * (rui / sri) - the amount of imprint a user gets from contributing to a repository
The owner gets the summed imprint of all repositories they created.

rc = total repository commits
uc = total user commits
ra = total repository added lines
ua = total user added lines
rd = total repository deleted lines
ud = total user deleted lines
ri = total repository imprint
sri = sum of the imprint of all users in a repository (sum of rui)
tui = total user imprint
rui = user imprint for a repository
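A sketch of the imprint calculation in Python. It reads each ratio as the user's share of the repository total (uc/rc, ua/ra, ud/rd), so that the per-user amounts ri * rui / sri add up to ri; that reading is an assumption. Taking ri from stars, watchers and forks (the counts the v4 query collects), computing the repository totals rc, ra, rd as sums over the listed contributors, the zero-total guards, and all function names are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    commits: int
    added: int
    deleted: int


def share(part: int, whole: int) -> float:
    """part / whole, or 0.0 when the repository total is zero."""
    return part / whole if whole else 0.0


def user_repo_imprint(user: Contribution, repo: Contribution) -> float:
    """rui = uc/rc + ua/ra + 0.5 * ud/rd (deleted lines count half)."""
    return (share(user.commits, repo.commits)
            + share(user.added, repo.added)
            + 0.5 * share(user.deleted, repo.deleted))


def repo_imprint(stars: int, watchers: int, forks: int) -> int:
    """ri = stars + watchers + forks."""
    return stars + watchers + forks


def distribute(ri: float, users: dict) -> dict:
    """Give each user ri * rui / sri, where sri is the sum of rui over all users.

    Assumption: repository totals (rc, ra, rd) are the sums over the given users.
    """
    repo = Contribution(
        commits=sum(u.commits for u in users.values()),
        added=sum(u.added for u in users.values()),
        deleted=sum(u.deleted for u in users.values()),
    )
    rui = {login: user_repo_imprint(c, repo) for login, c in users.items()}
    sri = sum(rui.values())
    return {login: (ri * value / sri if sri else 0.0) for login, value in rui.items()}
```

For example, with ri = 10 and two users with (3 commits, 15 added, 2 deleted) and (1, 5, 2), the shares give rui = 1.75 and 0.75, sri = 2.5, and the users receive 7 and 3.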

@Max-Levitskiy
Member Author

The GitHub API has a limit of 5,000 requests per hour for authenticated requests.
https://developer.github.com/v3/#rate-limiting
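GitHub reports the remaining quota in the `X-RateLimit-Remaining` response header and the reset time, as a Unix timestamp, in `X-RateLimit-Reset`, so a scanner can pause instead of failing. A minimal sketch; the function names and the `reserve` margin are assumptions. Spreading 5,000 requests evenly over 3,600 seconds also gives a simple pacing rule of one request every 0.72 s.

```python
def seconds_to_wait(headers, now: float, reserve: int = 0) -> float:
    """How long to pause before the next request, from GitHub rate-limit headers.

    headers: a mapping of response headers; http.client messages look keys up
    case-insensitively, a plain dict does not.
    now: current Unix time, e.g. time.time().
    reserve: hypothetical safety margin of requests to leave unused.
    """
    remaining = int(headers.get("X-RateLimit-Remaining", reserve + 1))
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    if remaining > reserve:
        return 0.0
    # Quota used up: wait until the window resets (never a negative delay).
    return max(0.0, reset_at - now)


def pacing_interval(limit_per_hour: int = 5000) -> float:
    """Minimum spacing in seconds that keeps a steady scan under the hourly limit."""
    return 3600 / limit_per_hour
```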
