diff --git a/_data/sidebars/home_sidebar.yml b/_data/sidebars/home_sidebar.yml index a7d9c515..4484b8e5 100644 --- a/_data/sidebars/home_sidebar.yml +++ b/_data/sidebars/home_sidebar.yml @@ -46,6 +46,50 @@ entries: url: /basics/dockerhub.html output: web + - title: Architecture + output: web + subfolderitems: + + - title: Overview + url: /architecture/architecture.html + output: web + + - title: Components + url: /architecture/components.html + output: web + + - title: Perceval + url: /architecture/perceval.html + output: web + + - title: Graal + url: /architecture/graal.html + output: web + + - title: King Arthur + url: /architecture/kingarthur.html + output: web + + - title: SortingHat & HatStall + url: /architecture/sortinghatstall.html + output: web + + - title: ELK & Cereslib + url: /architecture/elkceres.html + output: web + + - title: Manuscripts + url: /architecture/manuscripts.html + output: web + + - title: Sigils, Kidash & Kibiter + url: /architecture/dashboards.html + output: web + + - title: Mordred + url: /architecture/mordred.html + output: web + - title: Perceval output: web subfolderitems: diff --git a/architecture/architecture.md b/architecture/architecture.md new file mode 100644 index 00000000..071a6682 --- /dev/null +++ b/architecture/architecture.md @@ -0,0 +1,67 @@ +# Overview + +The overall structure of GrimoireLab is summarized in the figure below. Its core is composed of four components that take care of: extracting software development data, storing it, managing contributor identities and analyzing (visualizing) the data obtained. Additionally, orchestration of all the components is also available. The details of each component are described in the next sections. + +![](../assets/grimoirelab-all.png) + +*Overview of GrimoireLab. Software data is extracted, processed and visualized via four components, with one of them dedicated to managing identities. 
A further component makes it possible to perform analyses over a set of target data sources by setting up and orchestrating the other components.* +## Data retrieval + +Data retrieval is handled by three tools. + +- [Perceval](https://github.com/chaoss/grimoirelab-perceval) is designed to deal only with fetching the data, so that it can be optimized for that task. Usually, it works as a library, providing a uniform Python API to access software development repositories. + +- [Graal](https://github.com/chaoss/grimoirelab-graal) complements Perceval by collecting data from the source code of Git repositories. It provides a mechanism to plug in third-party source code analysis tools and libraries. + +- [King Arthur](https://github.com/chaoss/grimoirelab-kingarthur) schedules and runs Perceval and Graal executions at scale through distributed queues. + +## Identities management + +Identity management is a component that handles contributors' information and enables analyses where identities, organizations and affiliations are first-class citizens. + +Depending on the kind of repository from which the data is retrieved, identities can come in different formats: commit signatures (e.g., full names and email addresses) in Git repositories, email addresses, GitHub or Slack usernames, etc. Furthermore, a given person may use several identities, even in the same data source, and across different kinds of data sources. In some cases, an identity can be shared by several contributors (e.g., support email addresses in forums). + +SortingHat and HatStall are the tools used in GrimoireLab for managing identities. + +- [SortingHat](https://github.com/chaoss/grimoirelab-sortinghat) maintains a relational database with identities and related information, including the origin of each identity, for traceability. In the usual pipeline, the data storage component feeds SortingHat with identities found in data sources.
SortingHat uses heuristics-based algorithms to merge identity data, and sends the unified data back to the data storage component. +- [HatStall](https://github.com/chaoss/grimoirelab-hatstall) is a web application which provides an intuitive graphical interface to perform operations over a SortingHat database. + +## Data storage + +GrimoireLab pipelines usually involve storing the retrieved data, with two main goals: allowing analyses to be repeated without new data retrieval, and having pre-processed data available, suitable for visualization and analytics. For the first goal, a raw database is maintained, with a copy of all JSON documents produced by Perceval. For the second, an enriched database, which is a flattened summary of the raw documents, is produced and stored. +The tools of the data storage component are ELK and Cereslib; they are described below. + +- [ELK](https://github.com/chaoss/grimoirelab-elk) is the tool interacting with the database. The design underlying ELK consists of a feeder that collects the JSON documents produced by the data retrieval component. Next, the documents are stored as the raw database. Dumps of this raw data can be easily created to make any analysis reproducible, or just to conveniently perform the analytics with technologies other than those provided by GrimoireLab. + + Then, the raw data is enriched by including identity information and attributes not directly available in the original data. For example, pair programming information is added to Git data, when it can be extracted from commit messages, and the time to solve (i.e., close or merge) an issue or a pull request is added to GitHub data. The data obtained is finally stored as flat JSON documents, embedding references to the raw documents for traceability. + +- [Cereslib](https://github.com/chaoss/grimoirelab-cereslib) is a tool that aims at simplifying data processing; it is tailored to GrimoireLab data.
It provides interfaces to access ELK raw and enriched data and manipulate it (e.g., split and filter) to produce additional insights about software development. + +## Analytics + +The analytics component is in charge of presenting the data via static reports and dynamic dashboards. The tools participating in the generation of such artifacts are described below. + +- **Reports** are generated by [Manuscripts](https://github.com/chaoss/grimoirelab-manuscripts), a tool that queries the GrimoireLab data storage and produces template-based documents, ready to be delivered to decision-makers, who can in this way easily identify relevant aspects of their projects. + +- **Dashboard** creation involves three tools: + + - [Sigils](https://github.com/chaoss/grimoirelab-sigils) is a set of predefined widgets (e.g., visualizations and charts) available as JSON documents. + + - [Kidash](https://github.com/chaoss/grimoirelab-kidash) is a tool able to import and export widgets to and from Kibiter. + + - [Kibiter](https://github.com/chaoss/grimoirelab-kibiter) is a downstream of [Kibana](https://github.com/elastic/kibana) which performs the binding between the Sigils widgets and the GrimoireLab data, thus providing web-based dashboards for actionable inspection, drill-down, and filtering of the software development data retrieved. + +## Orchestration + +The orchestration component takes care of coordinating the process leading to the dashboards. + +[SirMordred](https://github.com/chaoss/grimoirelab-sirmordred) is the tool which enables the user to easily run GrimoireLab to retrieve data from software repositories, produce raw and enriched data, load predefined widgets and generate dashboards. + +SirMordred relies on the `setup.cfg` and `projects.json` files, which have been designed to keep sensitive data separated from the data that can be publicly shared.
Thus, the setup file includes credentials and tokens to access the GrimoireLab components and software repositories, while the projects file includes the information about the projects to analyse. Both files are +described below. + +- `setup.cfg` holds the configuration that arranges all the processes underlying GrimoireLab. It is composed of sections which allow defining the general settings, such as which components to activate and where to store the logs, as well as the location and credentials +for ELK, SortingHat and Kibiter, which can be protected to prevent undesired accesses. Furthermore, it also includes other sections to set up the parameters used by the data retrieval component to access the software repositories (e.g., GitHub tokens, Gerrit username) and fetch their data. + +- `projects.json` enables the users to list the projects to analyse, divided by data sources such as Git repositories, GitHub and GitLab issue trackers and Slack channels. +Furthermore, it also allows adding some meta-information to group projects together, whose structure is reflected in the dashboards. \ No newline at end of file diff --git a/architecture/components.md b/architecture/components.md new file mode 100644 index 00000000..33ff8186 --- /dev/null +++ b/architecture/components.md @@ -0,0 +1,14 @@ +# Components + +This section provides details about the GrimoireLab tools, which are highlighted in the figure below.
+ +![](../assets/grimoirelab-all-details.png) + +- [Perceval](./perceval.html) +- [Graal](./graal.html) +- [King Arthur](./kingarthur.html) +- [SortingHat & HatStall](./sortinghatstall.html) +- [ELK and Cereslib](./elkceres.html) +- [Manuscripts](./manuscripts.html) +- [Sigils, Kidash & Kibiter](./dashboards.html) +- [Mordred](./mordred.html) \ No newline at end of file diff --git a/architecture/dashboards.md b/architecture/dashboards.md new file mode 100644 index 00000000..30404ce4 --- /dev/null +++ b/architecture/dashboards.md @@ -0,0 +1 @@ +TODO \ No newline at end of file diff --git a/architecture/elkceres.md b/architecture/elkceres.md new file mode 100644 index 00000000..30404ce4 --- /dev/null +++ b/architecture/elkceres.md @@ -0,0 +1 @@ +TODO \ No newline at end of file diff --git a/architecture/graal.md b/architecture/graal.md new file mode 100644 index 00000000..0a1e189a --- /dev/null +++ b/architecture/graal.md @@ -0,0 +1,115 @@ +## Graal + +Graal complements the data extracted with Perceval by providing insights about the source code (e.g., code complexity, licenses). + +Graal leverages the incremental functionalities provided by Perceval and enhances its logic for handling Git repositories +in order to process their source code. The overall view of Graal and its connection with Perceval is summarized in the figure below: the Git backend creates a local mirror of a Git repository (local or +remote) and fetches its commits in chronological order. Several parameters are available to control the execution; for instance, *from_date* and *to_date* allow selecting commits authored since +and before a given date, *branches* allows fetching commits only from specific branches, and *latest_items* returns only those commits which are new since the last fetch operation. + +Graal extends the Git backend by enabling the creation of a working tree (and its pruning), which allows performing checkout operations that are not possible on a Git mirror.
Furthermore, +it also includes additional parameters used to drive the analysis: filtering files and directories of the repository in or out (*in_paths* and *out_paths*), setting the *entrypoint* and defining the *details* level +of the analysis (useful when analyzing large software projects). + +![](../assets/graal.png) +*Overview of Graal* + +Following the philosophy of Perceval, the output of the Git backend execution is a list of JSON documents (one per +commit). Therefore, Graal intercepts each document, replaces some metadata information (e.g., backend name, category) and +enables the user to perform the following steps: (i) filter, (ii) +analyze and (iii) post-process, which are described below. + +- **Filter.** + The filtering is used to select or discard commits based on the information available in the JSON document + and/or via the Graal parameters (e.g., the commits authored by a given user or targeting a given software component). + For any selected commit, Graal executes a checkout on the working tree using the commit hash, thus setting the state of + the working tree at that given revision. The default built-in behavior of the filtering consists of selecting all commits. + +- **Analyze.** + The analysis takes the document and the current working tree and enables the user to set up an ad-hoc + source code analysis by plugging in existing tools through system calls or their Python interfaces, when possible. The results of + the analysis are parsed and manipulated by the user and then automatically embedded in the JSON document. In this step, + the user can rely on some predefined functionalities of Graal to deal with the repository snapshot (e.g., listing files, creating + archives). By default, this step does not perform any analysis, so the input document is returned as it is.
+ +- **Post-process.** + In the final step, the inflated JSON document can be optionally processed to alter (e.g., by renaming or removing) its attributes, thus granting the user complete control + over the output of Graal executions. The built-in behavior of this step keeps all attributes as they are. + +### Backends +Several backends have been developed to assess the genericity of Graal. Those backends leverage source code analysis +tools, whose executions are triggered via system calls or their Python interfaces. Currently, the backends +mostly target Python code; however, other backends can be easily developed to cover other programming languages. The +currently available backends are: +- **CoCom** gathers data about code complexity (e.g., cyclomatic complexity, LOC) from projects written in popular programming languages such as C/C++, Java, Scala, JavaScript, Ruby, Python, Lua and Golang. It leverages [Cloc](http://cloc.sourceforge.net/) and [Lizard](https://github.com/terryyin/lizard). The tool can be executed at the `file` and `repository` levels, activated by passing the corresponding category: `code_complexity_lizard_file` or `code_complexity_lizard_repository`. +- **CoDep** extracts package and class dependencies of a Python module and serializes them as JSON structures, composed of edges and nodes, thus easing the bridging with front-end technologies for graph visualizations. It combines [PyReverse](https://pypi.org/project/pyreverse/) and [NetworkX](https://networkx.github.io/). +- **CoQua** retrieves code quality insights, such as checks on code line length, well-formed variable names, unused imported modules and code clones. It uses [PyLint](https://www.pylint.org/) and [Flake8](http://flake8.pycqa.org/en/latest/index.html). The tools can be activated by passing the corresponding category: `code_quality_pylint` or `code_quality_flake8`.
+- **CoVuln** scans the code to identify security vulnerabilities such as potential SQL and Shell injections, hard-coded passwords and weak cryptographic key sizes. It relies on [Bandit](https://github.com/PyCQA/bandit). +- **CoLic** scans the code to extract license & copyright information. It currently supports [Nomos](https://github.com/fossology/fossology/tree/master/src/nomos) and [ScanCode](https://github.com/nexB/scancode-toolkit). They can be activated by passing the corresponding category: `code_license_nomos`, `code_license_scancode`, or `code_license_scancode_cli`. +- **CoLang** gathers insights about the code language distribution of a Git repository. It relies on the [Linguist](https://github.com/github/linguist) and [Cloc](http://cloc.sourceforge.net/) tools. They can be activated by passing the corresponding category: `code_language_linguist` or `code_language_cloc`. + +## Graal in action +This section describes how to install and use Graal, highlighting its main features. + +### Installation + +Graal is being developed and tested mainly on GNU/Linux platforms. Thus it is very likely it will work out of the box +on any Linux-like (or Unix-like) platform, provided the right version of Python is available. The listing below shows how to install and uninstall Graal on your system. Currently, the only way of installing Graal consists of cloning the GitHub repository +hosting the [tool](https://github.com/chaoss/grimoirelab-graal) and using the setup script, while uninstalling the tool can be easily achieved with *pip*. + +```bash +# To install, run: +git clone https://github.com/chaoss/grimoirelab-graal +cd grimoirelab-graal +python3 setup.py build +python3 setup.py install + +# To uninstall, run: +pip3 uninstall graal +``` + +### Use + +Once installed, Graal can be used as a stand-alone program or as a Python library. We showcase these two types of execution below.
+ +#### Stand-alone program +Using Graal as a stand-alone program does not require much effort, only some basic knowledge of GNU/Linux shell commands. The listing below shows +how easy it is to fetch code complexity information from a Git repository. As can be seen, the CoCom backend requires the URL where the repository is located (https://github.com/chaoss/grimoirelab-perceval) and the local path where to +mirror the repository (/tmp/graal-cocom). Then, the JSON documents produced are redirected to the file graal-cocom.test. The remaining messages in the listing are prompted to the user +during the execution. + +Interesting optional arguments are *from-date*, which is inherited from Perceval and allows fetching commits from a given date, *worktree-path*, which sets the path of the working tree, +and *details*, which enables fine-grained analysis by returning complexity information for methods/functions. + +```bash +graal cocom https://github.com/chaoss/grimoirelab-perceval --git-path /tmp/graal-cocom > /graal-cocom.test +[2018-05-30 18:22:35,643] - Starting the quest for the Graal. +[2018-05-30 18:22:39,958] - Git worktree /tmp/... created! +[2018-05-30 18:22:39,959] - Fetching commits: ... +[2018-05-31 04:51:56,111] - Git worktree /tmp/... deleted! +[2018-05-31 04:51:56,112] - Fetch process completed: ... +[2018-05-31 04:51:56,112] - Quest completed. +``` + +#### Python Library +Graal's functionalities can be embedded in Python scripts. Again, the effort of using Graal is minimal; in this case the user only needs some knowledge of Python +scripting. The listing below shows how to use Graal in a script. The graal.backends.core.cocom module is imported at the beginning of the file, then the repo_uri and repo_dir variables +are set to the URI of the Git repository and the local path where to mirror it. These variables are used to initialize a CoCom class object.
In the last line of the script, the commits +inflated with the result of the analysis are retrieved using the fetch method. The fetch method inherits its arguments from Perceval, thus it optionally accepts two Datetime objects to +gather only those commits after and before a given date, a list of branches to focus on specific development activities, and a flag to collect the commits available after the last execution. + +```python +#! /usr/bin/env python3 +from graal.backends.core.cocom import CoCom + +# URL for the git repo to analyze +repo_uri = 'http://github.com/chaoss/grimoirelab-perceval' +# directory where to mirror the repo +repo_dir = '/tmp/graal-cocom' + +# CoCom object initialization +cc = CoCom(uri=repo_uri, git_path=repo_dir) +# fetch all commits +commits = [commit for commit in cc.fetch()] +``` + +## Example +TODO \ No newline at end of file diff --git a/architecture/kingarthur.md b/architecture/kingarthur.md new file mode 100644 index 00000000..3931b626 --- /dev/null +++ b/architecture/kingarthur.md @@ -0,0 +1,100 @@ +## King Arthur + +Originally, King Arthur (or just Arthur) was designed to schedule and run Perceval executions at scale through distributed Redis queues, and to store the obtained results in an ElasticSearch database, thus making it possible to +connect the results with analysis and/or visualization tools. The figure below highlights the overall view of Arthur. + +![](../assets/kingarthur-oneline.png) + +*Overview of Arthur* + +At its heart there are two components: the server and one or more worker instances, in charge of running Perceval executions. The server waits for HTTP requests, which allow adding, +deleting or listing tasks using REST API commands (i.e., add, remove, tasks). The listing below depicts how to send commands to the Arthur server. As can be seen, adding and removing tasks +requires specific parameters, sent as JSON data within the request.
Adding a task needs a JSON object that contains a task id (useful for deleting and listing operations), the +parameters needed to execute a Perceval backend, plus other optional parameters to control the scheduling (i.e., delayed start, maximum number of retries upon failures) and archive +the fetched data. Conversely, in order to remove a task, the JSON object must contain the identifier of that given task. + +```bash +# Adding tasks +$ curl -H "Content-Type: application/json" --data @to_add.json http://127.0.0.1:8080/add + +# Removing tasks +$ curl -H "Content-Type: application/json" --data @to_remove.json http://127.0.0.1:8080/remove + +# Listing tasks +$ curl http://127.0.0.1:8080/tasks +``` + +After receiving a task, the server initializes a job with the task parameters, thus enabling a link between the job and the task, and sends the job to the scheduler. The scheduler +manages two (in-memory) queues handling first-time jobs and already-finished jobs that will be rescheduled. The former are Perceval executions that perform the initial gathering from +a data source, while the latter are executions launched in incremental mode (e.g., from a given date, which is by default the date when the previous execution ended). In case of +execution failures, the job is rescheduled as many times as defined in the scheduling parameters of the task. + +Workers provide Arthur with scalability. They listen to the queues, pick up jobs and run Perceval backends. Once the latter have finished, workers notify the scheduler with the result of the execution and, in case of success, send the +JSON documents to the server storage queue. Such documents are consumed by writers, which make it possible to live-stream data or serialize it to database management systems. In the +current implementation, Arthur can store the JSON documents in an ElasticSearch database.
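The two-queue scheduler and worker interplay described above can be sketched with plain in-memory queues. This is only an illustrative model of the pattern, not Arthur's actual implementation; all names (`schedule`, `worker`, `fake_fetch`) are made up for the example:

```python
import queue

# Two in-memory queues, mirroring the scheduler: one for first-time
# jobs, one for already-run jobs rescheduled in incremental mode.
first_time_jobs = queue.Queue()
rescheduled_jobs = queue.Queue()

def schedule(task):
    """Turn a task into a job and place it on the proper queue."""
    job = {"task_id": task["task_id"], "from_date": task.get("from_date")}
    if job["from_date"] is None:
        first_time_jobs.put(job)   # initial gathering from the data source
    else:
        rescheduled_jobs.put(job)  # incremental execution from a given date

def worker(fetch):
    """Pick up a job, run a (mock) backend execution, and reschedule
    the job incrementally from the date the execution ended."""
    q = first_time_jobs if not first_time_jobs.empty() else rescheduled_jobs
    job = q.get()
    items, end_date = fetch(job["from_date"])
    schedule({"task_id": job["task_id"], "from_date": end_date})
    return items

# A stand-in for a Perceval execution: returns items and the end date.
def fake_fetch(from_date):
    return ["item-1", "item-2"], "2019-01-01"

schedule({"task_id": "arthur.git"})
print(worker(fake_fetch))  # → ['item-1', 'item-2']
```

A real worker would also report failures back to the scheduler, so a job could be retried up to the task's maximum number of retries.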
+ +### Executing Graal through Arthur +Arthur has been extended to handle Graal tasks, which inherit from Perceval Git tasks; thus, Arthur periodically executes the fetch method of a given Graal backend. Optionally, the *latest_items* parameter can be used to run the analysis +only on the new commits available after the last execution. + +The listings below show two examples of JSON objects to add and delete tasks. As can be seen, adding a task to analyze the code complexity of a repository consists of sending an add +command to the Arthur server with a JSON object including a task id (*cocom_graal*) and the parameters needed to execute an instance of the CoCom backend, such as its category (i.e., +code complexity), the URI of the target repository and the local path where it will be mirrored (i.e., *uri* and *git_path*). + +Furthermore, the task also defines the scheduler settings *delay* and *max_retries*, which allow postponing the scheduling of the corresponding job and setting the maximum number of retries upon job failures before raising an exception. Deleting the +*cocom_graal* task requires less effort; it suffices to send a remove command to the Arthur server that includes a JSON object with the target task.
+ +```json +{ + "tasks": [ + { + "task_id": "arthur.git", + "backend": "git", + "backend_args": { + "gitpath": "/tmp/git/arthur.git/", + "uri": "https://github.com/chaoss/grimoirelab-kingarthur.git", + "from_date": "2015-03-01" + }, + "category": "commit", + "scheduler": { + "delay": 10 + } + }, + { + "task_id": "bugzilla_mozilla", + "backend": "bugzillarest", + "backend_args": { + "url": "https://bugzilla.mozilla.org/", + "from_date": "2016-09-19" + }, + "category": "bug", + "archive": { + "fetch_from_archive": true, + "archived_after": "2018-02-26 09:00" + }, + "scheduler": { + "delay": 60, + "max_retries": 5 + } + } + ] +} +``` +*Adding tasks* + +```json +{ + "tasks": [ + { + "task_id": "bugzilla_mozilla" + }, + { + "task_id": "arthur.git" + } + ] +} +``` +*Removing tasks* + +## Example +TODO diff --git a/architecture/manuscripts.md b/architecture/manuscripts.md new file mode 100644 index 00000000..30404ce4 --- /dev/null +++ b/architecture/manuscripts.md @@ -0,0 +1 @@ +TODO \ No newline at end of file diff --git a/architecture/mordred.md b/architecture/mordred.md new file mode 100644 index 00000000..30404ce4 --- /dev/null +++ b/architecture/mordred.md @@ -0,0 +1 @@ +TODO \ No newline at end of file diff --git a/architecture/perceval.md b/architecture/perceval.md new file mode 100644 index 00000000..7df6f735 --- /dev/null +++ b/architecture/perceval.md @@ -0,0 +1,162 @@ +## Perceval + +Perceval was designed with one principle in mind: do one thing and do it well. Its goal is to fetch data from data source repositories efficiently. It neither stores nor analyzes data, leaving those tasks to other, specific tools. Though it was conceived as a command-line tool, it may be used as a Python library as well. +Perceval supports plenty of data sources commonly used to support, coordinate and promote development activities, such as Git, GitHub, GitLab, Slack and so on.
+ +A common execution of Perceval consists of fetching a collection of homogeneous items from a given data source. For instance, issue reports are the items extracted from Bugzilla and GitHub +issue trackers, while commits and code reviews are the items obtained from Git and Gerrit repositories. Each item is inflated with item-related information (e.g., comments and authors of a GitHub issue) +and metadata useful for debugging (e.g., backend version and timestamp of the execution). The output of the execution is a list of JSON documents (one per item). + +The overall view of Perceval's approach is summarized in the figure below. At its heart there are three components: **Backend**, **Client** and **CommandLine**. + +![](../assets/perceval-json.png) + +*Overview of the approach. The user interacts with the backend through the shell command line, which, depending on the data source, retrieves the data with a specific client; the output is provided in the form of JSON documents.* + +### Backend + +The Backend orchestrates the gathering process for a specific data source and puts in place incremental and caching mechanisms. Backends share common features, such as incrementality and caching, and also define specific ones tailored to the data source they target. For instance, the GitHub backend requires an API token and the names of the repository and its owner, whereas the StackExchange backend needs an API token plus the tag used to filter questions. + +### Client + +The backend delegates the complexities of querying the data source to the Client. Similarly to backends, clients share common features, such as handling possible connection problems with remote data sources, and define specific ones when needed. For instance, long lists of results fetched from the GitHub and StackExchange APIs are delivered in pages (i.e., pagination), so the corresponding clients have to take care of this technicality.
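The pagination handling just mentioned can be sketched as a generator that keeps asking for pages until the API reports there is no next one. This is an illustrative pattern, not Perceval's actual client code; `get_page` and `fake_api` are hypothetical names:

```python
def fetch_items(get_page):
    """Yield items across all pages of a paginated API.

    `get_page(page)` is any callable returning (items, has_next);
    a real client would issue an HTTP request and handle retries here.
    """
    page = 1
    while True:
        items, has_next = get_page(page)
        yield from items
        if not has_next:
            break
        page += 1

# A fake two-page API standing in for GitHub/StackExchange responses.
def fake_api(page):
    pages = {1: (["issue-1", "issue-2"], True), 2: (["issue-3"], False)}
    return pages[page]

print(list(fetch_items(fake_api)))  # → ['issue-1', 'issue-2', 'issue-3']
```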
+ +### CommandLine + +The CommandLine allows setting up the parameters controlling the features of a given backend. Furthermore, it also provides optional arguments such as help and debug to list the backend features and enable debug-mode execution, respectively. + +## Perceval in action + +This section describes how to install and use Perceval, highlighting its main features. + +### Installation + +Perceval is being developed and tested mainly on GNU/Linux platforms. Thus, it is very likely it will work out of the box on +any Linux-like (or Unix-like) platform, provided the right version of Python is available. + +There are several ways of installing Perceval on your system: with the [pip package manager](https://pypi.org/project/perceval/), from a [Docker image](https://github.com/chaoss/grimoirelab-perceval/tree/master/docker/images) or from the +source code. The listing below shows how to install Perceval from pip and from the source code. Further installation information can be found on [GitHub](https://github.com/grimoirelab/perceval). + +```bash +# Installation through pip +$ pip3 install perceval +------------------------------------------------------- +# Installation from source code +$ git clone https://github.com/grimoirelab/perceval.git +$ cd perceval +$ pip3 install -r requirements.txt +$ python3 setup.py install +``` + +### Use + +Once installed, a Perceval backend can be used as a stand-alone program or as a Python library. We showcase these two types of execution by fetching data from a Git repository. + +Git is probably the most popular source code management system nowadays. It is usually used to track versions of source code files. Transactions on a Git repository are called commits. Each +commit is an atomic change to the files in the repository. For each commit, Git maintains data for tracking what changed, along with some metadata such as who committed the change, when, and which files +were affected.
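The per-commit data and metadata just described show up directly in the raw `git log` output; a minimal parser for a single log entry (hard-coded here for illustration, with a made-up author) could look like:

```python
# A raw excerpt in the default `git log` layout (hard-coded for illustration).
RAW_LOG = """\
commit dd5f1dce4a37a7941a96a1acc5ec95bf151393c5
Author: Jane Doe <jane@example.com>
Date:   Fri Jul 19 13:42:49 2019 +0200

    Update version number
"""

def parse_entry(raw):
    """Extract the hash, author and date of a single log entry."""
    commit = {}
    for line in raw.splitlines():
        if line.startswith("commit "):
            commit["hash"] = line.split()[1]
        elif line.startswith("Author: "):
            commit["author"] = line[len("Author: "):]
        elif line.startswith("Date:   "):
            commit["date"] = line[len("Date:   "):]
    return commit

print(parse_entry(RAW_LOG)["author"])  # → Jane Doe <jane@example.com>
```

Perceval does essentially this parsing for every entry, at a far more complete level (actions on files, parents, refs).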
+ +Perceval clones the Git repository to analyze, and gets information for all its commits by using the git log command under the hood. It produces a JSON document (a dictionary when used from +Python) for each commit. The listing below shows an excerpt of a JSON document produced. As can be seen, the document contains some item-related information (e.g., files) plus metadata included by Perceval itself (e.g., backend_name, backend_version). + +```json +{ + "backend_name": "Git", + "backend_version": "0.12.0", + "category": "commit", + "classified_fields_filtered": null, + "data": { + "Author": "Santiago Due\u00f1as ", + "AuthorDate": "Fri Jul 19 13:42:49 2019 +0200", + "Commit": "Santiago Due\u00f1as ", + "CommitDate": "Fri Jul 19 13:42:49 2019 +0200", + "commit": "dd5f1dce4a37a7941a96a1acc5ec95bf151393c5", + "files": [ + { + "action": "M", + "added": "1", + "file": "grimoirelab_toolkit/_version.py", + "indexes": [ + "390ac0c", + "ddbba5f" + ], + "modes": [ + "100644", + "100644" + ], + "removed": "1" + } + ], + "message": "Update version number to 0.1.10", + "parents": [ + "9aae92e72f2098f54b5ad572710e8d2dc46ddf49" + ], + "refs": [ + "HEAD -> refs/heads/master" + ] + }, + "origin": "https://github.com/chaoss/grimoirelab-toolkit", + "perceval_version": "0.12.23", + "search_fields": { + "item_id": "dd5f1dce4a37a7941a96a1acc5ec95bf151393c5" + }, + "tag": "https://github.com/chaoss/grimoirelab-toolkit", + "timestamp": 1570886107.24431, + "updated_on": 1563536569.0, + "uuid": "16afbc524d682f4fca498a64fbaf9a7d3690b254" +} +``` + +#### Stand-alone program + +Using Perceval as a stand-alone program does not require much effort, only some basic knowledge of GNU/Linux shell commands. The listing below shows how easy it is to +fetch commit information from a Git repository.
As can be seen, the backend for Git requires the URL where the repository is located (https://github.com/grimoirelab/perceval.git); the JSON +documents produced are then redirected to the file perceval.test. The remaining messages in the listing are prompted to the user during +the execution. + +One interesting optional argument is from-date, which allows fetching commits from a given date, thus showing an example of how incremental support is easily achieved in Perceval. + +```bash +$ perceval git https://github.com/grimoirelab/perceval > /perceval.test +[2017-11-18 20:32:19,425] - Sir Perceval is on his quest. +[2017-11-18 20:32:19,427] - Fetching commits: 'https://github.com/grimoirelab/perceval' git repository from 1970-01-01 00:00:00+00:00; all branches +[2017-11-18 20:32:20,798] - Fetch process completed: 798 commits fetched +[2017-11-18 20:32:20,798] - Sir Perceval completed his quest. +``` + +#### Python library + +Perceval's functionalities can be embedded in Python scripts. Again, the effort of using Perceval is minimal; in this case the user only needs some knowledge of Python scripting. + +The listing below shows how to use Perceval in a script. The perceval.backends.core.git module is imported at the beginning of the file, then the repo_url and repo_dir variables are set to the URL of the Git +repository and the local path where to clone it. These variables are used to initialize an object of the perceval.backends.core.git.Git class. In the last two lines of the script, the commits are retrieved using +the fetch method and the names of their authors are printed. The fetch method, which is available in all backends, is tailored to the target data source: the Git backend fetches commits, +while the GitHub and StackExchange ones fetch issues and questions. The fetch method optionally accepts a Datetime object to gather only those items in the data source modified after a given date.
When possible, the filtering of the items relies on the data source's own capabilities; for instance, the GitHub API allows asking for issues modified after a given date. In the other cases, the filtering is implemented in the backend itself.

```python
#!/usr/bin/env python3
from perceval.backends.core.git import Git

# URL of the Git repo to analyze
repo_url = 'http://github.com/grimoirelab/perceval'
# directory where Perceval will clone the Git repo
repo_dir = '/tmp/perceval.git'

# Git object, pointing to repo_url and using repo_dir for cloning
repo = Git(uri=repo_url, gitpath=repo_dir)

# fetch all commits and print each author
for commit in repo.fetch():
    print(commit['data']['Author'])
```

## Examples

### GitHub
GitHub is a popular service for hosting software development. It provides Git repositories together with associated issues (tickets) and pull requests (proposed patches). All this information is available via the GitHub API. We can use the Perceval GitHub backend to retrieve issues and pull requests from this API; for Git repositories we can use the Perceval Git backend, as introduced in the previous section.
ADD CONTENT FROM https://chaoss.github.io/grimoirelab-tutorial/perceval/github.html

### Mail archives
TODO

diff --git a/architecture/sortinghatstall.md b/architecture/sortinghatstall.md
new file mode 100644
index 00000000..ac060ade
--- /dev/null
+++ b/architecture/sortinghatstall.md
@@ -0,0 +1,157 @@
## SortingHat

SortingHat maintains a relational database with identities and related information extracted from different tools used in software development (e.g., Git, GitHub, Slack). An identity is a tuple composed of a *name*, *email*, *username* and the name of the source from which it was extracted. Tuples are converted to unique identifiers (i.e., *uuid*), which provide a quick means of comparing identities with each other. By default, SortingHat considers all identities as unique ones.
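As a rough illustration of how an identity tuple can be mapped to a stable identifier, one can hash its fields. This is a sketch only: the function name and the exact hashing scheme SortingHat uses are assumptions here.

```python
import hashlib

def identity_uuid(source, name=None, email=None, username=None):
    # Hypothetical sketch: derive a stable id from the identity tuple.
    # SortingHat's real scheme may normalize or order fields differently.
    data = ':'.join(str(field) for field in (source, name, email, username))
    return hashlib.sha1(data.encode('utf-8')).hexdigest()

uid = identity_uuid('git', name='Harry Potter', email='hpotter@hogwarts.edu')
```

Because the hash is deterministic, the same tuple always yields the same identifier, which makes comparing identities cheap.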
Heuristics automatically merge identities based on perfect matches on (i) *uuids*, (ii) *name*, (iii) *email* or (iv) *username*. In case of a positive match, one identity is selected at random as the unique one, and the other identities are linked to it.

Identities can be interactively manipulated via shell commands, which hide low-level implementation details from the user, thus decoupling the shell from the database technology in use. Each command is translated to one or more API calls that deal with the database specificities. Furthermore, identities can be loaded into SortingHat via batch files written in specific formats, thus speeding up identity imports for projects with large communities. Batch files are processed by parsers and inserted into the underlying database via API calls. Currently, the available parsers handle the following formats: *Gitdm*, *MailMap*, *Stackalytics* and the formats used for Eclipse and Mozilla committers.

The overall view of SortingHat's approach is summarized in the figure below. It is composed of three components: Database, Commands and API.

![](../assets/sortinghat.png)

*Overview of the approach underlying SortingHat*

### Database

SortingHat relies on open source technologies to store and manipulate identity information: it uses MySQL for storage and SQLAlchemy to map database relations into objects. The conceptual schema of the SortingHat database is shown in the figure below.

![](../assets/sortinghat-schema.png)

*Overview of the SortingHat conceptual schema, where unique identities are the first-class citizens.*

At its heart are the unique identities (i.e., *Uidentities*) of project members. Every unique identity can be linked to more than one *Identities* record, found in the software development tools of the project.
Each identity is described by the attributes *name*, *email* and *username*, plus the *source* where it was found (e.g., Git, GitHub) and the last date when its attributes were modified or when it was (un)linked to a unique identity (i.e., *last_modified*). Furthermore, a unique identity has a *Profile* and a list of *Enrollments*. A profile includes a summary of the project member such as *name*, *email*, whether the member is a bot (i.e., *is_bot*), *gender*, and optionally the *Country* information (based on the ISO 3166 standard).

*Enrollments* express temporal relationships (i.e., *start* and *end* date attributes) between unique identities and *Organizations*. Thus, organizations and unique identities can have more than one enrollment over time, but a given enrollment binds exactly one organization and one unique identity. *Organizations* are defined by a name and can be associated with several domains (i.e., *Domains_organizations*); a domain can also represent an affiliation relationship between organizations (i.e., *is_top_domain*). This is the case of large open source foundations like Linux and Mozilla, to which several organizations contribute.

Finally, organization names or identities with specific email addresses, usernames or full names can be easily excluded from SortingHat by registering their values in a *Matching blacklist*. The filter associated with the blacklist is executed every time an identity is inserted into the database or modified.

### Commands

SortingHat provides more than 20 [commands](https://github.com/chaoss/grimoirelab-sortinghat) to manipulate identity data. The list below summarizes the common ones, which involve manual and heuristic-based operations.
- Manual operations
  - Add: add identities.
  - Show: show information about identities.
  - Profile: edit profile data (e.g., update gender).
  - Remove: delete an identity.
  - Merge: merge unique identities.
  - Move: move an identity into a unique identity.
It is worth noting that when the Move operation is executed over the same identity, it unmerges that identity from its unique identity.
  - Orgs: list, add or delete organizations and domains.
  - Enroll: enroll identities into organizations.
  - Withdraw: unlink identities from organizations.
- Heuristic-based operations
  - Unify: merge identities using a matching algorithm.
  - Affiliate: enroll unique identities into organizations using email addresses and top/sub domains.
  - Autoprofile: auto-complete profiles with emails and names found in the tools used in the project.
  - Autogender: auto-complete gender information using the [genderize.io](https://genderize.io/) API.

### API

The shell commands are processed by the SortingHat API, which is based on a three-layer architecture that promotes modularization and decoupling. The first layer consists of *basic methods* that interact with the database and implement CRUD operations such as additions, deletions or searches (e.g., find organization). The second layer contains *composed methods*, which build on the basic methods. This is the case of move identity, which retrieves two identities and updates their information. Finally, the top layer includes *complex methods* that have a one-to-one correspondence with the shell commands; they rely on the composed methods.

## SortingHat in action

This section describes how to install and use SortingHat, highlighting its main features.

### Installation

SortingHat is developed in Python and tested mainly on GNU/Linux platforms. There are several ways of installing SortingHat on your system (pip, Docker or source code), which are detailed in the SortingHat repository.

### Use

Once installed, SortingHat can be used as a stand-alone program or as a Python library.

#### Stand-alone program

Using SortingHat as a stand-alone program requires only some basic knowledge of GNU/Linux shell commands.
The listing below shows how easy it is to add, list and merge identities. As can be seen, the command add accepts the name, email, username and data source (e.g., git) of the identity to be inserted. The command show prints the profile data and the list of identities linked to the unique identifier passed as input. Finally, the command merge performs a manual merge of two identities, given their unique identifiers, while the command unify automatically merges unique identities found in an optional set of data sources using heuristics (e.g., perfect matches on usernames).

```bash
# Adding identities
$ sortinghat add --name "Harry Potter" --email "hpotter@hogwarts.edu" --source git
New identity 0ca..c1 added

$ sortinghat add --name "Harry Potter" --username "harryp" --source github
New identity 11c..ab added

$ sortinghat add --name "H. Potter" --username "harryp" --source slack
New identity 23d..r2 added

# Listing identities
$ sortinghat show 0ca..c1
unique identity 0ca..c1
Profile: ...
Identities:
  0ca..c1  Harry Potter  hpotter@hogwarts.edu  -  git

# Merge identities
$ sortinghat merge 0ca..c1 11c..ab
Unique identity 0ca..c1 merged on 11c..ab

# Unify identities
$ sortinghat unify --sources github slack --matcher username
Total unique identities processed 2
```

#### Python library

Including SortingHat's functionality in Python scripts and applications is easy; the user only needs some basic knowledge of Python. Currently, SortingHat is integrated in ELK. The listing below shows how identity information is uploaded to SortingHat via ELK: the method add_identity uses SortingHat API calls to add identities, organizations and enrollments.
```python
from sortinghat import api

# Excerpt of an ELK class method (hence the cls argument)
def add_identity(cls, db, identity, backend):
    uuid = api.add_identity(db, backend, identity['email'],
                            identity['name'], identity['username'])
    # profile data attached to the new identity (elided in this excerpt)
    profile = {
        "name": "..",
        "email": identity['email']
    }

    api.add_organization(db, identity['company'])
    api.add_enrollment(db, uuid, identity['company'], ...)

    return uuid
```

## Example
TODO

## HatStall

SortingHat's functionality is also available via HatStall, a Web application written in Django, a popular framework for Web development in Python. HatStall provides an intuitive graphical interface to perform operations over a SortingHat database. It is fully open source, available as a [Docker image](https://hub.docker.com/r/grimoirelab/hatstall/), and can be easily plugged into GrimoireLab. After starting HatStall, the user sets up the parameters (e.g., username, host) to connect to an existing SortingHat database, and then navigates and modifies the identity data through the application. The figure below shows the information of a CHAOSS project member: the page contains the member's profile data, enrollments and identities, plus widgets to modify them.
+ +![](../assets/hatstall-profile.png) + +*CHAOSS member data shown through HatStall, it includes his profile, enrollments and identities.* \ No newline at end of file diff --git a/assets/chaoss_dashboard.png b/assets/chaoss_dashboard.png new file mode 100644 index 00000000..d5f6e231 Binary files /dev/null and b/assets/chaoss_dashboard.png differ diff --git a/assets/graal.png b/assets/graal.png new file mode 100644 index 00000000..0fcaad6d Binary files /dev/null and b/assets/graal.png differ diff --git a/assets/graal.svg b/assets/graal.svg new file mode 100644 index 00000000..a552d74a --- /dev/null +++ b/assets/graal.svg @@ -0,0 +1,849 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + mirror + + Target repo + Local repo + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + worktree + + Working tree + fetch + + JSON docs + + filter + checkouts + analyze + post + + + + + + + Python libs + + + + System calls + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Perceval + + + + + + + + + + + + + + + + + + + + + + + JSON docs + + + + diff --git a/assets/grimoirelab-all-details.png b/assets/grimoirelab-all-details.png new file mode 100644 index 00000000..f366ac2d Binary files /dev/null and b/assets/grimoirelab-all-details.png differ diff --git a/assets/grimoirelab-all-details.svg b/assets/grimoirelab-all-details.svg new file mode 100644 index 00000000..d34b65ac --- /dev/null +++ b/assets/grimoirelab-all-details.svg @@ -0,0 +1,3095 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Dashboards + + + + + + + + Reports + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Arthur + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ... + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Data sources + GrimoireELK + + + + + + + SortingHat + + + + + + + + + + Raw data + + + + + + + + + + + + + Kibiter + + + + + + + + + + + Enriched data + + + + + + + + + + identities management + + + + + data retrieval + + + data storage + + analytics + + Mordred + + orchestration + + + + + Perceval + + + + + + + + + + + + + + + + + + + + + + Manuscripts + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Sigils + + + + Graal + + + + + + + + HatStall + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Cereslib + + Kidash + + + + diff --git a/assets/grimoirelab-all.png b/assets/grimoirelab-all.png new file mode 100644 index 00000000..5aef66f3 Binary files /dev/null and b/assets/grimoirelab-all.png differ diff --git a/assets/grimoirelab-all.svg b/assets/grimoirelab-all.svg new file mode 100644 index 00000000..2c58e0f7 --- /dev/null +++ b/assets/grimoirelab-all.svg @@ -0,0 +1,1967 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ... + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Data sources + + + Data retrieval + + + + + + Identities + management + + + + + + + + + + + + Analytics + + + + + Orchestration + + + Data storage + + diff --git a/assets/grimoirelab-analytics.png b/assets/grimoirelab-analytics.png new file mode 100644 index 00000000..47645d93 Binary files /dev/null and b/assets/grimoirelab-analytics.png differ diff --git a/assets/grimoirelab-analytics.svg b/assets/grimoirelab-analytics.svg new file mode 100644 index 00000000..797a7c07 --- /dev/null +++ b/assets/grimoirelab-analytics.svg @@ -0,0 +1,2051 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Enriched data + + + Browser + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Document + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Kibiter + + + + + + Manuscripts + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Sigils + + + Kidash + + + + diff --git a/assets/grimoirelab-data-storage.png b/assets/grimoirelab-data-storage.png new file mode 100644 index 00000000..00b55746 Binary files /dev/null and b/assets/grimoirelab-data-storage.png differ diff --git a/assets/grimoirelab-data-storage.svg b/assets/grimoirelab-data-storage.svg new file mode 100644 index 00000000..c5bb52b4 --- /dev/null +++ b/assets/grimoirelab-data-storage.svg @@ -0,0 +1,456 @@ + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + JSON documents + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Feeder + Raw data + + + + + Enricher + + + Enriched data + + diff --git a/assets/grimoirelab-logo.png b/assets/grimoirelab-logo.png new file mode 100644 index 00000000..f757bfef Binary files /dev/null and b/assets/grimoirelab-logo.png differ diff --git a/assets/grimoirelab-logo.svg b/assets/grimoirelab-logo.svg new file mode 100644 index 00000000..9ac1879b --- /dev/null +++ b/assets/grimoirelab-logo.svg @@ -0,0 +1,97 @@ + + + + + + + + + + image/svg+xml + + + + + + + + + + + + FREE, LIBRE OPEN SOURCE TOOLS FOR + SOFTWARE DEVELOPMENT ANALYTICS + + diff --git a/assets/grimoirelab_chaoss.png b/assets/grimoirelab_chaoss.png new file mode 100644 index 00000000..eebf0553 Binary files /dev/null and b/assets/grimoirelab_chaoss.png differ diff --git a/assets/hatstall-profile.png b/assets/hatstall-profile.png new file mode 100644 index 00000000..19bded48 Binary files /dev/null and b/assets/hatstall-profile.png differ diff --git a/assets/kingarthur-oneline.png b/assets/kingarthur-oneline.png new file mode 100644 index 00000000..fbf9dc86 Binary files /dev/null and b/assets/kingarthur-oneline.png differ diff --git a/assets/kingarthur-oneline.svg b/assets/kingarthur-oneline.svg new file mode 100644 index 00000000..1b5c3279 --- /dev/null +++ 
b/assets/kingarthur-oneline.svg @@ -0,0 +1,1177 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + Server + + + Tasks + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Update + Create + + Jobs + + + ... + + + + + + + + + + + + + + + + + + + + + + + + + + Scheduler + + Notification + + + + + + + + JSON docs + + + + + + + + + + + Workers + + + + queue + + + + + + + queue + + + + + + + queue + + + + + + Storage + Writers + + + + + + + + + + + diff --git a/assets/kingarthur.png b/assets/kingarthur.png new file mode 100644 index 00000000..f0f7fac5 Binary files /dev/null and b/assets/kingarthur.png differ diff --git a/assets/kingarthur.svg b/assets/kingarthur.svg new file mode 100644 index 00000000..4a36aaf3 --- /dev/null +++ b/assets/kingarthur.svg @@ -0,0 +1,1242 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + Arthur + + Server + Workers + + + Tasks + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Update + Create + + Jobs + + + ... 
+ + + + + + + + + + + + + + + + + + + + + + + + + + Scheduler + + Notification + + + + + + + + JSON docs + + + + + + + + + + + + queue + + + + + + + queue + + + + + + + + queue + + + + + + Storage + + Writers + + + + + + + + diff --git a/assets/perceval-json.png b/assets/perceval-json.png new file mode 100644 index 00000000..c13438c1 Binary files /dev/null and b/assets/perceval-json.png differ diff --git a/assets/perceval-json.svg b/assets/perceval-json.svg new file mode 100644 index 00000000..652f93aa --- /dev/null +++ b/assets/perceval-json.svg @@ -0,0 +1,1130 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ... + + + + + + + + + + + + + Data sources + + CommandLine + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Backend + + + + + + + + + + + + + + + + + + + + + + + + Client + + + + + + + + + + + + + + + + + + + + + + + + + + + + Perceval + + + + JSON documents + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/assets/sortinghat-schema.png b/assets/sortinghat-schema.png new file mode 100644 index 00000000..b594db80 Binary files /dev/null and b/assets/sortinghat-schema.png differ diff --git a/assets/sortinghat-schema.svg b/assets/sortinghat-schema.svg new file mode 100644 index 00000000..69ffb7de --- /dev/null +++ b/assets/sortinghat-schema.svg @@ -0,0 +1,7271 @@ + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + Domains_organizations + domain : Stringis_top_domain : Boolean + + + + + Matching_blacklist + excluded : String + + + + + Countries + code : Stringname : Stringalpha3 : String + + + + + Organizations + name: String + + + + + Identities + name : Stringemail : Stringusername : Stringsource: Stringlast_modified: Date + + + + + Enrollments + start : Dateend: Date + + + + + Profiles + name: Stringemail : Stringgender : Stringgender_acc : Intis_bot: Boolean + + + + + Uidentities + last_modified: Date + + + + + 0..* + domains + 1..1 + organization + + 0..* + enrollments + 1..1 + organization + 0..1 + countries + 0..* + profiles + + enrollments + 0..* + 1..1 + uuid + + 1..1 + profile + 1..1 + uuid + + + 1..1 + uuid + 1..* + identities + diff --git a/assets/sortinghat.png b/assets/sortinghat.png new file mode 100644 index 00000000..40fdf078 Binary files /dev/null and b/assets/sortinghat.png differ diff --git a/assets/sortinghat.svg b/assets/sortinghat.svg new file mode 100644 index 00000000..f002dda6 --- /dev/null +++ b/assets/sortinghat.svg @@ -0,0 +1,880 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + image/svg+xml + + + + + + + + + + + + + Database + + + + Commands + + + + + + + + + + + + + + Interactive + + + + + + + + + + + + + + + + Eclipse + + + + + + + + + + + + + + + Mozilla + + + + + + + + + + + + + + + Gitdm + + + Batch + + + API + + + + + Parsers + + + + + + + + + + + + + + + + SortingHat + + + +