\Chapter{GENERAL DISCUSSION}\label{sec:general-discussion}
In this thesis, we explore models capable of expert identification and, more specifically, maintainer identification. Through a review of the existing literature on expertise models, we established that current models account neither for the different activities undertaken by Linux developers nor for the amount of time developers have been involved in those activities. To address this issue, we propose a new model aware of both time and the wide array of activities present in Linux development.
Additionally, we release two open-source projects based on the metrics acquired during the creation of the expertise model. This chapter provides a discussion regarding the two open-source tools: Srcmap and Email2git.
\section{Srcmap}
Srcmap, our visualization of the kernel and its authors, has a few constraints. The main constraint is the lack of a fluid user experience: the amount of data to process in the browser is too large to allow smooth browsing of the main tree. A way to address this issue would be to configure the interface to download only the required data as users browse the visualization. This way, the web browser used to display the tool would not have to keep the entire dataset in memory and would only process the desired area.
\section{Email2git}
Email2git, our code review tracking system, has a few important limitations. The first limitation to consider is the missing mailing lists. Although our patch data source, patchwork.kernel.org, already tracks many mailing lists, some major mailing lists such as \texttt{net-dev} are not tracked. Although this is a minor issue, it is reflected in the low number of commits matched in the \texttt{net} subdirectory.
We received a lot of valuable feedback from Linux developers after our refereed talk at the Open Source Summit North America. A developer pointed out the absence of the \textit{Patch 0} from our current implementation of Email2git. The Patch 0 is a summary of the submitted changes, often found in multi-patch submissions. Another suggestion was to track \textit{linux-next}. This would allow developers to access the discussions behind commits that have not yet been integrated into the main tree.
For the future of this project, we recommend running our own instance of Patchwork 2.0, which automatically tracks the Patch 0 of each patch. In addition to addressing the lack of Patch 0, running our own Patchwork instance would give us control over the set of tracked mailing lists. If we had access to old archives of the desired mailing lists, we could create matching data dating back to before 2009. We also recommend tracking the linux-next tree, as Email2git could ease the integration debugging process.
\section{Maintainer Recommendation Model}
The article in \autoref{sec:Theme3} describes a model used to recommend the maintainer(s) of a given subsystem. To create this recommender, we used techniques described in previous work to create a modified expertise model.
All maintainers are experts, but not all experts are maintainers. Hence, we use a different set of metrics, chosen to represent the activities undertaken by maintainers on a daily basis, such as upstream committing and code reviews.
As described in \autoref{sec:discussion}, there are several threats to the validity of the model.
As a threat to \textit{external validity}, we note that the data set should cover a longer timespan and a wider range of projects, both proprietary and open source. This would allow our model to be generalized to external contexts.
As a threat to \textit{construct validity}, we identify one of the metrics studied as noisy. The data related to the \texttt{reviewed} activity is often incomplete due to the nature of the Linux code review process. This noise in the data represents a risk regarding one of the measures used in the empirical study.
Another threat to validity, not mentioned in \autoref{sec:discussion}, is that we validated the model's recommendations with the maintainers listed in the \texttt{MAINTAINERS} file as an oracle. Ideally, our model should be trained to detect or predict the best candidate as a \textit{replacement} for the current maintainer(s). To achieve this, we would have to create an oracle containing the developers that were selected as maintainers, and verify whether our model is capable of recommending the developer at the moment of the selection.
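
One possible way to make such a replacement-oriented validation concrete is sketched here. The notation is introduced for illustration only and is not part of the model described in \autoref{sec:Theme3}: $E$ denotes a hypothetical set of succession events $(s, t, d)$, in which developer $d$ was selected as maintainer of subsystem $s$ at time $t$, and $R_k(s, t)$ denotes the $k$ highest-ranked developers recommended for $s$ when the model only sees activity recorded before $t$. Under these assumptions, a hit rate at rank $k$ could be defined as
\[
\mathrm{HitRate}@k = \frac{1}{|E|} \sum_{(s, t, d) \in E} \mathbf{1}\left[ d \in R_k(s, t) \right],
\]
where $\mathbf{1}[\cdot]$ equals $1$ when the selected developer appears among the recommendations and $0$ otherwise. A value close to $1$ would suggest that the model anticipates maintainer selections rather than only reproducing the current entries of the \texttt{MAINTAINERS} file.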