
Commit

deploy: f1d1583
waleko committed Sep 17, 2023
1 parent ff0f1a7 commit 83a40ea
Showing 5 changed files with 44 additions and 77 deletions.
45 changes: 20 additions & 25 deletions README.html
@@ -325,7 +325,7 @@ <h2> Contents </h2>
<nav aria-label="Page">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#overview">Overview</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#key-features">Key Features</a><ul class="nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#contents">Contents</a><ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#data-collection">1. Data Collection</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#model-inference-and-fine-tuning">2. Model Inference and Fine-Tuning</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#evaluation-metrics">3. Evaluation Metrics</a></li>
@@ -347,58 +347,53 @@

<section class="tex2jax_ignore mathjax_ignore" id="code-review-automation-with-language-models">
<h1>Code Review Automation with Language Models<a class="headerlink" href="#code-review-automation-with-language-models" title="Permalink to this heading">#</a></h1>
<p><a class="reference external" href="https://alexkovrigin.me/Code-Review-Automation-LM"><img alt="Static Badge" src="https://img.shields.io/badge/docs-available-orange?style=flat-square" /></a>
<a class="reference external" href="https://github.com/psf/black"><img alt="Code style: black" src="https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square" /></a></p>
<p><a class="reference external" href="https://alexkovrigin.me/Code-Review-Automation-LM"><img alt="Static Badge" src="https://img.shields.io/badge/jupyter-book-orange?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABwAAAAZCAMAAAAVHr4VAAAAXVBMVEX////v7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/v7+/zdybv7+/zdybv7+/v7+/zdybv7+/zdybv7+/zdyaSmqV2AAAAHXRSTlMAEBAgIDAwQEBQUGBgcHCAgJCQoLCwwMDQ4ODw8MDkUIUAAADJSURBVHjaddAFkgNBCAXQP+7uAvc/5tLFVseYF8crUB0560r/5gwvjYYm8gq8QJoyIJNwlnUH0WEnART6YSezV6c5tjOTaoKdfGXtnclFlEBEXVd8JzG4pa/LDql9Jff/ZCC/h2zSqF5bzf4vqkgNwEzeClUd8uMadLE6OnhBFsES5niQh2BOYUqZsfGdmrmbN+TMvPROHUOkde8sEs6Bnr0tDDf2Roj6fmVfubuGyttejCeLc+xFm+NLuLnJeFAyl3gS932MF/wBoukfUcwI05kAAAAASUVORK5CYII=&amp;style=for-the-badge" /></a></p>
<section id="overview">
<h2>Overview<a class="headerlink" href="#overview" title="Permalink to this heading">#</a></h2>
<p>Code review is a crucial aspect of the software development process, ensuring that code changes are thoroughly examined
for quality, security, and adherence to coding standards. However, the code review process can be time-consuming, and
human reviewers may overlook certain issues. To address these challenges, we have developed a Code Review Automation
system powered by language models.</p>
<p>Our system leverages state-of-the-art language models to generate code reviews automatically. These models are trained
on a vast corpus of code and can provide insightful feedback on code changes. By automating part of the code review
process, our system aims to:</p>
<ul class="simple">
<li><p>Speed up the code review process.</p></li>
<li><p>Identify common code issues and provide recommendations.</p></li>
<li><p>Assist developers in producing higher-quality code.</p></li>
</ul>
human reviewers may overlook certain issues.</p>
<p>In this series of Jupyter notebooks, we embark on a journey to collect code review data from GitHub repositories and
perform code review predictions using a prominent language model: <a class="reference external" href="https://arxiv.org/abs/2203.09095">CodeReviewer</a> from
Microsoft Research. Our primary goal is to explore the capabilities of this model in generating code reviews and
evaluate its performance.</p>
</section>
<section id="key-features">
<h2>Key Features<a class="headerlink" href="#key-features" title="Permalink to this heading">#</a></h2>
<section id="contents">
<h2>Contents<a class="headerlink" href="#contents" title="Permalink to this heading">#</a></h2>
<section id="data-collection">
<h3>1. Data Collection<a class="headerlink" href="#data-collection" title="Permalink to this heading">#</a></h3>
<p>Our system collects code review data from popular GitHub repositories. This data includes code changes and associated
human-authored code reviews. By leveraging this data, our models learn to generate contextually relevant code reviews.</p>
<p>First, we collect the code review data from popular GitHub repositories. This data includes code changes and associated
human-authored code reviews. By leveraging this data, the model learns to generate contextually relevant code reviews.</p>
</section>
<section id="model-inference-and-fine-tuning">
<h3>2. Model Inference and Fine-Tuning<a class="headerlink" href="#model-inference-and-fine-tuning" title="Permalink to this heading">#</a></h3>
<p>We use pre-trained language models and fine-tune them on code review datasets. Fine-tuning allows the models to
<p>We take the pre-trained CodeReviewer checkpoint and fine-tune the model on code review datasets. Fine-tuning allows the models to
specialize in generating code reviews, making them more effective in this task.</p>
<p>Once the models are trained, they can generate code reviews for new code changes. These generated reviews can highlight
potential issues, suggest improvements, and provide feedback to developers.</p>
</section>
<section id="evaluation-metrics">
<h3>3. Evaluation Metrics<a class="headerlink" href="#evaluation-metrics" title="Permalink to this heading">#</a></h3>
<p>We use the BLEU-4 score metric to assess the quality of generated code reviews. This metric measures the similarity
between model-generated reviews and target human reviews. While our models provide valuable assistance, they are
designed to complement human reviewers.</p>
between model-generated reviews and target human reviews.</p>
</section>
</section>
<section id="getting-started">
<h2>Getting Started<a class="headerlink" href="#getting-started" title="Permalink to this heading">#</a></h2>
<p>To get started with our Code Review Automation system, follow these steps:</p>
<p>To get started with our work, follow these steps:</p>
<ol class="arabic">
<li><p>Clone this repository to your local machine:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/waleko/Code-Review-Automation-LM.git
<span class="nb">cd</span><span class="w"> </span>Code-Review-Automation-LM
</pre></div>
</div>
</li>
<li><p>Set up the required dependencies and environment (see <code class="docutils literal notranslate"><span class="pre">requirements.txt</span></code>).</p></li>
<li><p>Set up the required dependencies from <code class="docutils literal notranslate"><span class="pre">requirements.txt</span></code>, e.g. using <code class="docutils literal notranslate"><span class="pre">pip</span></code>:</p>
<div class="highlight-bash notranslate"><div class="highlight"><pre><span></span>pip<span class="w"> </span>install<span class="w"> </span>-r<span class="w"> </span>requirements.txt
</pre></div>
</div>
</li>
<li><p>Run the provided notebooks to explore data collection, model inference, and evaluation.</p></li>
<li><p>Integrate the code review automation system into your development workflow. You can use our pre-trained models or
fine-tune them on your specific codebase for even better results.</p></li>
</ol>
</section>
<section id="license">
@@ -457,7 +452,7 @@ <h2>Contact<a class="headerlink" href="#contact" title="Permalink to this headin
<nav class="bd-toc-nav page-toc">
<ul class="visible nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#overview">Overview</a></li>
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#key-features">Key Features</a><ul class="nav section-nav flex-column">
<li class="toc-h2 nav-item toc-entry"><a class="reference internal nav-link" href="#contents">Contents</a><ul class="nav section-nav flex-column">
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#data-collection">1. Data Collection</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#model-inference-and-fine-tuning">2. Model Inference and Fine-Tuning</a></li>
<li class="toc-h3 nav-item toc-entry"><a class="reference internal nav-link" href="#evaluation-metrics">3. Evaluation Metrics</a></li>
39 changes: 17 additions & 22 deletions _sources/README.md
@@ -1,33 +1,28 @@
# Code Review Automation with Language Models

[![Static Badge](https://img.shields.io/badge/docs-available-orange?style=flat-square)](https://alexkovrigin.me/Code-Review-Automation-LM)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/psf/black)
[![Static Badge](https://img.shields.io/badge/jupyter-book-orange?logo=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABwAAAAZCAMAAAAVHr4VAAAAXVBMVEX////v7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/zdybv7+/v7+/zdybv7+/zdybv7+/v7+/zdybv7+/zdybv7+/zdyaSmqV2AAAAHXRSTlMAEBAgIDAwQEBQUGBgcHCAgJCQoLCwwMDQ4ODw8MDkUIUAAADJSURBVHjaddAFkgNBCAXQP+7uAvc/5tLFVseYF8crUB0560r/5gwvjYYm8gq8QJoyIJNwlnUH0WEnART6YSezV6c5tjOTaoKdfGXtnclFlEBEXVd8JzG4pa/LDql9Jff/ZCC/h2zSqF5bzf4vqkgNwEzeClUd8uMadLE6OnhBFsES5niQh2BOYUqZsfGdmrmbN+TMvPROHUOkde8sEs6Bnr0tDDf2Roj6fmVfubuGyttejCeLc+xFm+NLuLnJeFAyl3gS932MF/wBoukfUcwI05kAAAAASUVORK5CYII=&style=for-the-badge)](https://alexkovrigin.me/Code-Review-Automation-LM)

## Overview

Code review is a crucial aspect of the software development process, ensuring that code changes are thoroughly examined
for quality, security, and adherence to coding standards. However, the code review process can be time-consuming, and
human reviewers may overlook certain issues. To address these challenges, we have developed a Code Review Automation
system powered by language models.
human reviewers may overlook certain issues.

Our system leverages state-of-the-art language models to generate code reviews automatically. These models are trained
on a vast corpus of code and can provide insightful feedback on code changes. By automating part of the code review
process, our system aims to:
In this series of Jupyter notebooks, we embark on a journey to collect code review data from GitHub repositories and
perform code review predictions using a prominent language model: [CodeReviewer](https://arxiv.org/abs/2203.09095) from
Microsoft Research. Our primary goal is to explore the capabilities of this model in generating code reviews and
evaluate its performance.

- Speed up the code review process.
- Identify common code issues and provide recommendations.
- Assist developers in producing higher-quality code.

## Key Features
## Contents

### 1. Data Collection

Our system collects code review data from popular GitHub repositories. This data includes code changes and associated
human-authored code reviews. By leveraging this data, our models learn to generate contextually relevant code reviews.
First, we collect the code review data from popular GitHub repositories. This data includes code changes and associated
human-authored code reviews. By leveraging this data, the model learns to generate contextually relevant code reviews.
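
As a rough illustration of this step, the sketch below pulls pull-request review comments with PyGithub; the repository name, sample size, and selected fields are placeholders rather than the notebooks' exact collection logic.

```python
import os

import pandas as pd
from github import Github  # PyGithub

# A personal access token avoids the strict anonymous rate limits.
gh = Github(os.environ["GITHUB_TOKEN"])
repo = gh.get_repo("microsoft/vscode")  # one of the collected repositories

rows = []
for pr in repo.get_pulls(state="closed")[:50]:  # small sample for illustration
    for comment in pr.get_review_comments():
        rows.append({
            "pr_number": pr.number,
            "diff_hunk": comment.diff_hunk,  # code change the reviewer commented on
            "review": comment.body,          # human-authored review comment
        })

df = pd.DataFrame(rows)
df.to_json("reviews_sample.jsonl", orient="records", lines=True)
```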

### 2. Model Inference and Fine-Tuning

We use pre-trained language models and fine-tune them on code review datasets. Fine-tuning allows the models to
We take the pre-trained CodeReviewer checkpoint and fine-tune the model on code review datasets. Fine-tuning allows the models to
specialize in generating code reviews, making them more effective in this task.

Once the models are trained, they can generate code reviews for new code changes. These generated reviews can highlight
@@ -36,12 +31,11 @@ potential issues, suggest improvements, and provide feedback to developers.
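
To make the inference step concrete, here is a minimal sketch that loads the `microsoft/codereviewer` checkpoint with HuggingFace `transformers` and generates a review comment for a toy diff hunk. The input formatting is simplified (the notebooks prepare inputs with the model's special diff tokens), so treat it as illustrative only.

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration

checkpoint = "microsoft/codereviewer"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

# A toy diff hunk; real inputs also carry the model's special keep/add/del tokens.
diff_hunk = (
    "@@ -10,4 +10,4 @@ def total(items):\n"
    "-    return sum(items) / len(items)\n"
    "+    return sum(items) / max(len(items), 1)\n"
)

inputs = tokenizer(diff_hunk, return_tensors="pt", truncation=True, max_length=512)
outputs = model.generate(**inputs, max_length=128, num_beams=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```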
### 3. Evaluation Metrics

We use the BLEU-4 score metric to assess the quality of generated code reviews. This metric measures the similarity
between model-generated reviews and target human reviews. While our models provide valuable assistance, they are
designed to complement human reviewers.
between model-generated reviews and target human reviews.
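
For reference, a single BLEU-4 score between a generated review and its human-written target can be computed along these lines (NLTK here; the notebooks' exact tokenization and smoothing choices may differ):

```python
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

reference = "please extract this logic into a helper function".split()
candidate = "consider extracting this logic into a helper method".split()

score = sentence_bleu(
    [reference],                                     # list of reference token lists
    candidate,
    weights=(0.25, 0.25, 0.25, 0.25),                # BLEU-4: 1- to 4-gram precision
    smoothing_function=SmoothingFunction().method1,  # avoids zero scores on short texts
)
print(f"BLEU-4: {score:.3f}")
```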

## Getting Started

To get started with our Code Review Automation system, follow these steps:
To get started with our work, follow these steps:

1. Clone this repository to your local machine:

@@ -50,12 +44,13 @@ To get started with our Code Review Automation system, follow these steps:
cd Code-Review-Automation-LM
```

2. Set up the required dependencies and environment (see `requirements.txt`).
2. Set up the required dependencies from `requirements.txt`, e.g. using `pip`:

3. Run the provided notebooks to explore data collection, model inference, and evaluation.
```bash
pip install -r requirements.txt
```

4. Integrate the code review automation system into your development workflow. You can use our pre-trained models or
fine-tune them on your specific codebase for even better results.
3. Run the provided notebooks to explore data collection, model inference, and evaluation.

## License

19 changes: 3 additions & 16 deletions _sources/docs/intro.md
@@ -3,18 +3,13 @@
## Introduction

In this series of Jupyter notebooks, we embark on a journey to collect code review data from GitHub repositories and
perform code review predictions using language models. Our primary goal is to explore the capabilities of different
models in generating code reviews and evaluate their performance.
perform code review predictions using language models. Our primary goal is to explore the capabilities of the [CodeReviewer](https://arxiv.org/abs/2203.09095) model in generating code reviews and evaluate its performance.

### Collecting Code Review Data

In this initial notebook, we dive into the process of collecting code review data from GitHub repositories. We leverage
the PyGithub library to interact with the GitHub API, ensuring seamless data retrieval.

We establish a function to collect code review data from a GitHub repository, allowing us to specify parameters such as
the number of comments to load, skipping author comments, and more. The collected data is structured into a Pandas
DataFrame for further analysis and processing.

Three prominent repositories, namely `microsoft/vscode`, `JetBrains/kotlin`, and `transloadit/uppy`, are selected for
data collection due to their popularity and rich code review history. Additionally, we are going to use data from the
original CodeReviewer dataset `msg-test` that is provided by the authors of {cite}`li2022codereviewer`.
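
The collection helper might look roughly like the following sketch; the function name, parameters, and DataFrame columns are illustrative stand-ins for the notebook's actual implementation.

```python
import pandas as pd
from github import Github


def collect_review_data(gh: Github, repo_name: str, max_comments: int = 500,
                        skip_author_comments: bool = True) -> pd.DataFrame:
    """Collect (diff hunk, review comment) pairs from a repository's pull requests."""
    repo = gh.get_repo(repo_name)
    rows = []
    for pr in repo.get_pulls(state="all"):
        for comment in pr.get_review_comments():
            # Optionally drop comments the PR author left on their own code.
            if skip_author_comments and comment.user.login == pr.user.login:
                continue
            rows.append({"repo": repo_name,
                         "diff_hunk": comment.diff_hunk,
                         "msg": comment.body})
            if len(rows) >= max_comments:
                return pd.DataFrame(rows)
    return pd.DataFrame(rows)
```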
@@ -24,9 +19,6 @@ original CodeReviewer dataset `msg-test` that is provided by the authors of {cit
The second notebook focuses on generating code reviews using the `microsoft/codereviewer` model. We delve into the
tokenization and dataset preparation process, emphasizing the importance of specialized tokens.

A custom `ReviewsDataset` class is introduced to facilitate data loading and transformation, making it compatible with
model inference. We load data from various sources, creating DataLoader instances for efficient model input.
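
In spirit, such a class wraps the collected (diff, review) pairs so a standard PyTorch `DataLoader` can batch them for inference; the sketch below is an assumed shape, not the notebook's exact `ReviewsDataset`.

```python
from torch.utils.data import DataLoader, Dataset


class ReviewsDataset(Dataset):
    """Tokenized diff hunks paired with their target review messages."""

    def __init__(self, df, tokenizer, max_length=512):
        self.df = df.reset_index(drop=True)
        self.tokenizer = tokenizer
        self.max_length = max_length

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        enc = self.tokenizer(row["diff_hunk"], truncation=True, padding="max_length",
                             max_length=self.max_length, return_tensors="pt")
        return {
            "input_ids": enc["input_ids"].squeeze(0),
            "attention_mask": enc["attention_mask"].squeeze(0),
            "msg": row["msg"],  # target text; tokenized later when needed
        }


# Assuming `df` and `tokenizer` come from the collection and model-loading steps:
# loader = DataLoader(ReviewsDataset(df, tokenizer), batch_size=8, shuffle=False)
```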

We explore the model inference process, employing both a HuggingFace pre-trained checkpoint and a fine-tuned
CodeReviewer model. The fine-tuning process details are outlined, showcasing parameters and resources used. Model
predictions are saved.
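
A fine-tuning setup in this style can be expressed with the `transformers` Seq2Seq trainer, roughly as below; the hyperparameters are placeholders, not the values actually reported in the notebook.

```python
from transformers import (AutoTokenizer, DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments, T5ForConditionalGeneration)

checkpoint = "microsoft/codereviewer"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

args = Seq2SeqTrainingArguments(
    output_dir="codereviewer-finetuned",
    per_device_train_batch_size=4,   # placeholder hyperparameters
    learning_rate=3e-4,
    num_train_epochs=1,
    predict_with_generate=True,
)

# `train_ds` and `eval_ds` are assumed to be tokenized datasets with
# input_ids / attention_mask / labels columns built from the collected data.
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    # train_dataset=train_ds,
    # eval_dataset=eval_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
# trainer.train()
# trainer.save_model("codereviewer-finetuned")
```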
@@ -37,15 +29,10 @@ In this notebook, we assess the quality of code review predictions generated by
fine-tuned models are evaluated across different datasets, shedding light on their performance.

Qualitative assessment is conducted to gain insights into how the models generate code reviews. We present samples of
code, along with predictions from both models, enabling a visual comparison with human reviews. This helps in
understanding the nuances of model-generated reviews.
code, along with predictions from both models, enabling a visual comparison with human reviews.

Lastly, we quantitatively evaluate the models' performance using BLEU-4 scores. We calculate scores for each dataset,
providing a comprehensive overview of how well the models align with human reviews. This quantitative analysis helps in
drawing conclusions about the effectiveness of the models in code review prediction.
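
Aggregated per-dataset scores could be produced along these lines (corpus-level BLEU-4 via NLTK; file names and column names are illustrative):

```python
import pandas as pd
from nltk.translate.bleu_score import SmoothingFunction, corpus_bleu

datasets = {
    "msg-test": "predictions_msg_test.csv",       # illustrative prediction dumps
    "microsoft/vscode": "predictions_vscode.csv",
}

scores = {}
for name, path in datasets.items():
    df = pd.read_csv(path)  # expected columns: "prediction" and "human_review"
    references = [[ref.split()] for ref in df["human_review"]]
    hypotheses = [hyp.split() for hyp in df["prediction"]]
    scores[name] = corpus_bleu(references, hypotheses,
                               weights=(0.25, 0.25, 0.25, 0.25),
                               smoothing_function=SmoothingFunction().method1)

print(pd.Series(scores, name="BLEU-4"))
```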

Throughout this journey, we aim to explore the capabilities and limitations of language models in the context of code
review, shedding light on their potential applications and areas for improvement.
providing a comprehensive overview of how well the models align with human reviews.

## Table of Contents


