diff --git a/README.md b/README.md
index 6de261a..80f8566 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,66 @@
-# CodeReviewer ML Performance
+# Code Review Automation with Language Models
 
-![Static Badge](https://img.shields.io/badge/docs-available-orange?style=flat-square)
+[![Static Badge](https://img.shields.io/badge/docs-available-orange?style=flat-square)](https://alexkovrigin.me/Code-Review-Automation-LM) [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/psf/black)
 
+## Overview
+
+Code review is a crucial aspect of the software development process, ensuring that code changes are thoroughly examined
+for quality, security, and adherence to coding standards. However, the code review process can be time-consuming, and
+human reviewers may overlook certain issues. To address these challenges, we have developed a Code Review Automation
+system powered by language models.
+
+Our system leverages state-of-the-art language models to generate code reviews automatically. These models are trained
+on a vast corpus of code and can provide insightful feedback on code changes. By automating part of the code review
+process, our system aims to:
+
+- Speed up the code review process.
+- Identify common code issues and provide recommendations.
+- Assist developers in producing higher-quality code.
+
+## Key Features
+
+### 1. Data Collection
+
+Our system collects code review data from popular GitHub repositories. This data includes code changes and associated
+human-authored code reviews. By leveraging this data, our models learn to generate contextually relevant code reviews.
+
+### 2. Model Inference and Fine-Tuning
+
+We use pre-trained language models and fine-tune them on code review datasets. Fine-tuning allows the models to
+specialize in generating code reviews, making them more effective in this task.
+
+Once the models are trained, they can generate code reviews for new code changes. These generated reviews can highlight
+potential issues, suggest improvements, and provide feedback to developers.
+
+### 3. Evaluation Metrics
+
+We use the BLEU-4 score metric to assess the quality of generated code reviews. This metric measures the similarity
+between model-generated reviews and target human reviews. While our models provide valuable assistance, they are
+designed to complement human reviewers.
+
+## Getting Started
+
+To get started with our Code Review Automation system, follow these steps:
+
+1. Clone this repository to your local machine:
+
+   ```bash
+   git clone https://github.com/waleko/Code-Review-Automation-LM.git
+   cd Code-Review-Automation-LM
+   ```
+
+2. Set up the required dependencies and environment (see `requirements.txt`).
+
+3. Run the provided notebooks to explore data collection, model inference, and evaluation.
+
+4. Integrate the code review automation system into your development workflow. You can use our pre-trained models or
+   fine-tune them on your specific codebase for even better results.
+
+## License
+
+This project is licensed under the Apache 2.0 License - see the [LICENSE](LICENSE) file for details.
+
+## Contact
+
+For any questions or inquiries, please contact [inbox@alexkovrigin.me](mailto:inbox@alexkovrigin.me).
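To make the data-collection step described in the README above more concrete, here is a minimal sketch of pulling pull-request review comments into a Pandas DataFrame with PyGithub. The function name, the selected fields, and the `GH_TOKEN` environment variable are illustrative assumptions, not the project's actual collection code.

```python
# Illustrative sketch only; the repository's own collection code differs.
# Assumes a GitHub personal access token in the GH_TOKEN environment variable.
import os

import pandas as pd
from github import Github


def collect_review_comments(repo_name: str, max_comments: int = 100) -> pd.DataFrame:
    """Fetch pull-request review comments together with the code they refer to."""
    gh = Github(os.environ["GH_TOKEN"])
    repo = gh.get_repo(repo_name)

    rows = []
    for comment in repo.get_pulls_review_comments(sort="created", direction="desc"):
        rows.append(
            {
                "path": comment.path,            # file the comment was left on
                "diff_hunk": comment.diff_hunk,  # code change under review
                "review": comment.body,          # human-authored review text
                "author": comment.user.login,
            }
        )
        if len(rows) >= max_comments:
            break
    return pd.DataFrame(rows)


if __name__ == "__main__":
    df = collect_review_comments("transloadit/uppy", max_comments=50)
    print(df.head())
```

A real run would additionally handle API rate limits and options such as skipping the pull-request author's own comments, as mentioned in the notebooks.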
diff --git a/_config.yml b/_config.yml
index 97d277e..14b8004 100644
--- a/_config.yml
+++ b/_config.yml
@@ -1,7 +1,7 @@
 # Book settings
 # Learn more at https://jupyterbook.org/customize/config.html
 
-title: CodeReviewer ML Performance
+title: Code Review Automation with Language Models
 author: Alexander Kovrigin
 copyright: "2023"
 
@@ -21,9 +21,8 @@ bibtex_bibfiles:
 
 # Information about where the book exists on the web
 repository:
-  url: https://github.com/waleko/CodeReviewer-ML-Performance # Online location of your book
-  path_to_book: docs # Optional path to your book, relative to the repository root
-  branch: main # Which branch of the repository should be used when creating links (optional)
+  url: https://github.com/waleko/Code-Review-Automation-LM # Online location of your book
+  branch: gh-pages # Which branch of the repository should be used when creating links (optional)
 
 # Add GitHub buttons to your book
 # See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository
diff --git a/docs/conclusion.md b/docs/conclusion.md
index e69de29..9de81f5 100644
--- a/docs/conclusion.md
+++ b/docs/conclusion.md
@@ -0,0 +1,17 @@
+# Conclusion
+In our exploration of code review data collection and model inference, we have gained valuable insights into the capabilities and limitations of language models in the context of code review. This work spans several notebooks, each focusing on a specific aspect of the process. Here, we summarize our key findings and their implications:
+
+- Language models show promise in generating code reviews, but there is ample room for improvement in terms of review quality, context, and relevance.
+
+- Fine-tuning models on code review datasets is a valuable approach to enhancing their performance, but further research is needed to optimize fine-tuning techniques.
+
+- While models can assist in code reviews, they should be viewed as complementary tools to human reviewers rather than replacements. Human expertise remains invaluable in the code review process.
+
+- Future work may involve exploring more advanced language models, experimenting with different fine-tuning strategies, and incorporating user feedback to refine predictions.
+
+In conclusion, while challenges remain, language models have the potential to augment the code review process and help developers produce higher-quality code. As the technology continues to advance, we anticipate further progress in this field and a continued focus on improving the effectiveness of code review automation.
+
+## Bibliography
+
+```{bibliography}
+```
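The inference step summarized in the README and described in the `docs/intro.md` changes below can be approximated with the standard HuggingFace `transformers` API. This is a simplified sketch under the assumption that the `microsoft/codereviewer` checkpoint loads with the generic seq2seq classes; the actual notebooks go through a custom `ReviewsDataset`/DataLoader pipeline with CodeReviewer-specific special tokens, and the diff hunk below is invented.

```python
# Simplified sketch: the notebooks use a ReviewsDataset, DataLoader batching,
# and CodeReviewer-specific preprocessing instead of this bare-bones call.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

CHECKPOINT = "microsoft/codereviewer"  # or a path to a fine-tuned checkpoint

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModelForSeq2SeqLM.from_pretrained(CHECKPOINT)
model.eval()

# An invented diff hunk standing in for a real code change.
diff_hunk = """@@ -10,7 +10,7 @@
-    if user == None:
+    if user is None:
         return None
"""

inputs = tokenizer(diff_hunk, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_length=128, num_beams=5)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```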
diff --git a/docs/intro.md b/docs/intro.md
index 1e5919d..a0b6219 100644
--- a/docs/intro.md
+++ b/docs/intro.md
@@ -1,11 +1,51 @@
-# CodeReviewer ML Performance
+# Code Review Automation with Language Models
 
-This is a small sample book to give you a feel for how book content is
-structured.
-It shows off a few of the major file types, as well as some sample content.
-It does not go in-depth into any particular topic - check out [the Jupyter Book documentation](https://jupyterbook.org) for more information.
+## Introduction
 
-Check out the content pages bundled with this sample book to see more.
+In this series of Jupyter notebooks, we collect code review data from GitHub repositories and generate code review
+predictions using language models. Our primary goal is to explore the capabilities of different models in generating
+code reviews and to evaluate their performance.
+
+### Collecting Code Review Data
+
+In the first notebook, we collect code review data from GitHub repositories, using the PyGithub library to interact
+with the GitHub API.
+
+We define a function to collect code review data from a GitHub repository, allowing us to specify parameters such as
+the number of comments to load, whether to skip the author's own comments, and more. The collected data is structured
+into a Pandas DataFrame for further analysis and processing.
+
+Three prominent repositories, namely `microsoft/vscode`, `JetBrains/kotlin`, and `transloadit/uppy`, are selected for
+data collection due to their popularity and rich code review history. We also use data from the original CodeReviewer
+`msg-test` dataset provided by the authors of {cite}`li2022codereviewer`.
+
+### CodeReviewer Model Inference
+
+The second notebook focuses on generating code reviews using the `microsoft/codereviewer` model. We walk through the
+tokenization and dataset preparation process, emphasizing the importance of the model's specialized tokens.
+
+A custom `ReviewsDataset` class is introduced to facilitate data loading and transformation, making the data compatible
+with model inference. We load data from various sources and create DataLoader instances for efficient model input.
+
+We run model inference with both the HuggingFace pre-trained checkpoint and a fine-tuned CodeReviewer model. The
+details of the fine-tuning process are outlined, including the parameters and resources used, and the model predictions
+are saved for later evaluation.
+
+### Predictions Evaluation
+
+In the third notebook, we assess the quality of the code review predictions generated by the models. Both the
+HuggingFace pre-trained and the fine-tuned models are evaluated across the different datasets.
+
+A qualitative assessment provides insight into how the models generate code reviews: we present samples of code along
+with predictions from both models, enabling a direct comparison with the human reviews and highlighting the nuances of
+model-generated reviews.
+
+Lastly, we quantitatively evaluate the models using BLEU-4 scores. We calculate scores for each dataset, providing a
+comprehensive overview of how well the models align with human reviews and allowing us to draw conclusions about their
+effectiveness in code review prediction.
+
+Throughout, we explore the capabilities and limitations of language models in the context of code review, highlighting
+their potential applications and areas for improvement.
 
 ## Table of Contents
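To illustrate the quantitative evaluation described in the Predictions Evaluation section above, the sketch below scores generated reviews against human references with a smoothed sentence-level BLEU-4. NLTK is used here purely for illustration and may differ from the BLEU implementation used in the evaluation notebook; the example strings are invented.

```python
# Illustrative BLEU-4 scoring; the evaluation notebook may use a different
# BLEU implementation and tokenization.
from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu


def bleu4(prediction: str, reference: str) -> float:
    """Smoothed BLEU-4 between one generated review and one human review."""
    return sentence_bleu(
        [reference.split()],                # tokenized reference(s)
        prediction.split(),                 # tokenized hypothesis
        weights=(0.25, 0.25, 0.25, 0.25),   # equal weights for 1- to 4-grams
        smoothing_function=SmoothingFunction().method1,
    )


# Invented examples standing in for model predictions and human reviews.
predictions = ["Consider handling the None case explicitly."]
references = ["Please handle the case where user is None."]

scores = [bleu4(p, r) for p, r in zip(predictions, references)]
print(f"mean BLEU-4: {sum(scores) / len(scores):.3f}")
```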