---
layout: post
title: "Who Needs Human Code Reviews When We Have AI?"
excerpt: "We automated proofreading of our blog articles with a GitHub action leveraging an LLM. It even reviewed this very article!"
cover_image: "./images/large.webp"
thumbnail_image: "./images/small.webp"
authors:
- thiery
tags:
- ai
- bot
- github
---

In our quest to master AI and automation, we have been experimenting with pull request reviews. It turns out this very blog post was written in a code editor and submitted as a pull request. How hard would it be to have an AI review that pull request?

We built a bot leveraging an LLM from OpenAI that reviews the pull requests on the blog repository. The bot provides comments and suggestions to improve the article. The bot is integrated with GitHub actions, so it runs automatically on every pull request. It even reviewed this very article!

![An AI comment](images/ai-comment.png)

## The Big Picture

I built a [GitHub action](https://docs.github.com/en/actions) that triggers on every pull request. The action retrieves the diff of the pull request, sends it to [OpenAI's chat completion API](https://platform.openai.com/docs/guides/text-generation) with custom instructions, and uses the result to create a review on the pull request with the comments and suggestions.

Calling the OpenAI API is pretty straightforward, thanks to the [openai package](https://github.com/openai/openai-node). The difficulties are elsewhere:

- How to properly prompt the LLM to review an article?
- How to use the LLM response to create a review with the GitHub API?
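For context, the call itself boils down to building a request and sending it. The sketch below is my own illustration (the function name, model, and options are assumptions, not the action's exact code):

```typescript
// Illustration only: build the parameters for a chat completion
// request, as accepted by the openai package. The actual prompt used
// by the action is shown later in this article.
function buildChatRequest(prompt: string) {
    return {
        model: "gpt-4o-mini",
        temperature: 0, // a low temperature yields more consistent reviews
        messages: [{ role: "user" as const, content: prompt }],
    };
}

// With the openai package, the call itself is a one-liner:
// const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// const completion = await openai.chat.completions.create(buildChatRequest(prompt));
// const review = completion.choices[0].message.content;
```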

![The Quest for the Right Prompt](./images/theQuestForTheRightPrompt.webp)

## The Quest for the Right Prompt

I went through numerous iterations of the prompt to get the AI to provide the right feedback. It really is an iterative process, as I discovered many ways the AI could fail by testing it on various inputs.

Before writing a prompt, I recommend gathering a few examples of the input you want to process, focusing on diversity and edge cases. I also recommend running the prompt many times on the same content to see how the AI behaves.

Here are the lessons I learned.

### The AI Loves To Explain What It Does

OpenAI's LLMs systematically justify their responses, but I only wanted the result. I had to add instructions to prevent the AI from explaining itself:

```
Do not explain what you are doing.
```

### The AI Can Generate JSON Rather Reliably If You Ask Correctly

If the LLM response has to be read by a program, as in my case, it's better to have it in a [structured format](https://platform.openai.com/docs/guides/structured-outputs). You can make the AI use a specified JSON structure by providing a template of the response.

```
Respond with the following JSON structure:
[
    {
        "comment": "<comment targeting one line>",
        "lineNumber": <line_number>,
        "suggestion": "<The text to replace the existing line with. Leave empty when no suggestion is applicable; must be related to the comment>",
        "originalLine": "<The content of the line the comment applies to>"
    }
]
```

Note the AI tends to wrap the result in a ` ```json ` tag, even when you tell it not to.
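Since the wrapper shows up anyway, the simplest workaround is to strip it before parsing. A small helper along these lines (my own sketch, not necessarily the action's code) does the job:

```typescript
// Strip an optional ```json ... ``` wrapper that the model may add
// around its response, so the result can be fed to JSON.parse.
function stripJsonFence(response: string): string {
    return response
        .replace(/^\s*```(?:json)?\s*/i, "")
        .replace(/\s*```\s*$/, "");
}
```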

Additionally, with `gpt-4o-mini` and more recent models, you can set `response_format.type` to `json_schema` and provide a JSON schema for the answer. For example:

```js
{
    response_format: {
        type: "json_schema",
        json_schema: {
            name: "review-comments",
            schema: {
                type: "object",
                properties: {
                    comments: {
                        type: "array",
                        items: {
                            type: "object",
                            properties: {
                                comment: { type: "string" },
                                suggestion: { type: "string" },
                                originalLine: { type: "string" },
                                lineNumber: { type: "number" },
                            },
                        },
                    },
                },
            },
        },
    },
}
```

### The AI Does Not Know How To Count

To generate comments on a specific line, I needed to provide the line number. Initially, I passed the article directly and asked the LLM to include a line number for each comment. It did generate line numbers, but they were wrong. They were often larger than the size of the text.

![comment with wrong line number](images/wrong-line-number.png)

Interestingly, the AI could quote the correct line when asked to.

So I modified the prompt to add line numbers at the start of each line, as in the GitHub diff. This improved the LLM's accuracy when citing line numbers, although it still got them wrong occasionally.
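The numbering itself is a one-liner. Here is how it can be done (an illustration, not necessarily the action's exact code):

```typescript
// Prefix each line with its 1-based number, mimicking what GitHub
// displays in the diff view, so the LLM can cite lines reliably.
function numberLines(text: string): string {
    return text
        .split("\n")
        .map((line, index) => `${index + 1} ${line}`)
        .join("\n");
}
```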

### The AI Result Is Random By Default

It should be obvious, but I was still surprised by how the same prompt could yield vastly different results. Lowering the [temperature](https://platform.openai.com/docs/api-reference/chat/create#chat-create-temperature) helped to get more consistent results.

### The AI Sometimes Fails

Given the previous two points, it was necessary to check the AI output to make sure it could be transformed into a proper diff.

I instructed the AI to include the original line along with each comment. With this additional information, I was able to check each comment's position and fix it when needed.

When I could not locate the target line at all, I discarded the comment altogether.
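The check can be sketched as follows (my own illustration of the logic described above): verify the quoted line against the numbered input, relocate the comment when the quote is found on another line, and drop it when it cannot be found at all.

```typescript
type AiComment = {
    comment: string;
    lineNumber: number;
    suggestion: string;
    originalLine: string;
};

// Keep a comment as-is when its lineNumber matches the quoted line,
// fix the lineNumber when the quoted line exists elsewhere, and
// discard the comment when the quoted line cannot be located.
function relocateComments(lines: string[], comments: AiComment[]): AiComment[] {
    return comments.flatMap((comment) => {
        const target = comment.originalLine.trim();
        if (lines[comment.lineNumber - 1]?.trim() === target) {
            return [comment];
        }
        const actualIndex = lines.findIndex((line) => line.trim() === target);
        if (actualIndex === -1) {
            return []; // target line not found: discard the comment
        }
        return [{ ...comment, lineNumber: actualIndex + 1 }];
    });
}
```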

### The AI Always Tries To Do Something

The code review bot runs on every push. While the first draft of an article often needs improvement, the final draft is usually good and doesn't need any changes.

However, when asked to suggest improvements, the AI will always provide some, even if there’s nothing to fix. I had to explicitly tell it to return an empty array if no changes were needed.

```
Your task is to review pull requests on a technical blog. Instructions:
Provide comments and suggestions ONLY if there is something to improve or fix,
otherwise return an empty array.
```

### The Final Prompt

Given all the lessons learned, here is the prompt I ended up with:

```
Your task is to review pull requests on a technical blog.
Instructions:
- Do not explain what you're doing.
- Provide the response in the following JSON format, and return only the JSON:
[
    {
        "comment": "<comment targeting one line>",
        "lineNumber": <line_number>,
        "suggestion": "<The text to replace the existing line with. Leave empty when no suggestion is applicable; must be related to the comment>",
        "originalLine": "<The content of the line the comment applies to>"
    }
]
- The returned result must only contain valid JSON
- Propose changes to text and code.
- Fix typos, grammar and spelling
- Ensure short sentences
- Ensure one idea per sentence
- Simplify complex sentences
- No more than one comment per line
- One comment can address several issues
- Provide comments and suggestions ONLY if there is something to improve or fix, otherwise return an empty array.
Git diff of the article to review:
\`\`\`diff
${diff}
\`\`\`
```

## Integrating With The GitHub API

To integrate with the GitHub API in a GitHub action, I used [@octokit/rest](https://github.com/octokit/rest.js).

This was a three-step process:

1. Retrieve the current pull request details
2. Retrieve the diff
3. Create the review

### Retrieving the Current Pull Request Details

In a GitHub Actions context, you must first execute the `actions/checkout@v3` action to check out the code. Then, using the `GITHUB_EVENT_PATH` environment variable, you can read the repository information:

```ts
const { repository, number } = JSON.parse(
    // ...
);
// ...
return {
    // ...
};
```
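The snippet above is truncated. For reference, here is a plausible reconstruction (the payload shape is my assumption, based on the standard `pull_request` webhook event):

```typescript
import { readFileSync } from "fs";

// The file at GITHUB_EVENT_PATH contains the webhook payload that
// triggered the workflow; for a pull_request event it includes the
// repository and the pull request number.
function parsePullRequestEvent(eventJson: string) {
    const { repository, number } = JSON.parse(eventJson);
    return {
        owner: repository.owner.login,
        repo: repository.name,
        pull_number: number,
    };
}

const eventPath = process.env.GITHUB_EVENT_PATH;
const details = eventPath
    ? parsePullRequestEvent(readFileSync(eventPath, "utf8"))
    : undefined;
```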

### Retrieving the Diff

To retrieve the diff for the current pull request, use `octokit.pulls.get`:

```ts
const response = await octokit.pulls.get({
    // ...
});
```
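The parameters are elided in the snippet above. Assuming the standard Octokit API, the full call most likely requests the `diff` media type, so that `response.data` contains the raw diff text instead of JSON:

```typescript
// Sketch: build the parameters for octokit.pulls.get, asking GitHub
// to return the pull request as a raw diff.
function buildDiffRequest(owner: string, repo: string, pull_number: number) {
    return {
        owner,
        repo,
        pull_number,
        mediaType: { format: "diff" as const },
    };
}

// const response = await octokit.pulls.get(buildDiffRequest(owner, repo, number));
// const diff = String(response.data);
```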

### Creating the Review

After retrieving the LLM comments, you can create a code review:

```ts
await octokit.pulls.createReview({
    // ...
});
```

A GitHub comment consists of a line, path, and body. I placed the comment and the suggestion in the body like this:

```ts
body: `The phrase 'for what I wanted I needed to' is awkward. Rephrase for clarity.
\`\`\`suggestion
I needed to do the following:
\`\`\`
`;
```

Be careful to remove any indentation, as it would prevent GitHub from rendering the suggestion block.

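A small helper (my own sketch, not the action's exact code) can build the body and strip the indentation in one place:

```typescript
const FENCE = "```";

// Build the review comment body: the AI comment followed by a GitHub
// suggestion block. Leading indentation is stripped from the
// suggestion, since an indented fence is not rendered as a suggestion.
function formatCommentBody(comment: string, suggestion: string): string {
    if (!suggestion) {
        return comment;
    }
    const cleaned = suggestion
        .split("\n")
        .map((line) => line.trimStart())
        .join("\n");
    return [comment, FENCE + "suggestion", cleaned, FENCE].join("\n");
}
```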
## The Result

The bot is now running on the Marmelab blog repository. It reviews every pull request and provides comments and suggestions to improve the article. It detects basic grammar and spelling mistakes, and also provides simple suggestions to improve the text.

However, it's far from perfect. It never stops offering suggestions, even on text it suggested itself. Its comments often oversimplify the text, losing part of the meaning.

We also need to tweak the prompt to keep a consistent style between articles. The AI is very good at mimicking the style of the input, so we need to include the text of past articles in the prompt.

## Conclusion

Here is the repository of the GitHub Action: [AI Article Reviewer](https://github.com/ThieryMichel/proof-reader-ai). The action is published [on the GitHub Actions Marketplace](https://github.com/marketplace/actions/proof-reader-ai-action); feel free to try it by adding the following workflow:

```yaml
# .github/workflows/proof-reader.yml
name: AI Code Reviewer

on:
  pull_request:
    types:
      - opened
      - synchronize

permissions: write-all

jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repo
        uses: actions/checkout@v3

      - name: Proof Reader AI Action
        uses: marmelab/proof-reader-ai@main
        with:
          # GITHUB_TOKEN is provided automatically by GitHub Actions;
          # do not add it as a repository secret yourself.
          # See https://docs.github.com/en/actions/security-guides/automatic-token-authentication
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          # Optional, defaults to "gpt-4o-mini"; models prior to GPT-4 are not supported
          OPENAI_API_MODEL: "gpt-4o-mini"
```
