Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add some GitHub commit message guidance #110

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

russellb
Copy link
Member

@russellb russellb commented Jul 3, 2024

A common theme in PR reviews has been discussing the level of detail
that should be included in commit messages, as well as how to craft a
series of commits in a PR that helps guide a reviewer through the
changes made to accomplish the overall goal of the PR.

This adds some text we can point to on this topic that provides some
helpful explanations and examples of commit message and commit series
best practices.

Signed-off-by: Russell Bryant [email protected]


- <https://github.com/instructlab/instructlab-bot/pull/125/commits>
- <https://github.com/instructlab/instructlab/pull/994/commits>
- <https://github.com/instructlab/instructlab/pull/951/commits>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure there are even better examples out there. I'm happy to change this over time to represent some of the best examples people can find in our repos. These are a few I had dug up some weeks ago.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies if this is a personal preference nit. Noting that many/most PRs are small in scope and (I believe) should usually only have 1 commit. I generally think of a commit as a unit of work that you might choose to un-commit. And breaking down work into revertable logic is a great practice. But stating multiple commits is the general case might lead to a lot of commits for work that doesn't need to be broken up. My ask would be to adjust the wording to not promote multiple commits quite so strongly but still indicate how to use them effectively. If there is a fundamental disagreement on approach, I'm happy to be a chameleon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're certainly not alone in this preference, and I think there's a bit of a difference between a camp of people who developed practices before github existed, and another camp who developed practices around github and maybe are less reliant on e.g. git log

Here's a concrete example of why I'm firmly in the "more is more" (within reason) on splitting up commits: https://github.com/instructlab/sdg/pull/74/commits

Totally trivial PR, but 4 distinctly logical changes, and being able to look at those changes (and their commit message) individually is actually really valuable - e.g. separating the reordering of the parameters from the deleting unused parameters makes it trivial to review. First you make sure the reordering didn't change anything, then you confirm that the 4 parameters being dropped aren't in use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few thoughts, we might consider adding:

*Splitting commits is crucial because it provides "ease of revertability." If a specific commit causes an issue, reverting a single logical unit is much easier and less disruptive than reverting an entire feature.

  • Encouraging multiple PRs for large changes: contributors should be encouraged to split their work into multiple PRs if the total lines changed across commits exceed 1,000 lines. This rule of thumb helps keep changes manageable and reviews more focused.

  • The approach to splitting commits and PRs can depend on the review's velocity. High velocity in review and addressing feedback leads to rapid merges. Small PRs can be reviewed and merged quickly, benefiting both contributors, who see their changes integrated faster, and maintainers, who avoid reviewing massive changes. I wish the various projects could reach that state, we are not there yet.

This strategy helps maintain the project's health and efficiency by ensuring changes are easier to manage, review, and revert if necessary.

PRs should be formatted as a series of logical commits that implement the
overall change proposed in the PR. Each commit should represent a logical step
in the progression of the change. The commit message for each should clearly
explain that step of the progression.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to be explicit here about squashing fixes and cleanups into those logical commits?

I look for high-level context about "what" a change is to frame my review, as well as some reason "why" the change is needed to understand prioritization and benefit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I have some older text at the bottom of this doc that could be moved up here as a start. I already moved up some of it. Here's what could be moved:

It is common that a PR author may need to do a final rebase to clean up their proposed commit series before a PR can be merged. It is also fine for a project maintainer to perform this step when the changes necessary are straight forward enough to do so. This includes doing a final rebase on main if necessary. The PR itself should NOT include any merge commits of main back into the developer's branch. We expect the proposed commit series to be a clean set of commits against main without conflicts or merge commit history. We only use a merge commit to record the PR's inclusion into main.

into crafting high-quality commit messages. Here is a script you can use to generate
a summary of the commits in your PR. It starts with a list of the commits one
line at a time and then follows that with the full commit messages. The format works
for pasting directly into GitHub.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The GitHub CLI tools are also good for this. hub pull-request starts an editor with a template for creating the PR message, including all of the commit messages for commits in the branch. I'm sure the more modern gh does something similar, and I'd be surprised if there wasn't a VSCode somewhere that did something like it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I actually use gh pr create myself. When it gets to the PR description part, it launches my editor (vim) and I run :read!~/bin/ghprdesc.sh which is the script from above.

If it's just one commit, it already does what I want by default, which is include the full commit message as the PR title and description.

That reminds me of another point to make -- for a PR that is a single commit, the commit message should be equal to the PR description in most cases. Otherwise, you're probably leaving out info that should be in the commit message, as well.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if the PR description contained a hand crafted description of the overall fix. This way the user pushing the PR has to think about the overall fix or feature being pushed. It provides a good overview to the reader of the PR afterwards.

It should contain breakdown of the different parts if helpful or can be read in the commits.

A common theme in PR reviews has been discussing the level of detail
that should be included in commit messages, as well as how to craft a
series of commits in a PR that helps guide a reviewer through the
changes made to accomplish the overall goal of the PR.

This adds some text we can point to on this topic that provides some
helpful explanations and examples of commit message and commit series
best practices.

Signed-off-by: Russell Bryant <[email protected]>
@markmc
Copy link
Contributor

markmc commented Jul 3, 2024

lgtm, one other resource (well, quote really) often referenced is https://who-t.blogspot.com/2009/12/on-commit-messages.html

Any software project is a collaborative project. [..] a commit message shows whether a developer is a good collaborator.

Copy link
Contributor

@leseb leseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, I'm glad to see this doc improvement, thank you! One small thing we could add is to pay attention to the PR template (if it exists) as it provides guidance to contributors too.

```sh
#!/bin/bash

git log --reverse --oneline origin/main..HEAD
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the script is useful, it highly depends on how the remote is set up. I'd add a note about making sure the remote points to instructlab/instructlab and not your fork.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Is origin here the main repo (as it is for me) or your fork (which I call fork). :-)


- <https://github.com/instructlab/instructlab-bot/pull/125/commits>
- <https://github.com/instructlab/instructlab/pull/994/commits>
- <https://github.com/instructlab/instructlab/pull/951/commits>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few thoughts, we might consider adding:

*Splitting commits is crucial because it provides "ease of revertability." If a specific commit causes an issue, reverting a single logical unit is much easier and less disruptive than reverting an entire feature.

  • Encouraging multiple PRs for large changes: contributors should be encouraged to split their work into multiple PRs if the total lines changed across commits exceed 1,000 lines. This rule of thumb helps keep changes manageable and reviews more focused.

  • The approach to splitting commits and PRs can depend on the review's velocity. High velocity in review and addressing feedback leads to rapid merges. Small PRs can be reviewed and merged quickly, benefiting both contributors, who see their changes integrated faster, and maintainers, who avoid reviewing massive changes. I wish the various projects could reach that state, we are not there yet.

This strategy helps maintain the project's health and efficiency by ensuring changes are easier to manage, review, and revert if necessary.

Copy link
Member

@hickeyma hickeyma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me, thanks @russellb for the updates.

One thing wrt PR description I would prefer a hand crafted description rather than retrieval of the commit messages. See inline comment.

into crafting high-quality commit messages. Here is a script you can use to generate
a summary of the commits in your PR. It starts with a list of the commits one
line at a time and then follows that with the full commit messages. The format works
for pasting directly into GitHub.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if the PR description contained a hand crafted description of the overall fix. This way the user pushing the PR has to think about the overall fix or feature being pushed. It provides a good overview to the reader of the PR afterwards.

It should contain breakdown of the different parts if helpful or can be read in the commits.

- <https://docs.kernel.org/process/submitting-patches.html#separate-your-changes>
- <https://github.com/kubernetes/community/blob/master/contributors/guide/pull-requests.md#smaller-is-better-small-commits-small-pull-requests>

While not a requirement right now, we may want to consider following a spec like
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we can also mention 50/72 rule, since Conventional Commits don't seem to cover it. E.g. this: https://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html

@nathan-weinberg
Copy link
Member

@russellb what is going on with this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants