Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When to split commits? #12

Open
javier-godoy opened this issue Dec 2, 2020 · 11 comments
Open

When to split commits? #12

javier-godoy opened this issue Dec 2, 2020 · 11 comments
Assignees
Labels
conventional-commits deferred The issue has been deferred for additional discussion

Comments

@javier-godoy
Copy link
Member

In #1 we discussed about multiple purpose commits. The agreement was that multiple purpose commits are not a good practice, based on the following comment from Conventional Commits:

What do I do if the commit conforms to more than one of the commit types?
Go back and make multiple commits whenever possible. Part of the benefit of Conventional Commits is its ability to drive us to make more organized commits and PRs.

In the context of Conventional Commits (which only defines fix and feat), the sentence above amounts to "do not squash two fixes in a single commit, and do not squash a fix and feature in a single commit", i.e. Atomic Commits. [1] [2]

On the other side, since we adopt additional types, the domain of "commits that conform to more than one of the commit types" is now broader. For instance:

  • Implement a feature (feat) and add unit tests (test) for that feature.
  • Implement a feature (feat) that requires adding/upgrading a external dependencies (build).
  • Add external dependencies for general purpose library (build) and implement a feature on top of that framework / general purpose library (feat).
  • Implement a fix that requires upgrading a dependency (build) and write
  • Implement a fix and realize that a external dependency was rather old, thus upgrade it just in case (build)
  • fix a missing null check and by the way replace all tabs to spaces in the whole file (style).
@javier-godoy javier-godoy added conventional-commits discuss The issue is scheduled for internal discussion labels Dec 2, 2020
@javier-godoy javier-godoy self-assigned this Dec 2, 2020
@scardanzan
Copy link
Member

I would suggest that any changes needed by a ticket should probably be in the same commit. In your list I would split the last two examples. You could also split all commits, but then you need to take care to commit in the appropriate order.

@mlopezFC
Copy link
Member

mlopezFC commented Dec 3, 2020

Yes, IMHO it's going to be almost impossible to avoid those situations so we should think in a workaround.
One thing that pops into my mind is to introduce the concept of dominant type: basically if there are many types that can be used because of the files that are being modified by the commit, and for some reason they cannot be split in several commits, we could sort them by some kind of priority like, for example:

  1. feat
  2. fix
  3. perf
  4. refactor
  5. test
  6. build
  7. ci
  8. chore
  9. docs
  10. revert
  11. style

And then use the highest priority type as a "summary". In that case if the commit will add a new feature and a unit test for it, then it's fine to use feat because it has a higher priority compared with test.
Other examples:

  • If we're improving performance and for doing that we need a newer version of a dependency, we should use perf instead of build (it's more important to note that the commit is a performance improvement, than just adding a dependency)
  • If we're modifying the build process and we also want to document that in the same commit, it's better to note that is a build change and not a change in the docs
  • If while we're adding more documentation we're adding some minor style corrections noting that we're documenting is more relevant
    What do you think?

@ngonzalezpazFC
Copy link

ngonzalezpazFC commented Dec 3, 2020

I think that is a solution that solves what @scardanzan proposed, but maybe it has the drawback of not respecting the atomic commits proposal.
I give an example of what we were doing with @javier-godoy while rewriting the commit history of the addons: If I had a commit that changed the pom, and those changes belong to two different types of the ones we defined, e.g. build and ci, I splitted the commit in two, one for each type, and both commits only changed de pom. This has the disadvantage of the time spent checking the code changes.
I think it depends also of the project, if the decision is to relate every commit to a ticket, or having the trackability of the commit types very well distinguished.

@javier-godoy
Copy link
Member Author

@mlopezFC

Yes, IMHO it's going to be almost impossible to avoid those situations so we should think in a workaround.

Maybe the statement of my question was ambiguous: I didn't refer to splitting as "make two commits from one" but "given changes A and B, should they go in the same commit?" (in general, it's possible to avoid unrelated changes from landing in the same commit, and they should be avoided). The question then is "are A and B unrelated changes?".

One thing that pops into my mind is to introduce the concept of dominant type: basically if there are many types that can be used because of the files that are being modified by the commit, and for some reason they cannot be split in several commits, we could sort them by some kind of priority

The concept of dominance is interesting, but we should be careful since it's also a two-edged sword that might be misunderstood as allowing multiple-purpose commits, which are not a good practice (#1).

Since the convention documents good practices, it should be clear on whether/when the good practice is splitting the commits or whether/when the good practice is not splitting the commits, and where is the gray area.

Many of the combinations among commits type are not possible in a logically atomic commit:

  • feat, fix, perf and refactor are mutually exclusive by definition (e.g. a refactor is "a code change that neither fixes a bug nor adds a feature").
  • revert is a special commit, since it reverts one or more commits that have been applied earlier.
  • docs and style are either related to code that is modified by a more dominant change (refactoring a method, documenting the changes and applying consistent indentation in the refactored code), or aren't (in which case they are logically independent).
  • chore is kind of "nothing else fits". If something else fits, either that thing is not a chore, or there are two things: a chore and something else.

Regarding test:

If the commit will add a new feature and a unit test for it, then it's fine to use feat because it has a higher priority compared with test.
Yes, that makes sense. It wouldn't harm to have the feature and the unit test separate, either.

The only remaining cases are splitting build/ci against feat, fix, perf, refactor and test commits, as well as splitting build from ci changes (which depends on the difference between ci and build types #10).

If you think so, we can restrict the discussion to these cases.

@javier-godoy
Copy link
Member Author

@ngonzalezpazFC

I give an example of what we were doing with @javier-godoy while rewriting the commit history of the addons: If I had a commit that changed the pom, and those changes belong to two different types of the ones we defined, e.g. build and ci, I splitted the commit in two, one for each type, and both commits only changed de pom.

That was an unusual situation, since the commits were already there and we were rewriting them for tidying up the commit history (and by the way, also for validating the guidelines). Had the guidelines been written earlier, it would have been clear whether those changes belonged to the same commit or should have been committed separately.

One rule of thumb I use is if the subject accurately describes the changes (regardless of whether they are in the same file or not).

For instance, FlowingCode/ChipFieldAddon@b685ee3 consists of the following changes, with a commit message of "add demo layout":

(A) add demo layout -> UI code
(B) add LayoutTest -> unit test
(C) add demo-jar profile -> pom.xml
(D) exclude demo-jar from the assembly -> assembly.xml
(E) add JUnit dependency -> pom.xml

  • Change A is related to B, but none of them require adding the demo-jar profile (C) or excluding it from the assembly (D).
  • Changes C and D could be combined (dependending on the resolution of this issue) because C causes the tests.jar to be generated, and D excludes it from the assembly.
  • Changes C and E affects pom.xml, but they are unrelated.
  • Change E is required by B.

That commit was rewritten as:
(A+B) FlowingCode/ChipFieldAddon@7b4eb4e test(demo): add demo layout
(D) FlowingCode/ChipFieldAddon@0ed2bac build: exclude *-tests.jar
(C) FlowingCode/ChipFieldAddon@4ba0e9f ci: add demo-jar profile
(E) FlowingCode/ChipFieldAddon@21f7aa0 build: add junit dependency

(Note that the right order should have been E, A+B, C, D)

Anything that changes C or D cannot be in the same commit as A, B or E (otherwise that would be a multi-purpose commit). When to split the remaining changes is part of this question.

IMHO

  • It looks like C+D is an atomic change (whether it is build, ci or build/ci is subject of a separate discussion).
  • A commit of (A+B+E) looks quite messy, since a dependency with JUnit is added because of "add demo layout". Therefore commit E cannot be part of (A+B) and had to be split
  • It could have been rewritten as (A, B+E, C+D) or (A, C+D, B+E) though.
  • An alternative rewriting as (A, E, B, C+D) or (A, C+D, E, B) would be fine too.

@mlopezFC
Copy link
Member

Great discussion. Regarding logically atomic commits I agree with the definition and we should stick to that. Like you said, this is a development convention definition. The teams that adhere to this should review the commits and enforce the rules but they might take into account other factors, like priorities and judgments in special and corner cases.

Given that, we can focus on the types that can be considered in the same commit. I have the following comments:

feat, fix, perf and refactor are mutually exclusive by definition (e.g. a refactor is "a code change that neither fixes a bug nor adds a feature").

We should discuss about refactor. Refactors usually are difficult to split, and it's usual to do a refactor while implementing a feature. Imagine implementing a new feature and while in the middle of it you realize that it would be easier to do it by moving some classes around and reusing some code. The same can happen when while implementing a performance improvement. A refactor alone is a code change that neither fixes a bug nor adds a feature, but a feat, fix or perf can involve a refactor. In fact a refactor could be introduced because we're implementing a feature, would it add value to have it in a different commit?
In those cases feat, fix or perf will dominate a refactor. Of course the advice would be to apply the refactor first with an independant commit, but usually this is difficult to realize before implementing the feature and sometimes it would include a significant overhead to revert all of the changes, apply the refactor and then implement the changes.

revert is a special commit, since it reverts one or more commits that have been applied earlier.

What about revert? should we just paste the original commit message after revert? ... maybe we should discuss this in a separated issue, but it sounds relevant because a revert might be the most dominant type.

I think the situation it's pretty clear with docs, style (these two can be dominated) and chore (cannot be _dominated).

Regarding build and ci I think that build can be included in a feat, fix, perf or refactor as a lower dominance type. I'm not sure about ci, but it could be also possible (with this feature this ci step should be performed from now on).

Thoughts?

@javier-godoy
Copy link
Member Author

javier-godoy commented Dec 11, 2020

We should discuss about refactor. Refactors usually are difficult to split, and it's usual to do a refactor while implementing a feature. Imagine implementing a new feature and while in the middle of it you realize that it would be easier to do it by moving some classes around and reusing some code. The same can happen when while implementing a performance improvement. A refactor alone is a code change that neither fixes a bug nor adds a feature, but a feat, fix or perf can involve a refactor. In fact a refactor could be introduced because we're implementing a feature, would it add value to have it in a different commit?

If the feature itself is complex, and the refactor adds value even without the feature, it may make sense to commit them separately (so that the code base can benefit from the refactor, even if the feature is reverted).

refactor: move some classes around and reuse some code
feat: add support for FooBar

versus:

feat: add support for FooBar

Separate the Bar behavior from BarQuux in order to reuse it in FooBar.

In those cases feat, fix or perf will dominate a refactor. Of course the advice would be to apply the refactor first with an independant commit, but usually this is difficult to realize before implementing the feature and sometimes it would include a significant overhead to revert all of the changes, apply the refactor and then implement the changes.

If it is difficult to realize before implementing the feature AND it would include a significant overhead to revert rebase all the changes, then just commit everything under a single commit. There's not "commit police", it's just a convention in order to add value to the history log, so that every commit means something more than "oops it changed again".

Note it's an AND (&&). If the realization comes before implementing, why wouldn't one refactor first and then implement the feature on top of that refactor?, In the same way, if the realization comes after the feature is implemented, but it's just a matter of excluding a couple of lines from a commit, or a five-minutes rebase, why wouldn't one do it?

What about revert? should we just paste the original commit message after revert? ... maybe we should discuss this in a separated issue, but it sounds relevant because a revert might be the most dominant type.

How does Conventional Commits handle revert commits?
«
Conventional Commits does not make an explicit effort to define revert behavior. Instead we leave it to tooling authors to use the flexility of types and footers to develop their logic for handling reverts. One recommendation is to use the revert type, and a footer that references the commit SHAs that are being reverted:

revert: let us never again speak of the noodle incident

Refs: 676104e, a215868

»

I wouldn't add anything about reverts at this moment because:

  • It's already covered by conventional-commits (in a loose way)
  • If reverts are frequent, it means that the commits are good enough (so that you don't need to revert "half of a commit")
  • We can add it at a later version of the spec, when we find some actual case that presents trouble.
  • Git tools already do a good job when fed with a conventional commit message.

I think the situation it's pretty clear with docs, style (these two can be dominated) and chore (cannot be _dominated).
+1

Regarding build and ci I think that build can be included in a feat, fix, perf or refactor as a lower dominance type. I'm not sure about ci, but it could be also possible (with this feature this ci step should be performed from now on).

I'll come back to this later, after reviewing some cases based on the agreement we have on the difference between build and ci #10

Thoughts?

I'm still not sure if we need to define dominance as a concept, or let it permeate through the definitions instead.

(Maybe we can move part of this discussion to separate issues in order to tackle them one at a time)

@mlopezFC
Copy link
Member

Great answer.
Regarding refactor, yes, my scenario was using the AND keyword. If I realize about the refactor before starting it's the perfect usage of the refactor type, and if I realized after I started but everything that was done is easy to stash and re-apply & correct (or rebase), then yes, another good usage of that type. But my point was that if those two cases are not true and it's not clear whether the refactor by it's own would add a significant value to the project, then it can go in the same commit, with the feat, fix or perfas a dominant type.
Regarding revert. I think that we're good with conventional commit's recommendation, but if the commit author doesn't include a reference of what is going to be reverted it could be a problem and the added value would be to little. Maybe we can tackle this in #2.
Regarding build & ci, I think that the key in here is the ability of the developer to estimate that the build or ci change is relevant outside the scope of the feat, fix, perf or refactor that is being applied. Adding Lombok or Junit as a dependency is probably relevant outside the scope of an specific commit, because the developer can correctly estimate that is going to be used in following commits outside the current assignment (thus it makes sense to have it in a separated commit with build type). Adding a dependency to Redis in order to implement external cache management, is probably meaningful only in the context of the commit that contains the remaining configurations that allows the application to use it (thus can be included in the feat that adds support to external cache management).
So basically build should be used when the developer wants to emphasize that the dependency is added in order to use a feature that can be re-utilized later in the project. If adding support for that feature involves not only modifying the pom.xml file, but also modifying code, it should be a feat instead. Another good usage of build type is to update a dependency version in order to take advantage of fixes and features of the new version (because it will probably affect the generated artifact, ie: a war file).

@javier-godoy
Copy link
Member Author

@mlopezFC

Regarding revert, I noted a proposed example in #2

Other changes that can be applied to the document are:

Add the following paragraph to the introduction

These guidelines encourage logically atomic commits (ref1, ref2), i.e. commits that are big enough to add value to the project, and small enough to read, review and revert. There is no hard and fast rule for determining what adds value to the project, or what is small enough: use common sense.

Reorder the list of types. Based on dominance, but without defining dominance as a concept

  • Commits that contribute to the application source code:

    • feat: A new feature (correlates with MINOR in semantic versioning)
    • fix: A bug fix (correlates with PATCH in semantic versioning)
    • perf: A code change that improves performance
    • refactor: A code change that neither fixes a bug, nor adds a feature, nor implement a performance improvement
  • Commits that contribute to unit tests

    • test: Adding missing tests or refactoring/fixing existing tests
  • Commits that contribute to the build process and external dependencies:

    • build: Changes to the build process or external dependencies affecting the exported artifacts (i.e. those artifacts that are created as a result of such process, and are utilized as final deliverables or included in other external projects). Correlates with a PATCH, MINOR or MAJOR increment in semantic versioning, depending on the nature of the change
    • ci: Changes to the CI configuration, and other changes to the build process or external dependencies with no impact in the exported artifacts (e.g.: configure code quality metrics, add dependencies that are only needed for running unit tests). Does not correlate with an increment in semantic versioning, because the versioned artifacts are not modified
  • Trivial commits

    • docs: Documentation only changes
    • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
  • Other commits

    • revert: Reverts a previous commit
    • chore: Changes, not covered by other types

In the context of our discussion above, this classification has the following properties:

  • Contributions to the application code (feat, fix, perf, refactor) are mutually exclusive by definition.
  • Contributions to unit test can be dominated by Contributions to the application code
  • Contributions to the build process (build / ci) are mutually exclusive by definition and can be dominated by Contributions to the application code, or by Contributions to unit tests.
  • Trivial commits can be dominated by any type of Contribution.
  • Other commits cannot be dominated.

Regardless of these changes, we can keep this issue open, and review it later .
What do you think?

@mlopezFC
Copy link
Member

Totally agree with everything! Nice work!

javier-godoy added a commit that referenced this issue Dec 15, 2020
Mention the concept of logically atomic commit and reorder the list of commit types according to the discussion in #12
@javier-godoy
Copy link
Member Author

✔ Changes applied as discussed above.

(This issue, however, remains opens in order to collect additional discussion.)

@javier-godoy javier-godoy added deferred The issue has been deferred for additional discussion and removed discuss The issue is scheduled for internal discussion labels Dec 15, 2020
javier-godoy added a commit that referenced this issue Dec 15, 2020
Mention the concept of logically atomic commit and reorder the list of commit types according to the discussion in #12
javier-godoy added a commit that referenced this issue Dec 17, 2020
Mention the concept of logically atomic commit and reorder the list of commit types according to the discussion in #12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conventional-commits deferred The issue has been deferred for additional discussion
Projects
None yet
Development

No branches or pull requests

4 participants