Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable and require Java code formatting #2963

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

michalmac
Copy link
Member

@michalmac michalmac commented Nov 26, 2023

There is a growing interest in standardising Java code formatting in MATSim. We discussed this topic several times in the past, and recently it was brought up during Code Sprint 2023: https://github.com/orgs/matsim-org/projects/4?pane=issue&itemId=40934408.

This PR suggests a reasonably straightforward approach:

  • no configurability (just one way of formatting)
  • well maintained (both formatter and maven plugin)
  • relatively popular
  • good support in IDEs

Formatting

(excerpt from README.md)

We use fmt-maven-plugin for formatting Java code according to Google Java Style.
This maven plugin internally uses google-java-format to format code or check if it has been properly formatted.

To format code, run mvn com.spotify.fmt:fmt-maven-plugin:format.

For ease of coding, we suggest configuring the code formatting settings in your IDE (see instructions for IntelliJ
and Eclipse).

Java code formatting is checked during the verify phase in maven. Passing this check is required for merging a PR to protected branches.

TODO

(if we decide for this formatting approach)

This PR is a draft for evaluating this approach. If we decide to adapt it, a new PR will be created (from the latest commit on master) to avoid resolving conflicts.

After merging to master, all open PRs will need to be adapted to the new formatting before they can be merged to master. The simplest approach could look like:

  1. rebase on the last commit on master prior code formatting (to resolve all conflicts with master that are independent to code formatting on master)
  2. squash all commits on the branch
  3. format code on the branch
  4. amend the previous commit (2), or simply commit and squash into (2)
  5. rebase on the commit on master that reformatted code

We need to test this path first, but once tested, steps (2-5) can be handled automatically with a script, so only step (1) will require some manual work of the branch owner/author.

However, it would be good to merge as many open PRs as possible before the code formatting gets introduced.

@kt86
Copy link
Contributor

kt86 commented Nov 27, 2023

Great, @michalmac !

Two (and a half) questions: We do have repos, having matsim-libs as parent in its pom. Some of them also with one GH action testing.
What do we need to adapt there? Are some setting of the pom inherited?
Do I need to update any of the GH action scripts?

I am asking because I plan to introduce this format also there, to have everything in the same style.
Unfortunately, it is difficult to see because this PR changes so many files that even filtering the changed files does not work properly anymore.

@michalmac
Copy link
Member Author

michalmac commented Nov 27, 2023

Great, @michalmac !

Two (and a half) questions: We do have repos, having matsim-libs as parent in its pom. Some of them also with one GH action testing. What do we need to adapt there? Are some setting of the pom inherited? Do I need to update any of the GH action scripts?

I am asking because I plan to introduce this format also there, to have everything in the same style. Unfortunately, it is difficult to see because this PR changes so many files that even filtering the changed files does not work properly anymore.

Really good questions!

  • I really recommend using the same formatting across all projects if possible. It's really convenient.
  • The commit is practically unreadable. Therefore, I want to create a small script to help with adapting development branches so there is only one such commit in the git history.
  • If you point to the matsim POM as the parent POM, then the setting will get inherited. That means the format check will be executed in mvn verify, which is usually called in CIs. No need to change anything there. However, if you do not want to follow the matsim code format, it is possible to override the plugin settings in a child POM, it is also possible to specify -Dfmt.skip when calling mvn.

@kt86
Copy link
Contributor

kt86 commented Nov 30, 2023

@michalmac: I have tried it out now for my other repo.

Here are two things that I noticed:

1.)
Updating the format in IntelliJ work fine. As long as only the mentioned IntelliJ plugin is installed, It only changed something when I did reformat file.
So I also installed the code style from https://github.com/google/styleguide/blob/gh-pages/intellij-java-google-style.xml in IntelliJ as code Style:
grafik
Now, I was also able to use reformat code as well.


2.)
The automatic line wrapping is IMO not so nice (independent of the reformat way: file or code):
Once it decides that a line is too long, it wraps it at any possible location and not only once and maybe again if the second line is also full. This happens at different places, e.g.: definitions of a class' implementations, constructors, calls of functions/methods - if the signature is too long, strings constructs with "+", ..., line comments
see e.g.
https://github.com/matsim-vsp/logistics/blob/6e34659f357f5d81d9a201d05dff48489dad128d/src/main/java/org/matsim/freight/logistics/LSPControlerListener.java#L44-L52
or
https://github.com/matsim-vsp/logistics/blob/6e34659f357f5d81d9a201d05dff48489dad128d/src/main/java/org/matsim/freight/logistics/example/lsp/initialPlans/ExampleTwoEchelonGrid.java#L153-L161
This does not really help to improve the readability and also makes files much longer: I have a class that has grown from 430 to 530 lines only due to the reformatting.

Similar things also happen in matsim-libs (I saw some of these changes in this PR, but am too lazy now to search for them. If wished, I can look for some of them again)

--> Do we find there a suitable solution to avoid this line wrap inflation? As far as I understood, the Google code style also allows, less line-wrappings.(?)

@michalmac
Copy link
Member Author

@michalmac: I have tried it out now for my other repo.

Here are two things that I noticed:

Thanks @kt86 for trying out the formatting.

Re 1: Thanks. I have not mentioned about importing the the Google code style, because I thought it was described in the linked IntelliJ, but apparently it is not.

Re 2: Unfortunately, there is no configurability. This has advantages and disadvantages. I would say the bigger the community, the more advantages.

IMO splitting a long list of argument/interfaces/... into individual rows helps in spotting the actual diffs in code. Moreover, it is less likely to get many lines of diff if we add/rename/remove one argument. But the code gets longer.

==

I think we need to keep in mind that there is no formatting style that will make everyone happy. However, if many of us are (very) unhappy, then probably we need to come up with a different style. I think every feedback like yours is very valuable in making the final decision.

@kt86
Copy link
Contributor

kt86 commented Nov 30, 2023

Thank you, Michal!
I am totally in favor of a common standard and see many benefits of it. For me, as a non-expert, this Google java style seems to be a good choice because it seems to be a well-known standard, and it is effortless to integrate.
And I also like, that there is no configurability, so we will not have all the reformatting changes anymore. :)

For me personally, I can also live with that many additional lines. Nevertheless, I will try out, what happens, when not all of these line-wrappings are there, e.g., when writing new code. --> Is this something that is enforced? Or is it not enforced because both options are allowed, and it was just coming up due to the automatic reformat? ;)

So please just see it more as a note than a vote :)

And I am curious, what other people — and especially the more programming-experts — will say.

@michalmac
Copy link
Member Author

For me personally, I can also live with that many additional lines. Nevertheless, I will try out, what happens, when not all of these line-wrappings are there, e.g., when writing new code. --> Is this something that is enforced? Or is it not enforced because both options are allowed, and it was just coming up due to the automatic reformat? ;)

It is enforced. If you split lines in a different way, the format checker will complain.

So please just see it more as a note than a vote :)

Sure. Thanks.

@Janekdererste
Copy link
Member

This does not really help to improve the readability and also makes files much longer

This usually takes a while to get used to. Somehow, the overall opinion seems to be that long lines are less readable. The rust formatter as well as the very popular prettier formatter for Javascript also favor rather short lines. This also is not my favorite, especially for languages with type arguments, but then I don't care enough to invent my own format :-)

@kainagel
Copy link
Member

kainagel commented Dec 4, 2023 via email

@michalmac
Copy link
Member Author

michalmac commented Dec 7, 2023

Re line length: This formatter limits it at 100, but having 2-space indentation gives a little extra spaces within that limit, compared to the current style (4-space long tabs).

This PR results in 10% more lines, but I guess it is mainly a small fraction of complex (often hard to read) multi-line sections that contribute to this 10% increase. I would even guess that having the limit increased from 100 to 140 will probably have no significant impact.

I looked a bit to find some examples in the drt contrib.

Example 1:

		drtCfg.getDrtSpeedUpParams().ifPresent(drtSpeedUpParams -> {
			bindModal(DrtSpeedUp.class).toProvider(modalProvider(
					getter -> new DrtSpeedUp(getMode(), drtSpeedUpParams, getConfig().controller(),
							getter.get(Network.class), getter.getModal(FleetSpecification.class),
							getter.getModal(DrtEventSequenceCollector.class)))).asEagerSingleton();
			addControlerListenerBinding().to(modalKey(DrtSpeedUp.class));
		});

is formatted into something more structured:

    drtCfg
        .getDrtSpeedUpParams()
        .ifPresent(
            drtSpeedUpParams -> {
              bindModal(DrtSpeedUp.class)
                  .toProvider(
                      modalProvider(
                          getter ->
                              new DrtSpeedUp(
                                  getMode(),
                                  drtSpeedUpParams,
                                  getConfig().controller(),
                                  getter.get(Network.class),
                                  getter.getModal(FleetSpecification.class),
                                  getter.getModal(DrtEventSequenceCollector.class))))
                  .asEagerSingleton();
              addControlerListenerBinding().to(modalKey(DrtSpeedUp.class));
            });

Example 2:

				.in(Singleton.class); // Not an eager binding, as taxi contrib doesn't need this implementation, but
										// would search for VrpLegFactory which is provided in the
										// DrtModeOptimizerModule if bound eagerly

The comment is located a bit unfortunately, so the automated formatting leads to the following:

        .in(
            Singleton
                .class); // Not an eager binding, as taxi contrib doesn't need this implementation,
    // but
    // would search for VrpLegFactory which is provided in the
    // DrtModeOptimizerModule if bound eagerly

There is no support for more intelligent "re-formatting" of comments. However, problems like this one will get resolved over time as we modify existing code.

Example 3:

				entriesList.forEach(entry -> writer.writeNext(
						new String[] { zone.getId(), "" + c.getX(), "" + c.getY(), "" + entry.getFirst(),
								"" + entry.getSecond() }, false));

gets formatted into something which is presumably more readable, but longer:

                entry ->
                    writer.writeNext(
                        new String[] {
                          zone.getId(),
                          "" + c.getX(),
                          "" + c.getY(),
                          "" + entry.getFirst(),
                          "" + entry.getSecond()
                        },
                        false));

Example 4:

	private record DrtLeg(Id<Request> request, double submissionTime, double departureTime, Id<Person> person, Id<DvrpVehicle> vehicle, Id<Link> fromLinkId, Coord fromCoord,
						  Id<Link> toLinkId, Coord toCoord, double waitTime, double unsharedDistanceEstimate_m, double unsharedTimeEstimate_m,
						  double arrivalTime, double fare, double earliestDepartureTime, double latestDepartureTime, double latestArrivalTime) {
	}

This long record definition starts to look more like a java class definition after formatting:

  private record DrtLeg(
      Id<Request> request,
      double submissionTime,
      double departureTime,
      Id<Person> person,
      Id<DvrpVehicle> vehicle,
      Id<Link> fromLinkId,
      Coord fromCoord,
      Id<Link> toLinkId,
      Coord toCoord,
      double waitTime,
      double unsharedDistanceEstimate_m,
      double unsharedTimeEstimate_m,
      double arrivalTime,
      double fare,
      double earliestDepartureTime,
      double latestDepartureTime,
      double latestArrivalTime) {}

My overall impression is that the formatted code is easier to read. Moreover, the whole codebase is formatted in the same way, which always increases the readability (regardless of the chosen formatter).

@sebhoerl
Copy link
Contributor

Has there been any progress on this? Could we maybe just start deploying it per contrib (= drt ;-) )

@michalmac
Copy link
Member Author

Has there been any progress on this? Could we maybe just start deploying it per contrib (= drt ;-) )

No final decision yet. I think everyone wants it, but there is no formatting that satisfies everyone :-) So we are a bit stuck. I guess we need to be a bit more realistic with our expectations and weigh pros and cons.

I think the easiest way to move on is to do an all-in-one-go transition:

  • formatting has the viral effect: once the DRT contrib switches to this formatting, those who code in DRT would need to set up the right formatting in their IDEs and then it would impact the formatting in other contribs. At the same time others could still use old formatting, so there might be some "formatting wars" outside DRT :-)
  • To avoid all open PRs to have huge diffs with the master (which basically makes them unreviewable), we need to rebase the corresponding branches on the "formatted" master (this could be done almost automatically with some script). This rebasing of branches is easier in the all-in-one-go transition.

@sebhoerl
Copy link
Contributor

Ok thanks, personally, I don't care about the style as long as there is one as a couple of other people here, I think. Maybe we should put it to vote at the MATSim User Meeting :)

@nkuehnel
Copy link
Member

I like the code style with shorter lines :) take my vote

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants