ZOOKEEPER-4880: Generate comments from zookeeper.jute into code. #2206

luozongle01 · 2024-10-23T15:04:18Z

Generate comments from zookeeper.jute into code: https://issues.apache.org/jira/browse/ZOOKEEPER-4880

Brief Description

Support generating end-of-line comments and class comments in zookeeper.jute into code.

How to test this change

By comparing the zookeeper-jute/target/generated-sources/java/org/apache/zookeeper and zookeeper-client/zookeeper-client-c/generated directories before and after the modification, there are no other differences except the comments.

2. When the test class has multiple lines of comments, the comments can be generated normally.

3. Testing vector type properties can generate comments normally.

4. Testing class type properties can generate comments normally

5. Testing comments that do not meet expectations will not generate comments and will not affect the generation of normal code.

luozongle01 · 2024-10-23T16:19:07Z

@kezhuw Could you help me take a look? Thanks.

kezhuw

Hi @luozongle01, thank you for your contribution!

Given the fact that jute and all targeting languages support both // and multi-line comments, I suggest we output whatever comments captured in origin sharp. With help of beginLine, beginColumn, endLine and endColumn of Token, I think we can reproduce comments in generated code with indentation adjusted.

kezhuw · 2024-10-28T03:45:56Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

+        while (tmpTkn != null) {
+            String image = tmpTkn.image.trim();
+            tmpTkn = tmpTkn.next;
+            if (image.startsWith("//")) {


The capture does not shaped well for directly usage. I tasted the following capture, and it captured a complete line comment. This way we can output whatever it captured without post work.

SPECIAL_TOKEN : { <SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")> }

See also https://javacc.github.io/javacc/tutorials/token-manager.html

Thank you very much for your comment. I will make some further modifications.😆

The capture does not shaped well for directly usage. I tasted the following capture, and it captured a complete line comment. This way we can output whatever it captured without post work.

SPECIAL_TOKEN : { <SINGLE_LINE_COMMENT: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")> }

See also https://javacc.github.io/javacc/tutorials/token-manager.html

It seems like there's no need to change this now, because I get all the comments，then I use and distinguished them by line numbe. 😂😂

luozongle01 · 2024-10-30T01:34:59Z

Hi, @kezhuw, I made some modifications, thank you for your review last time. I feel much better now than last time.
I hope you can help me look again.

Currently, multi-line comments and single-line comments are supported.

I put JField or JRecord into a TreeSet, so that I can get the line or column information of the previous or next element. According to the beginLine, beginColumn, endLine, endColumn information, I determine the range of my comment area, and then format the output in line number order.

Here is my testing process.

Verify that the modification only affects the comments and does not affect the normal code.
When verifying that each JField is on its own line, multi-line comments and single-line comments are generated normally.

If the attributes are on the same line, the comment should belong to the previous Field.

kezhuw

@luozongle01 Thank you for your work!

It might be late, I am sorry for this, but I think we probably should align on how comments from jute should be formatted in generated sources ? I see two approaches.

Keep it in origin format.
Transfer comments from one format to another.

I saw this pr is going on the second approach. I am good to unify the generated comment format. But I think it could be challenge, as we should escape between two formats which could make generated comments strange/inaccurate.

    // /**
    //  * Some dead comments.
    //  */
    //
    // Things changed. We are going ..

Currently, the generated comments blindly strip some special strings. I don't think it is a good thing.

I suggest we keep it simple and don't touch comment content, this is also a safe path. If we want javadoc style comments, I think it is ok to convert them manually.

kezhuw · 2024-11-07T06:04:49Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

+            String[] commentArray = commentMulti.split("\n");
+            for (String comment : commentArray) {
+
+                comment = comment.replaceAll("//|/\\*\\*|/\\*|\\*\\*/|\\*/", "")


I do think people will want to maintain this kind of code.

s/do/don't/

I do think people will want to maintain this kind of code.
I'll amend that and I won't touch the comments so they're not needed here.

kezhuw · 2024-11-07T06:29:24Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

+                comment = comment.replaceAll("//|/\\*\\*|/\\*|\\*\\*/|\\*/", "")
+                        .replaceAll("^[* ]+", "").trim();
+
+                if (!comment.isEmpty()) {


The generated comments has no empty line which is not good.

luozongle01 · 2024-11-07T08:10:20Z

@luozongle01 Thank you for your work!

It might be late, I am sorry for this, but I think we probably should align on how comments from jute should be formatted in generated sources ? I see two approaches.

Keep it in origin format.

Transfer comments from one format to another.

I saw this pr is going on the second approach. I am good to unify the generated comment format. But I think it could be challenge, as we should escape between two formats which could make generated comments strange/inaccurate.
    // /**
    //  * Some dead comments.
    //  */
    //
    // Things changed. We are going ..
Currently, the generated comments blindly strip some special strings. I don't think it is a good thing.

I suggest we keep it simple and don't touch comment content, this is also a safe path. If we want javadoc style comments, I think it is ok to convert them manually.

@kezhuw Thank you very much for your review. I really didn't consider that 😂😂. I will keep the original format output.

However, for the end-of-line comments, should I continue to keep them at the end of the line or is it better to move them above the code? 🤔

kezhuw · 2024-11-07T13:04:29Z

for the end-of-line comments, should I continue to keep them at the end of the line or is it better to move them above the code?

I prefer to move them above the code.

luozongle01 · 2024-11-07T13:11:32Z

for the end-of-line comments, should I continue to keep them at the end of the line or is it better to move them above the code?

I prefer to move them above the code.

Thanks, I thought so too, I'll modify the code.

luozongle01 · 2024-11-07T15:07:42Z

Hi @kezhuw , I made some changes, could you take a look at it for me?

I won't touch the annotation content now.

But I still formatted it a little bit here, because for example, field needs a few spaces, class does not need spaces, and the number of spaces in java and c is also different.

  private static String getComments(String space, List<String> commentList) {
      if (commentList == null || commentList.isEmpty()) {
          return "";
      }

      String comments = String.join("", commentList);

      StringBuilder formatBuilder = new StringBuilder();
      for (String s : comments.split("\r?\n")) {
          formatBuilder.append("\n")
                  .append(space)
                  .append(s.replaceAll("^[ \t]+", ""));
      }

      return formatBuilder.append("\n").toString();
  }

Now the comment is like this

luozongle01 · 2024-11-14T04:50:44Z

Hi @kezhuw, These days, I have researched how JavaParser and Protobuf handle comments again.

The Comment#content method of Javaparser can retrieve the content inside the comment, which will remove the first layer/** */ or /* */ or // .
I think I might also be able to obtain the internal content of comments in this way for formatting purposes.

The creatCommentFromToken method is the process of handling content
https://github.com/javaparser/javaparser/blob/master/javaparser-core/src/main/javacc-support/com/github/javaparser/GeneratedJavaParserTokenManagerBase.java

Then I also tested the process of protobuf handling comments.

protobuf will format /** */ or /* */ or // as /** */.

Will convert multi line comments nested within single line comments into Html Entity. (Because multi line comments cannot be nested within each other)

So I think I just need to handle the following single line comments and a few situations

/*       /*        /**         /**
          *                     *
          *                     *
*/        */        */          */

For internal comments, I only need to convert/* and */ to HTML Entity (because multi line comments cannot be nested)

So can I make further modifications this way or are there any issues I haven't thought of?

kezhuw

Thank you for your contribution! I left some inline comments regarding the code.

Regarding comment style, I would say zookeeper-jute is used only by ZooKeeper and is not adopted by other projects as IDL. It is enough for us to go here.

kezhuw · 2024-11-19T06:18:45Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JCommentGenerator.java

+    }
+
+    public void addJField(JField jField) {
+        jFieldTreeSet.add(jField);


extends RangeInfo is aggressive. addCommentRange(JXyz jXyz, RangeInfo rangeInfo) should be sufficient.

OK, I will modify it

kezhuw · 2024-11-19T06:37:43Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JCommentGenerator.java

+        return getCommentsList(jField, jFieldTreeSet, fieldCommentsTokenMap);
+    }
+
+    private static List<String> getCommentsList(RangeInfo rangeInfo,


I expect this should be much clear. Given that we support only comments in three places:

Comments before class for record types.

Comments before fields.

Line ending comments for fields.

For the first, we could extract comments from Token::specialToken. For the second, similar to the first, but we have to drop comments(could be in either // or /* */ form) belong to previous field. For the third, additional lookup is required, also drop comments belong to next field.

luozongle01 · 2024-11-21T14:17:18Z

Hi @kezhuw, Thank you for your review last time. The code feels much more concise now.
Could you take another look at it for me?

kezhuw

Looks good to me in direction. I left some inline comments.

kezhuw · 2024-11-26T03:43:38Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/generated/rcc.jj

-<WithinOneLineComment> MORE :
-{
-  <~[]>
+  <WithinOneLineComment: "//" (~["\n","\r"])* ("\n"|"\r"|"\r\n")>


Should be "ONE_LINE_COMMENT" ? It is as a "Token.kind" constant in generated in "RccConstants".

Should be "ONE_LINE_COMMENT" ? It is as a "Token.kind" constant in generated in "RccConstants".

Yes, I'll make some modifications here.

kezhuw · 2024-11-26T03:50:14Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

+        return getFieldComments(jField, "    ");
+    }
+
+    private String getFieldComments(JField jField, String space) {


s/space/indent/ ?

s/space/indent/ ?

This is because org.apache.jute.compiler.JType#genCDecl also contains spaces, so I also used spaces. 😂

Sorry for the confusion! I means it should be better to rename variable space to indent.

Sorry for the confusion! I means it should be better to rename variable space to indent.

Okay, I will make some modifications here.

kezhuw · 2024-11-26T04:01:26Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

+        return formatBuilder.append("\n").toString();
+    }
+
+    private static List<String> getComments(Token token) {


Currently, all leading spaces in mult-line comment are dropped in generation. Can we reconstruct these for multi-line comment ?

Or how about <MULTI_LINE_COMMENT: "/*" (~["*"])* "*" (~["*","/"] (~["*"])* "*" | "*")* "/"> ? (https://stackoverflow.com/questions/34550579/multiline-comments-in-javacc)

I find no hole despite there is one comment say that "not correct with respect to the language definition".

I think it is easy to reconstruct original format if multi-line comment is captured into one token.

all leading spaces in mult-line comment are dropped in generation

/** ** Something. **/

For example, above comment will be formatted as following.

/** ** Something. **/

Or how about <MULTI_LINE_COMMENT: "/" (~[""])* "" (~["","/"] (~[""]) "" | "")* "/"> ? (https://stackoverflow.com/questions/34550579/multiline-comments-in-javacc)

I think it is easy to reconstruct original format if multi-line comment is captured into one token.

I am not sure the LOOKAHEAD could help here. I tried but did not succeed.

Oh, then I'll research and see if I can capture multi line comments as one token.

Currently, all leading spaces in mult-line comment are dropped in generation. Can we reconstruct these for multi-line comment ?

Or how about <MULTI_LINE_COMMENT: "/*" (~["*"])* "*" (~["*","/"] (~["*"])* "*" | "*")* "/"> ? (https://stackoverflow.com/questions/34550579/multiline-comments-in-javacc)

I find no hole despite there is one comment say that "not correct with respect to the language definition".

I think it is easy to reconstruct original format if multi-line comment is captured into one token.

This way, multiple lines of comments can be obtained into one token.

MORE : { <ENTER_JAVADOC_COMMENT: "/**" ~["/"]> { input_stream.backup(1); } : IN_JAVADOC_COMMENT | <ENTER_MULTILINE_COMMENT: "/*"> : IN_MULTI_LINE_COMMENT } <IN_JAVADOC_COMMENT> SPECIAL_TOKEN : { <JAVADOC_COMMENT: "*/" > : DEFAULT } <IN_MULTI_LINE_COMMENT> SPECIAL_TOKEN : { <MULTI_LINE_COMMENT: "*/" > : DEFAULT } <IN_JAVADOC_COMMENT, IN_MULTI_LINE_COMMENT> MORE : { <COMMENT_CONTENT: ~[] > }

But there is still a small issue, it seems that there is no way to remove the space at the beginning of each middle line, and there is no space before the first line of multi line comments. 😂😂

Changes: 1. No change to one-line comment capture. 2. Change only indentation and line separator for multi-line comment. The reconstruction might be a bit hard given the current comment capture from apache#2206. I am OK to both approches.

luozongle01 · 2024-11-27T06:29:02Z

Hi @kezhuw , Thank you for providing the example.
I didn't expect this to reconstruct the style of the jute. 😂 This is much better than before. I have made modifications based on your example and tested it.

zookeeper-jute/src/main/java/org/apache/jute/compiler/generated/rcc.jj

kezhuw · 2024-11-27T12:24:57Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

+        return getFieldComments(jField, "    ");
+    }
+
+    private String getFieldComments(JField jField, String space) {


Sorry for the confusion! I means it should be better to rename variable space to indent.

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

luozongle01 · 2024-11-28T02:39:27Z

Hi @kezhuw , I have completed the modifications and testing, can you help me take a look again

kezhuw

Hi @luozongle01, thank you for the continuous update! I leave some comments regarding concrete implementation.

kezhuw · 2024-11-28T10:41:35Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/generated/rcc.jj

+        {
+            f.setTypeToken(typeTkn);
+            prevFieldSetNextTkn(typeTkn);
+            prevField = f;


prevField is not used outside Record, it should be a local also.

How about replace nextToken with semicolonToken ? I think it is suitable:

Semicolon is the end of field declaration.

semicolonToken.next is the next token. We can extract possible end of line comments from there.

This way we don't need prevField anymore.

How about replace nextToken with semicolonToken ? I think it is suitable:

Semicolon is the end of field declaration.

semicolonToken.next is the next token. We can extract possible end of line comments from there.

This way we don't need prevField anymore.

Ah, I see. Sorry I didn't think of writing this before.🤣🤣

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

luozongle01 · 2024-11-28T16:03:24Z

Hi @kezhuw , I have finished the revision. Thank you very much for your review. Since I may not have much work experience, there may be many small problems in my code. Thank you for pointing out where I can improve.😆

kezhuw

Thank you very much for your review. Since I may not have much work experience, there may be many small problems in my code. Thank you for pointing out where I can improve.😆

Never mind! I believe we are in good direction. I have left inline comments for latest update.

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

zookeeper-jute/src/main/java/org/apache/jute/compiler/generated/rcc.jj

kezhuw · 2024-11-29T07:57:57Z

zookeeper-jute/src/main/java/org/apache/jute/compiler/generated/rcc.jj

+            f.setTypeToken(typeTkn);
+            f.setPreviousToken(previousToken);
+            f.setNextToken(getToken(1));
+            previousToken = typeTkn;


Not a big deal. But I think the previousToken should be <SEMICOLON_TKN>.

kezhuw

int field; /* A multi-line comment. */ /* Another multi-line comment. */

I find that we are not handing above case. I left comment about this.

zookeeper-jute/src/main/java/org/apache/jute/compiler/generated/rcc.jj

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

zookeeper-jute/src/test/java/org/apache/jute/compiler/JRecordTest.java

zookeeper-jute/src/main/java/org/apache/jute/compiler/JRecord.java

luozongle01 · 2024-12-02T05:44:18Z

int field; /* A multi-line comment. */ /* Another multi-line comment. */
I find that we are not handing above case. I left comment about this.

Your testing is incredibly thorough! I hadn’t considered these cases before. I’m sorry that my code had so many issues and caused you so much extra work. 😂😂😂
I added a few more test cases.

kezhuw

LGTM

Could you squash this pr to one commit ?

luozongle01 · 2024-12-03T03:59:20Z

LGTM

Could you squash this pr to one commit ?

Thanks, I have already squash it to one commit.

tisonkun

LGTM. Thanks for this neat improvement!

kezhuw · 2024-12-07T06:02:58Z

@luozongle01 Thank you for your contribution! Merged!

luozongle01 force-pushed the ZOOKEEPER-4880 branch from f24cad3 to 6365b88 Compare October 23, 2024 15:14

kezhuw reviewed Oct 28, 2024

View reviewed changes

luozongle01 force-pushed the ZOOKEEPER-4880 branch 4 times, most recently from 52d4194 to 0631b4f Compare October 29, 2024 15:48

kezhuw self-requested a review November 6, 2024 09:33

kezhuw reviewed Nov 7, 2024

View reviewed changes

luozongle01 force-pushed the ZOOKEEPER-4880 branch from 0631b4f to ae35908 Compare November 7, 2024 13:42

kezhuw self-requested a review November 9, 2024 08:08

kezhuw reviewed Nov 19, 2024

View reviewed changes

luozongle01 force-pushed the ZOOKEEPER-4880 branch 3 times, most recently from 74e3326 to a33fc24 Compare November 21, 2024 14:03

kezhuw reviewed Nov 26, 2024

View reviewed changes

kezhuw mentioned this pull request Nov 26, 2024

Demonstrate multi-line comment reconstruction luozongle01/zookeeper#1

Closed

luozongle01 force-pushed the ZOOKEEPER-4880 branch from a33fc24 to 582c32c Compare November 27, 2024 05:08

kezhuw reviewed Nov 27, 2024

View reviewed changes

luozongle01 force-pushed the ZOOKEEPER-4880 branch 2 times, most recently from 3536f54 to 108ae97 Compare November 28, 2024 01:56

kezhuw reviewed Nov 28, 2024

View reviewed changes

luozongle01 force-pushed the ZOOKEEPER-4880 branch from 108ae97 to 99d15ec Compare November 28, 2024 15:35

kezhuw reviewed Nov 29, 2024

View reviewed changes

kezhuw reviewed Dec 2, 2024

View reviewed changes

kezhuw approved these changes Dec 3, 2024

View reviewed changes

ZOOKEEPER-4880: Generate comments from zookeeper.jute into code.

0e49b5f

luozongle01 force-pushed the ZOOKEEPER-4880 branch from 374c55c to 0e49b5f Compare December 3, 2024 03:56

kezhuw requested review from anmolnar, eolivelli, symat and tisonkun December 4, 2024 06:14

tisonkun approved these changes Dec 4, 2024

View reviewed changes

kezhuw merged commit d1d57c4 into apache:master Dec 7, 2024
14 checks passed

ZOOKEEPER-4880: Generate comments from zookeeper.jute into code. #2206

ZOOKEEPER-4880: Generate comments from zookeeper.jute into code. #2206

Conversation

luozongle01 commented Oct 23, 2024

Brief Description

How to test this change

luozongle01 commented Oct 23, 2024

kezhuw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luozongle01 commented Oct 30, 2024

kezhuw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luozongle01 commented Nov 7, 2024 • edited Loading

kezhuw commented Nov 7, 2024

luozongle01 commented Nov 7, 2024

luozongle01 commented Nov 7, 2024

luozongle01 commented Nov 14, 2024 • edited Loading

kezhuw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luozongle01 commented Nov 21, 2024

kezhuw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luozongle01 Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kezhuw Nov 26, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luozongle01 commented Nov 27, 2024 • edited Loading

Choose a reason for hiding this comment

luozongle01 commented Nov 28, 2024

kezhuw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

luozongle01 commented Nov 28, 2024

kezhuw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kezhuw left a comment

Choose a reason for hiding this comment

luozongle01 commented Dec 2, 2024

kezhuw left a comment

Choose a reason for hiding this comment

luozongle01 commented Dec 3, 2024

tisonkun left a comment

Choose a reason for hiding this comment

kezhuw commented Dec 7, 2024

luozongle01 commented Nov 7, 2024 •

edited

Loading

luozongle01 commented Nov 14, 2024 •

edited

Loading

luozongle01 Nov 26, 2024 •

edited

Loading

kezhuw Nov 26, 2024 •

edited

Loading

luozongle01 commented Nov 27, 2024 •

edited

Loading