Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly escape Javadoc code snippets #1397

Merged
merged 1 commit into from
Aug 1, 2024
Merged

Correctly escape Javadoc code snippets #1397

merged 1 commit into from
Aug 1, 2024

Conversation

lunaris
Copy link
Contributor

@lunaris lunaris commented Jul 30, 2024

Escaping code snippets for use in Javadoc comments is non-trivial. By default, Javadoc comments are expected to be HTML markup. While HTML offers <pre> and <code> tags, these only handle formatting and do not remove the need to escape e.g. characters such as < and >. To this end, Javadoc provides options such as {@code ...} for embedding text verbatim. However, these only handle escaping and not formatting. One thus generally combines the two approaches to yield comments like <pre>{@code ...}</pre>. However, this combination itself introduces challenges around escaping:

  • @, which normally indicates a Javadoc tag (such as @code itself), needs to be escaped in some situations but not others:
/**
 * <pre>
 * {@code
 * public class C {
 *   // This use of @ needs to be escaped lest Javadoc get confused and think
 *   // it's a nested tag (even though @code should explicitly prevent this).
 *   @SuppressWarnings
 *   public static void m1() {}
 *
 *   // This @ does not need to be escaped.
 *   public static void m2() {}
 * }
 * }
 * </pre>
 */
  • Braces ({ and }) need to be escaped if they are not balanced, since Javadoc counts braces in order to work out when to end the @code block.
  • The combination */ always needs to be escaped since it would otherwise premutately terminate the Javadoc comment.

Our escaping code is currently broken (see #1363), causing crashes because it fails to handle the case of unbalanced braces. Even in the cases where it does not cause crashes, the Javadoc generated is not necessarily faithful to the input, due to incorrect escaping of @ and */ depending on the context. For instance, while many sources state that {@literal @} is the correct escaping of @ within a @code block, this is not in general true and will result in {@literal @} being produced in the rendered output instead of the desired @.

This commit attempts to fix it once and for all. We do this by adopting a strategy whereby we leave the @code block temporarily when the need to escape a character arises. For example, given:

This contains an @ and {unbalanced braces

we now generate:

This contains an }{@literal @}{@code  and }{{@code unbalanced braces

with the expectation that this will end up inside a <pre>{@code ...}</pre> context. This results in generated comments that are harder to read, but render successfully and accurately, both as HTML Javadoc pages and e.g. hover documentation in most IDEs. We attempt to minimise the impact this will have on real-world SDKs/programs by only escaping if there is a need (that is, if a comment contains @, */, or unbalanced braces). The test suite for Javadoc processing has been bulked out so that hopefully this does not bite again (famous last words!).

Fixes #1363

image
image
image

@lunaris lunaris requested a review from a team as a code owner July 30, 2024 11:59
@lunaris lunaris force-pushed the javadoc-fix-v3 branch 3 times, most recently from 251c924 to ed27f75 Compare July 31, 2024 14:44
Escaping code snippets for use in Javadoc comments is non-trivial. By default,
Javadoc comments are expected to be HTML markup. While HTML offers `<pre>` and
`<code>` tags, these only handle formatting and do not remove the need to escape
e.g. characters such as `<` and `>`. To this end, Javadoc provides options such
as `{@code ...}` for embedding text verbatim. However, these only handle
escaping and not formatting. One thus generally combines the two approaches to
yield comments like `<pre>{@code ...}</pre>`. However, *this* combination itself
introduces challenges around escaping:

* `@`, which normally indicates a Javadoc tag (such as `@code` itself), needs to
  be escaped in some situations but not others:

```java
/**
 * <pre>
 * {@code
 * public class C {
 *   // This use of @ needs to be escaped lest Javadoc get confused and think
 *   // it's a nested tag (even though @code should explicitly prevent this).
 *   @SuppressWarnings
 *   public static void m1() {}
 *
 *   // This @ does not need to be escaped.
 *   public static void m2() {}
 * }
 * }
 * </pre>
 */
```

* Braces (`{` and `}`) need to be escaped if they are not balanced, since
  Javadoc counts braces in order to work out when to end the `@code` block.
* The combination `*/` always needs to be escaped since it would otherwise
  premutately terminate the Javadoc comment.

Our escaping code is currently broken (see #1363), causing crashes because it
fails to handle the case of unbalanced braces. Even in the cases where it does
not cause crashes, the Javadoc generated is not necessarily faithful to the
input, due to incorrect escaping of `@` and `*/` depending on the context. For
instance, while many sources state that `{@literal @}` is the correct escaping
of `@` within a `@code` block, this is not in general true and will result in
`{@literal @}` being produced in the rendered output instead of the desired `@`.

This commit attempts to fix it once and for all. We do this by adopting a
strategy whereby we leave the `@code` block temporarily when the need to escape
a character arises. For example, given:

```java
This contains an @ and {unbalanced braces
```

we now generate:

```java
This contains an }{@literal @}{@code  and }{{@code unbalanced braces
```

with the expectation that this will end up inside a `<pre>{@code ...}</pre>`
context. This results in generated comments that are harder to read, but render
successfully and accurately, both as HTML Javadoc pages and e.g. hover
documentation in most IDEs. We attempt to minimise the impact this will have on
real-world SDKs/programs by only escaping if there is a need (that is, if a
comment contains `@`, `*/`, or unbalanced braces). The test suite for Javadoc
processing has been bulked out so that hopefully this does not bite again
(famous last words!).

Fixes #1363
@lunaris
Copy link
Contributor Author

lunaris commented Aug 1, 2024

Having tested this manually with Azure Native and AWS, I'm going to merge this. Fingers crossed! 🤞

image
image

@lunaris lunaris merged commit 210c886 into main Aug 1, 2024
23 checks passed
@lunaris lunaris deleted the javadoc-fix-v3 branch August 1, 2024 09:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

v0.11.0 generates invalid javadoc for Azure Native
2 participants