5.2 support: Raw identifiers #2619

Julow · 2024-11-14T10:04:10Z

This is the last missing piece for 5.2 support.

Thanks to @ccasin, who did the work in janestreet#88.

The parser doesn't allow raw identifiers in module names, but it does allow them in module type names.

hhugo · 2024-11-14T10:21:19Z

lib/Fmt_ast.ml

        $ fmt_package_type c ctx cnstrs
        $ fmt_attributes c attrs )
  | Ptyp_open (lid, typ) ->
      hvbox 2
-        ( hvbox 0 (fmt_longident_loc c lid $ str ".(")
+        ( hvbox 0 (fmt_longident_loc c ~constructor:true lid $ str ".(")


Why is it ~constructor:true here ?

hhugo · 2024-11-14T10:47:29Z

We need raw idents to be preserved. If I use \#effect in a 5.3 switch, I don't want it to be rewritten in a 5.2 switch.
The current implementation depends on Lexer.is_keyword, which is version-dependent.
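A minimal sketch of the concern, assuming a hypothetical `is_keyword ~version` helper (not the real `Lexer.is_keyword` signature) and a toy keyword set in which only `effect` differs between versions. Re-deriving the marker from a keyword check gives different output depending on the configured version, while keeping the source text does not:

```ocaml
(* Hypothetical version type and keyword check, for illustration only. *)
type version = V5_2 | V5_3

let is_keyword ~version name =
  match name with
  | "let" | "and" | "module" | "type" -> true
  | "effect" -> version = V5_3
  | _ -> false

(* Approach A: the AST stores the bare name and the printer adds the
   marker back when the name is a keyword for the configured version. *)
let print_rederived ~version name =
  if is_keyword ~version name then "\\#" ^ name else name

(* Approach B: the identifier is stored exactly as written in the source. *)
let print_preserved source_text = source_text
```

With approach A, an identifier written as `\#effect` loses its marker whenever the configured version is 5.2; approach B is independent of the version.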

hhugo · 2024-11-14T11:15:58Z

Thinking more about it, you could even imagine a mode that would help migrate to a newer OCaml version by escaping idents that are becoming keywords.
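Such a mode might look roughly like the sketch below. The list `new_keywords_in_target` and the function `migrate_ident` are hypothetical names used only for illustration; nothing like them exists in ocamlformat as far as this thread shows.

```ocaml
(* Hypothetical: names that are plain identifiers in the current version
   but become keywords in the target version. *)
let new_keywords_in_target = [ "effect" ]

(* Escape an identifier if it is about to become a keyword.
   Identifiers that already carry the marker are not in the list and are
   therefore returned unchanged. *)
let migrate_ident name =
  if List.mem name new_keywords_in_target then "\\#" ^ name else name

(* Apply the migration to every identifier of a toy program. *)
let migrate_all idents = List.map migrate_ident idents
```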

ccasin · 2024-11-14T11:37:21Z

Sorry, I should have mentioned it, but I think my PR in the janestreet repo has a bug: escaped operators in types need special treatment.

The compiler itself also has this bug, see the report in ocaml/ocaml#13603 and a fix in ocaml/ocaml#13604. I'm in the process of adapting that fix in my ocamlformat pr to our branch.

Julow · 2024-11-14T15:09:53Z

I agree that we should preserve the raw identifier marker and not try to add it again later. I'm now investigating passing that information through a table, like we do for comments, as changing the AST for this might bring huge changes to the lexer and parser, making future backports harder.
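As a rough illustration of the table idea, assuming a simplified location type and a global side table (both hypothetical, not the data structures ocamlformat uses for comments): the lexer records where a raw identifier was seen, the AST keeps the bare name, and the printer looks the location up.

```ocaml
(* Hypothetical location: start and end byte offsets of an identifier. *)
type loc = { start : int; stop : int }

(* Side table filled during lexing with the locations of identifiers
   that were written with the raw identifier marker. *)
let raw_idents : (loc, unit) Hashtbl.t = Hashtbl.create 16

let record_raw loc = Hashtbl.replace raw_idents loc ()

(* The printer consults the table; the AST still holds the bare name. *)
let print_ident loc name =
  if Hashtbl.mem raw_idents loc then "\\#" ^ name else name
```

The AST and parser stay untouched in this design, at the cost of keeping the table in sync with locations.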

> Thinking more about it, you could even imagine a mode that would help migrate to a newer OCaml version by escaping idents that are becoming keywords.

That should be possible on top of this work.

hhugo · 2024-11-14T19:11:30Z

> changing the AST for this might bring huge changes to the lexer and parser, making future backports harder.

Can't you keep the escaping prefix during lexing, without touching the AST?

hhugo · 2024-11-14T20:26:17Z

> Sorry, I should have mentioned it, but I think my PR in the janestreet repo has a bug: escaped operators in types need special treatment.

If we go with preserving escaped idents, we no longer need any special treatment.

hhugo · 2024-11-15T08:34:26Z

Don't we need to update the lexer in parser-standard as well ?

hhugo · 2024-11-15T10:12:33Z

The upstream OCaml lexer is able to recognize different sets of keywords. One can pass -keywords 5.2 on the command line.

Julow · 2024-11-15T11:25:50Z

I'm currently investigating changing the AST to represent how identifiers looked in the source. If that requires too many changes in the parser and AST, I'll switch to the table approach.

> Don't we need to update the lexer in parser-standard as well ?

No, I'll let the parser-standard behave as it does upstream (\#foo is represented the same as foo).

> The upstream OCaml lexer is able to recognize different sets of keywords. One can pass -keywords 5.2 on the command line.

This should be plugged into the ocaml-version option that we have, but that won't solve the issue that \#effect must not be turned into effect, regardless of the value of ocaml-version. This option might be set wrong, or people writing \#effect might intend to set it to 5.3 soon.
Preserving the source code is the only approach I'm considering now.

hhugo · 2024-11-15T11:41:52Z

> > Don't we need to update the lexer in parser-standard as well ?
>
> No, I'll let the parser-standard behave as it does upstream (\#foo is represented the same as foo).

Ok, turns out the standard lexer was already updated in #2512

hhugo · 2024-11-15T11:47:09Z

> I'm currently investigating changing the AST to represent how identifiers looked in the source. If that requires too many changes in the parser and AST, I'll switch to the table approach.

In #2619 (comment), I was suggesting not changing the AST and just storing the escaped ident in the LIDENT token. What do you think?
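A toy sketch of that suggestion, assuming a hand-written scanner and a simplified `token` type (both hypothetical; the vendored lexer is generated by ocamllex and is not shown in this thread). The only point illustrated is that the token payload keeps the `\#` prefix instead of dropping it, so no new AST node is needed:

```ocaml
(* Hypothetical token type carrying the identifier text. *)
type token = LIDENT of string

let is_ident_char = function
  | 'a' .. 'z' | 'A' .. 'Z' | '0' .. '9' | '_' | '\'' -> true
  | _ -> false

(* Read an identifier starting at [pos].
   Returns the token and the position just after it. *)
let lex_lident s pos =
  let len = String.length s in
  let raw = pos + 1 < len && s.[pos] = '\\' && s.[pos + 1] = '#' in
  let start = if raw then pos + 2 else pos in
  let stop = ref start in
  while !stop < len && is_ident_char s.[!stop] do
    incr stop
  done;
  (* The payload spans from [pos], so the raw marker is kept when present. *)
  (LIDENT (String.sub s pos (!stop - pos)), !stop)
```

The printer can then emit the payload as is, without consulting any keyword list.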

Julow · 2024-11-18T16:40:57Z

I pushed the Parsetree approach here: #2620
The support in Fmt_ast is trivial, in exchange for a +233 -198 change in the vendored parser. There are +88 -72 line changes in parser.mly, which will make future backports harder, though that is less than I expected.

Julow · 2024-11-18T16:48:27Z

> In #2619 (comment), I was suggesting not changing the AST and just storing the escaped ident in the LIDENT token. What do you think?

That would have been simpler indeed. I'll investigate how that looks.

Julow · 2024-11-19T10:30:36Z

The lexer approach is in this PR: #2621
I'll close this, as the other approach is simpler and more complete.

ccasin and others added 8 commits November 13, 2024 16:47

support for raw identifiers

91e5f53

Integrate review: bugs

57bb7a0

More bugs

2a92357

Remove extended syntax from test

9f840b4

Type names and type variables can be raw idents

1a4add1

Idents in module names

91886a2

The parser doesn't allow raw ident in module names but it does in module type names.

More coverage of 'fmt_str_loc'

e1f5566

Update CHANGES

4cfd3a4

hhugo reviewed Nov 14, 2024

Julow added a commit to Julow/ocamlformat that referenced this pull request Nov 18, 2024

test: Add raw_identifier from ocaml-ppx#2619

c70f8e4

Julow mentioned this pull request Nov 18, 2024

5.2 support: Raw identifiers (by changing the Parsetree) #2620

Closed

Julow mentioned this pull request Nov 19, 2024

Support 5.2 raw identifiers #2621

Merged

Julow closed this Nov 19, 2024
