Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

5.2 support: Raw identifiers #2619

Closed
wants to merge 8 commits into from
Closed

Conversation

Julow
Copy link
Collaborator

@Julow Julow commented Nov 14, 2024

This is the last missing piece for 5.2 support.

Thanks to @ccasin who made the work in janestreet#88

$ fmt_package_type c ctx cnstrs
$ fmt_attributes c attrs )
| Ptyp_open (lid, typ) ->
hvbox 2
( hvbox 0 (fmt_longident_loc c lid $ str ".(")
( hvbox 0 (fmt_longident_loc c ~constructor:true lid $ str ".(")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it ~constructor:true here ?

@hhugo
Copy link
Collaborator

hhugo commented Nov 14, 2024

We need raw idents to be preserved. if I use #effect in a 5.3 switch, I don't want it to be rewritten in a 5.2 switch.
The current implementation depends on Lexer.is_keyword that is version dependent.

@hhugo
Copy link
Collaborator

hhugo commented Nov 14, 2024

Thinking more about it, you could even imagine a mode that would help migrating to newer ocaml version by escaping idents that are becoming keyword.

@ccasin
Copy link
Contributor

ccasin commented Nov 14, 2024

Sorry, I should have mentioned it, but I think my PR in the janestreet repo has a bug: escaped operators in types need special treatment.

The compiler itself also has this bug, see the report in ocaml/ocaml#13603 and a fix in ocaml/ocaml#13604. I'm in the process of adapting that fix in my ocamlformat pr to our branch.

@Julow
Copy link
Collaborator Author

Julow commented Nov 14, 2024

I agree that we should preserve the raw identifier marker and not try to add it again later. I'm now investigating passing that information through a table, like we do for comments, as changing the AST for this might brings huge changes to the lexer and parser, making future backports harder.

Thinking more about it, you could even imagine a mode that would help migrating to newer ocaml version by escaping idents that are becoming keyword.

That should be possible on top of this work.

@hhugo
Copy link
Collaborator

hhugo commented Nov 14, 2024

changing the AST for this might brings huge changes to the lexer and parser, making future backports harder.

Can't you keep the escaping prefix during lexing ? Without touching the ast ?

@hhugo
Copy link
Collaborator

hhugo commented Nov 14, 2024

Sorry, I should have mentioned it, but I think my PR in the janestreet repo has a bug: escaped operators in types need special treatment.

If we go with preserving escaped idents, we no longer have special treatments

@hhugo
Copy link
Collaborator

hhugo commented Nov 15, 2024

Don't we need to update the lexer in parser-standard as well ?

@hhugo
Copy link
Collaborator

hhugo commented Nov 15, 2024

The upstream OCaml lexer is able to recognize different set of keywords. One can pass -keywords 5.2 on the command line

@Julow
Copy link
Collaborator Author

Julow commented Nov 15, 2024

I'm currently investigating changing the AST to represent how identifiers looked in the source. If that's too much changes in the parser and AST, I'll switch to the table approach.

Don't we need to update the lexer in parser-standard as well ?

No, I'll let the parser-standard behave as it does upstream (\#foo is represented the same as foo).

The upstream OCaml lexer is able to recognize different set of keywords. One can pass -keywords 5.2 on the command line

This should be plugged to the ocaml-version option that we have but that won't solve the issue that \#effect must not be turned into effect, regardless of the value of ocaml-version. This option might be set wrong, or people writing \#effect might intent to set it to 5.3 soon.
Preserving the source code is the only approach I'm considering now.

@hhugo
Copy link
Collaborator

hhugo commented Nov 15, 2024

Don't we need to update the lexer in parser-standard as well ?

No, I'll let the parser-standard behave as it does upstream (\#foo is represented the same as foo).

Ok, turns out the standard lexer was already updated in #2512

@hhugo
Copy link
Collaborator

hhugo commented Nov 15, 2024

I'm currently investigating changing the AST to represent how identifiers looked in the source. If that's too much changes in the parser and AST, I'll switch to the table approach.

In #2619 (comment), I was suggesting to not change the Ast and just store the escaped ident in the LIDENT token. What do you think ?

Julow added a commit to Julow/ocamlformat that referenced this pull request Nov 18, 2024
@Julow
Copy link
Collaborator Author

Julow commented Nov 18, 2024

I pushed the Parsetree approach here: #2620
The support in Fmt_ast is trivial in exchange for a +233 -198 change in the vendored parser. There's +88 -72 line changes in parser.mly, which will negatively impact future backports but which is less than I expected.

@Julow
Copy link
Collaborator Author

Julow commented Nov 18, 2024

In #2619 (comment), I was suggesting to not change the Ast and just store the escaped ident in the LIDENT token. What do you think ?

That would have been simpler indeed. I'll investigate how that looks.

@Julow
Copy link
Collaborator Author

Julow commented Nov 19, 2024

The lexer approach is in this PR: #2621
I'll close this, as the other approach is simpler and more complete.

@Julow Julow closed this Nov 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants