Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directive blocks are not represented #28

Open
wkirschbaum opened this issue Nov 10, 2022 · 4 comments
Open

Directive blocks are not represented #28

wkirschbaum opened this issue Nov 10, 2022 · 4 comments

Comments

@wkirschbaum
Copy link

I am currently implementing indentation using emacs treesit ( tree-sitter integration ), which uses this library. Everything works well except for directive blocks like

<%= if true do %>
<.some_tag_or_component>
</.some_tag_or_component>
<% end %>

which produces

(fragment [0, 0] - [4, 0]
  (directive [0, 0] - [0, 17]
    (partial_expression_value [0, 3] - [0, 14]))
  (component [1, 0] - [2, 25]
    (start_component [1, 0] - [1, 24]
      (component_name [1, 1] - [1, 23]
        (function [1, 2] - [1, 23])))
    (end_component [2, 0] - [2, 25]
      (component_name [2, 2] - [2, 24]
        (function [2, 3] - [2, 24]))))
  (directive [3, 0] - [3, 10]
    (partial_expression_value [3, 3] - [3, 7])))

Since the directive is not the parent of the component there is no simple way of checking parent for indentation.

@the-mikedavis
Copy link
Member

the-mikedavis commented Nov 12, 2022

I took a look at implementing this change in da2b726

I'm not sure it's possible to robustly support the grouping of directives. An example like so seems straight-forward:

<%= if true do %>
  <%= if true do %>
  <% end %>
<% end %>

Using heuristics similar to what the HEEx engine uses (i.e. looking at the first/last tokens in the directive, do or -> or end), we can tell that the first two lines are incomplete expressions and the last two end the expressions. Essentially:

(partial_expression)
  (partial_expression)
  (ending_expression)
(ending_expression)

So far this looks straightforward but there are more complicated cases like so:

<%= case x do %>
  <%= ^x -> %>X
  <%= _x -> %>Not X
<% end %>

Which is essentially:

(partial_expression)
  (partial_expression)
  (partial_expression)
(ending_expression)

This is still doable. When we nest them though, we're in trouble:

<%= if true do %>
  <%= case x do %>
    <%= ^x -> %>X
    <%= _x -> %>Not X
  <% end %>
<% end %>
(partial_expression)
  (partial_expression)
    (partial_expression)
    (partial_expression)
  (ending_expression)
(ending_expression)

Which partial_expressions belong to which ending_expressions?

And we can't simply use the do token to tell apart the starts of groups from partial expressions because of fns:

<%= Enum.map(things, fn _thing -> %>
  Thing
<% end) %>

Between if, case, and fns, I'm not sure we are always able to decide how to group these.

However for the purposes of indentation, I think it's only necessary to indent when you see a partial_expression and dedent when you see a ending_expression, no? If we were to make a change that exposed partial expressions and ending expressions, would that be enough for indentation?

@wkirschbaum
Copy link
Author

Thanks for the detailed response and explanation. Do you mind if I tinker with your change for a day or two and see if I can make it work?

The next step for heex-mode is to add better navigation which also required directives to be part of the tree, but it is not essential and if we move towards :if and :for within the tags the effort on this might not be worth the trouble.

Just to give some insight on how Emacs treesit (tree-sitter binding ) works according to my understanding.

There are 3 parts:

  1. font-lock - simplest and just requires the ability to query and tag which matches the font

  2. indentation - need the ability to find the node type and node parent, then indents based on an anchor and offset

(defvar heex--treesit-indent-rules
  (let ((offset heex-indent-level))
    `((heex
       ((node-is "end_tag") parent-bol 0)
       ((node-is "end_component") parent-bol 0)
       ((node-is "end_slot") parent-bol 0)
       ((parent-is "component") parent-bol ,offset)
       ((parent-is "slot") parent-bol ,offset)
       ((parent-is "tag") parent-bol ,offset)
       ))))

It is possible to indent based on siblings, but it gets more tricky as custom functions needs to exists. Adding custom functions makes it possible, but also makes it less elegant and harder to maintain.

  1. Navigate - being able to jump up the s-expression ( parent ) tree or move forward/backwards over s-expressions ( siblings )
    This is probably the most complex, but an important part of any major mode in emacs. To be able to navigate around it is required to understand the structure in more detail. I don't believe it is extremely important to get this right with directives, but will be nice in the future.

@wkirschbaum
Copy link
Author

@the-mikedavis this is the same issue elixir-mode faced with a weak partial parser, but then injected artificial tokens for repeating match blocks to be able to be able to identify: https://github.com/elixir-editors/emacs-elixir/blob/master/elixir-smie.el#LL272C39-L272C39

The change to have an partial and ending_expression will make it possible to solve it in some way, but still probably not a very simple way :). It will be better, thanks.

I am not sure to what extend vscode/sublime etc.. uses or will use tree-sitter for indentation, but if this will only be an emacs thing ( and an optional dependency ) then it won't be worth spending too much time on this as it might not be a simple change to nest expressions. In the future if there is time for me to learn how tree-sitter grammar works I would like to attempt to change this. It will help with maintenance moving this complexity to this shared repo.

How would backward compatible work if we change this down the road?

@the-mikedavis
Copy link
Member

I'm not sure how it works for emacs but with other tree-sitter consumers like nvim or helix, any change that removes/renames nodes is breaking: any queries that include those nodes will fail the query analysis step and be unusable. Adding an ending_expression node which was originally a partial_expression node could be seen as breaking change too I suppose.

The tree-sitter consumers that I know about lock down the versions of the tree-sitter-* repositories so that a new update doesn't automatically break things, so breaking compatibility is usually not a huge deal

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants