Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Prototype of treesit.el support for ruby #47

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

AndrewSwerlick
Copy link

This commit is a prototype exploring what it would mean to make the minor mode work with the built in treesit.el treesitter support in Emacs 29. I've modified many of the functions and fold rules specific to the ruby language, and have tested locally with a few ruby code files to confirm things are working

The actual work consists mostly of ...

  • swapping out functions for their treesit.el equivalents
  • replacing symbols for node types with strings
  • Modifying (alist-get... usages to support string keys

I've created this PR to see if there's any interest in having a version of ts-fold that works with native treesit.el. If so there would be additional work on this PR, specifically

  • Figuring out how to handle mode hooks and global mode triggers
  • Updating other language rulesets
  • Figuring out how to handle the major mode alist

This commit is a prototype exploring what it would mean to make the
minor mode work with the built in `treesit.el` treesitter support in
Emacs 29. I've modified many of the functions and fold rules specific
to the ruby language, and have tested locally with a few ruby code
files to confirm things are working

The actual work consists mostly of ...
* swapping out functions for their `treesit.el` equivalents
* replacing symbols for node types with strings
* Modifying `(alist-get...` usages to support string keys
@AndrewSwerlick AndrewSwerlick mentioned this pull request Jan 4, 2023
("do_block" . (ts-fold-range-seq 1 -2)) ; match with `end`, in spec file
("then" . ts-fold-range-ruby-if) ; `if` and `elsif` block
("else" . (ts-fold-range-ruby-if 4 0)) ; `else` block
("comment"
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treesit.el uses strings for node types, not symbols

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be easy to change! Thanks!

@@ -82,7 +81,7 @@
(php-mode . ,(ts-fold-parsers-php))
(python-mode . ,(ts-fold-parsers-python))
(rjsx-mode . ,(ts-fold-parsers-javascript))
(ruby-mode . ,(ts-fold-parsers-ruby))
(ruby-ts-mode . ,(ts-fold-parsers-ruby))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With emacs 29, the treesitter enabled modes are actually seperate from the built in ones, so this alist entry had to change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it wouldn't conflict between the two, it might be good to keep both versions working? Maybe create another parser function named ts-fold-parsers-ruby-ts?

(while (and current (not (alist-get (tsc-node-type current) mode-ranges)))
(setq current (tsc-get-parent current)))
(while (and current (not (alist-get (treesit-node-type current) mode-ranges nil t #'equal)))
(setq current (treesit-node-parent current)))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to use the optional parameters for alist-get so we can do an equal compare since the keys are strings now.

(patterns (mapconcat (lambda (fold-range) (concat "(" (car fold-range) ") " "@name"))
(alist-get major-mode ts-fold-range-alist) " "))
(query (treesit-query-compile (treesit-node-language node) patterns))
(nodes-to-fold (treesit-query-capture node query nil nil t)))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With treesit.el queries have to be strings, not s-expressions. So instead of seq-mapcat, we just concat everything into one big string.

(thread-last nodes-to-fold
(mapcar #'cdr)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treesit-query-capture allows us to pass an optional param as the 5th argument that makes the return value a simple list of nodes, without the capture names, this means we don't have to use cdr to get the actual nodes

(setq text (tsc-node-text iter-node)
line (car (tsc-node-start-point iter-node))
(setq text (treesit-node-text iter-node)
line (treesit-node-start iter-node)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

treesit.el doesn't seem to distinguish between points and pos, so it only has treesit-node-start. In testing it doesn't seem to have made a difference, but I didn't test extensively

@jcs090218
Copy link
Member

jcs090218 commented Jan 4, 2023

Thanks for working on this!

Can you doubl-check if built-in treesit works with our plugins? 😅

@AndrewSwerlick
Copy link
Author

Thanks for working on this!

Can you doubl-check if built-in treesit works with our plugins? 😅

I haven't done that yet, but I'll take a look as I continue to work on this

@AndrewSwerlick
Copy link
Author

So I played around some more and it looks pretty hard to support both treesitter implementations simultaneously, but it's actually pretty easy to do an actual port. The current state of this branch has completely removed all references to emacs-tree-sitter and is only using the built in implementation. It was mostly a bunch of find and replace changes for equivalent functions with a few minor tweaks.

I've only tested it with ruby and javascript/typescript/tsx, but everything seems fine. I haven't tested indicators yet.

I think I'll maintain this seperate fork for my own purposes until you're ready to cut a new version that supports the built in treesit.el. It should be pretty easy to cut and paste at that point.

@jcs090218
Copy link
Member

Thanks for taking time to this! ❤️ 🚀

Creating another package might make more sense at this point since it's quite hard to keep both tree-sitter implementations in one package. Here are a few pros and cons I could think of:

pros

  1. User can choose what tree-sitter implementation over another
  • Looking for regular major-mode (without -ts) support should still use this package
  • Looking for major-ts-mode support should use the "new "package
  1. Easier to implement and maintain

cons

  1. Duplicate code for plugins' code (indicators, summary, etc)

I will continue to think before I start working on this! 😓

@AndrewSwerlick
Copy link
Author

Yeah I generally agree there are more arguments for a separate package, though I do think it's worth noting that on the https://emacs-tree-sitter.github.io/ homepage it says

For Emacs 29+, please use the built-in integration instead.

So I wonder if given that it makes sense to make it one package, but two different major versions? Minor fixes can be released to each version until Emacs 29+ adoption is enough to fully retire the old version.

@jcs090218
Copy link
Member

It makes sense if we fully abandoned Emacs versions lower than 29, but there are people who still use normal major-mode plus this package. It's better if we support these two groups (emacs-tree-sitter and builtin treesit).

To do so, creating new package to only support built-in treesit Emacs 29.1+ could help a little.

Or a switch, to switch between two tree-sitter API would help as well. That way, we don't need to create a new package.

So I wonder if given that it makes sense to make it one package, but two different major versions? Minor fixes can be released to each version until Emacs 29+ adoption is enough to fully retire the old version.

Can you elaborate?

@AndrewSwerlick
Copy link
Author

Can you elaborate?

So the more I've thought about this it probably won't work because it relies on some things I don't think exist in the emacs world. In the ruby and javascript worlds, you can do version pinning, where you say for my program don't install anything above version 2.X. So you can upgrade from version 2.2 -> 2.3, but the package manager will never upgrade you from version 2.2 -> 3.0. As a result, sometimes folks will maintain separate version tracks, where they ship updates both for the version 2.X track and the version 3.X tracks in the same timeframe. It's great for breaking changes like this, so you can provide some features to the old version, while still shipping a breaking change.

I don't think emacs has version pinning out of the box though, and you have to be using quelpa or straight.

@jcs090218
Copy link
Member

Ah... that make sense. Like python 2.x vs 3.x?

I use default built-in package.el so I probably would love to support that! 😅 Thanks for sharing your idea! It's indeed a great way to do for such tasks.

garyo and others added 11 commits February 2, 2023 09:35
Not sure why this is needed but I get startup errors without these.
Perhaps an ordering issue.
`(car fold-range)` is not a sequence, it's a symbol, at least in my
C++ buffers. This patch converts it using `symbol-name`.
For simplicity I just converted the `patterns` vector to a list, but
there are probably simpler ways to do this.
It seems `treesit-buffer-root-node` returns an error when there is no
parser for a given buffer. In my case, this happens when trying to get
doc for an elisp variable; `ts-fold--tree-sitter-trigger` runs in an
elisp-mode "*temp*" buffer, and `treesit-buffer-root-node` raises an
error.
If I just ignore the error around that call,
`ts-fold--tree-sitter-trigger` still runs `(ts-fold-mode -1)`. Seems
like the simplest thing is to just do nothing in that case.

To repro the problem, enable ts-fold mode globally and do `C-H v
emacs-version`.
This is not perfect; when folding a node with children, it shows the
first character of the node, then the fold indicator, then the colon.
Ideally it would show the whole name of the key, and then the fold
indicator. Not sure how to do that.
- Use a better ts-fold-range function for yaml
- Remove "block_sequence" fold-parser; don't think that's useful.
- Fix ts-fold--get-nth-child: `(treesit-node-children) is a list, not
  an array.
Some c-preproc-if have no 'else' branch and 'alternative' node.
For example:

CODE:
  #if THREAD_SIZE >= PAGE_SIZE
  void __init __weak thread_stack_cache_init(void)
  {
  }
  #endif

TREE:
(preproc_if #if
 condition: (binary_expression left: (identifier) operator: >= right: (identifier))
 \n
 (declaration type: (primitive_type) declarator: (identifier) type: ;)
 (function_definition type: (type_identifier)
  declarator:
   (function_declarator declarator: (identifier)
    parameters:
     (parameter_list (
      (parameter_declaration type: (primitive_type))
      )))
  body: (compound_statement { }))
 #endif)

So we use last child node(i.e. "#endif") instead of 'alternative' node in this case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants